Wednesday, April 27, 2016

Data tools pay for themselves over time

Several years ago I worked on a project that required we pick apart Excel .XLS files at the byte level. We didn't use Microsoft's COM components to read the file. We didn't use a third-party library. We read the file (in C++ and in binary mode) and did exactly what Microsoft recommends we do not do: we parsed the file ourselves.

As part of this effort, I built a small utility to "dump" the contents of an Excel file. The common programs extract cell values, but I wanted something more detailed. I wanted a list of the various BIFF records in the file. (Microsoft Excel .XLS files consist of a series of binary records, each record describing some aspect of the contents. Each record contains a type field, a length field, and a set of fields that depends on the type of record, and some fields have a variable size. It's a compact and powerful format.)

The utility program (called "DumpXls", cleverly enough) was useful to identify the different records needed to construct a proper .XLS file. It took some time to create, time that turned out to be an investment.

It was a useful investment, because this past week I had a problem with the Python 'xlwt' library. (The 'xlwt' library lets Python create an .XLS file.) My old "DumpXls" utility helped me diagnose the problem and find a solution... and quickly. The time spent creating that program years ago was more than made up by the time saved in finding this week's problem.

The "dump" program was simple, yet it required a good understanding of the file format. When working with a new file format, it helps one understand the format to build such a "dump" program. The program is useful, as is the knowledge of the format.

Sunday, March 13, 2016

Success with Ruby and BASIC

A while back, I started a project to learn Ruby. The project was to implement a version of BASIC, using that as a means to learn the Ruby language.

I am happy with my progress. I've learned a great deal about Ruby -- and learned more about programming. Ruby has a different "take" on programming than C++, Java, and C#, and those differences forced me to re-think a lot of ideas about programming. The design patterns for Ruby are different from the patterns of the "classical" object-oriented languages. I found that much of my code followed the classic patterns, and it worked poorly. Only after I changed my thinking about programming and my designs in Ruby did my code "work".

Rubocop helped too. It complained about many things, from spacing around operators (or lack thereof) to the number of lines and complexity of functions. At first I thought that some long functions would have to remain long -- in C++ or Java I would leave them as long functions. But with Rubocop's prodding, I redesigned them and made them smaller, many times breaking them into smaller functions.

i started using Rubocop after a significant amount of code had been written, and Rubocop reported somewhere north of 80 "violations". Over time, with thought and experimentation, I have reduced that number to one. The changes have, I believe, improved the code and made it more reliable and more readable.

My success has been due to the design of the Ruby language, the Rubocop tool, the Ruby documentation pages, and StackOverflow. These made it easy to develop in Ruby. Yet there were two other things: an open mind and time. I needed to change my thinking about programming, and accept the Ruby way of code. I also needed time. Time to think about the code, time to try things, and time to revise my initial code into something better.



Monday, January 18, 2016

Impressed with Python

I've been impressed with Python of late.

I'm using Python on a project for a client. The work involves computations with data in Microsoft Excel spreadsheets. Processing involves, at a basic level, the extraction of data from a spreadsheet, some calculations, and then the generation of a result spreadsheet (different from the first spreadsheet).

Python has been surprisingly useful here. The libraries 'xlrd' and 'xlwt' handle the reading and writing of data in spreadsheets, allowing me to focus on the computations.

Python has helped in other ways. It runs a lot of our tests, and summarizes the test results. It also drives the tests of an old MFC Windows application.

This last item is important. We had no way to test the GUI for this program -- other than manual testing. The 'PyWinAuto' package lets us use Python to "drive" the GUI and run tests.

I am impressed with PyWinAuto. Perhaps more impressed than others; some years ago I created a library (in C++, not Python) to drive MFC GUI programs. The task was not trivial and my attempt was clunky and inadequate. Yet it was that foray into Windows API calls and Windows controls that gives me the appreciation for the effort of PyWinAuto.

Over the years, I have worked with many languages: BASIC, Pascal, Fortran, C, C++, Java, Perl, C#, and even a little COBOL. My experiences with each language vary. Some were fun, others were frustrating. Python is in the former category, as I have been able to do just about everything I wanted, and with little effort.