Wednesday, March 17, 2010

The magic twenty percent

The current project is maintenance on a library that reads and writes spreadsheet files (Lotus 1-2-3 and Microsoft Excel). The term 'maintenance' is a bit misleading, as the project is really a re-write of the existing code.

But here's the interesting thing: our new code is much smaller than the previous version. Much, much smaller. Not ten percent or twenty percent smaller, but eighty percent smaller. The original code was roughly 98,000 lines. The new code is about 20,000 lines. The new code does everything that the old code does.

So the question is: how can I write code that is so small, so compact, so efficient? Am I that good of a programmer?

This is not the first time that I have replaced code with more efficient code. A previous project saw me re-writing a system and reducing the code from almost 30,000 lines to about 3,000 lines. On that project, I replaced Java code with C#. Since Java and C# are at the same level (just about), the reduction could not have been due to the change in language. I think the benefit came from the change to the data format. The original system read XML files and processed them as scripts, using a lot of auxiliary Java classes and a limited scripting language. The replacement system (in C#/.NET) used plain text scripts and required no auxiliary classes. This was due to the capabilities of the scripting language, not the capabilities of C# or limitations of Java. Changing the data (the scripts) to a simpler and more expressive form made the coding much easier.

The current assignment has a different twist. In this assignment, the data is not changing (it's all spreadsheet files for Lotus 1-2-3 or Excel) but the language is helping. The previous program used "straight" C++; the new program uses C++ with STL. Changing the language made the coding much easier.

There are some other factors that contributed. On both assignments, I was the sole programmer. As the only person coding, I was able to make decisions quickly and without debate.

But the real factor here was the automated testing. Both projects had automated tests in place. With them, I was able to make sweeping changes as I needed. Most programmers want to make such changes, but refrain because of the risk. Managers and code architects shy away from major changes, because major changes have large risks.

Yet with automated tests, the risks were small. I could make large changes (I made some large changes today) and then run tests to ensure that the system was working as expected. My changes were not correct, initially, and I made corrections. With automated tests I can make grand, sweeping changes that improve the code... and know that the changes are correct.

So with two data points (not a lot, I admit) I am looking at large reductions in code due to automated testing. I couple that with the conjecture that reduced code size leads to reduced development costs (fewer developers, fewer defects) and better software.


Wednesday, March 10, 2010

Modest success (again) with XLS

Last week I posted about modest success in the task of creating XLS files. At that time, we were able to take an XLS file, convert it to text, and then re-convert it to XLS format. The text format was very detailed, with specific information about each record in the starting XLS file.

Today I had more success. After a week of coding and testing, I took a plain text file, one with a minimal amount of information, and used it to generate an XLS file. With this program in our toolbox, we can start on the major task of updating several of our key applications.

The job is not complete. There is one item remaining: a fix-up for some records that point to later records in the file. Right now I handle this by adjusting the generated file. (A cumbersome process of exporting the XLS file to detail-heavy text, adjusting the specific records, and then re-importing the detail-heavy text file back into XLS format.) It's an interesting problem, but quite solvable. My experience with assembly language programming is helpful for this problem; the task is identical to the work performed by a linking loader.
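
The fix-up can be sketched as a two-pass build, in the style of a linking loader. This is a minimal Python sketch, not the project's code and not the real XLS layout: the record names, the `build_file` function, and the 4-byte little-endian pointer stored at the end of a record are all assumptions made for illustration.

```python
import struct

def build_file(records):
    """Lay out records and patch forward references.

    records is a list of (name, payload, target) tuples. target is the
    name of a record whose offset must be stored in this record, or None.
    (Hypothetical structure, for illustration only.)
    """
    out = bytearray()
    offsets = {}   # record name -> offset where the record starts
    fixups = []    # (position of placeholder, name of target record)

    # Pass 1: write every record, reserving 4 zero bytes for each pointer.
    for name, payload, target in records:
        offsets[name] = len(out)
        out += payload
        if target is not None:
            fixups.append((len(out), target))
            out += b"\x00\x00\x00\x00"

    # Pass 2: every offset is now known, so fill in the placeholders.
    for pos, target in fixups:
        struct.pack_into("<I", out, pos, offsets[target])

    return bytes(out)
```

The point of the second pass is that a record can refer to one that has not been written yet; the placeholder holds the space until the target's position is known.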


Sunday, March 7, 2010

Getting away from the office

I've been posting about my trials (and successes) at the office. I have to remind myself that this blog is not solely about accomplishments at the office, but about my progress as a professional. Time for a step back, and another look at myself.

I've been working with the local ACM chapter to set up a 'career day' for students at a local school. This has been an interesting task, as it is not technical but managerial. (Collecting speakers, arranging for a meeting space, etc.) It's a nice break from the day-to-day office work, which is technical.

I'm reading Roszak's The Cult of Information, an interesting book from the mid-1980s. The author makes some good points about our enthusiasm with computers and our eagerness to teach programming to everyone. Data is not information, which is not knowledge, which in turn is not wisdom. I'm enjoying it.

This week-end I spent a little time playing with Ruby. I wrote some simple scripts to compare documents and create a commonality metric. I took the algorithm from O'Reilly's PDF Hacks book. The book had a routine in bash; writing it in Ruby was easy. A simple project, requiring little time, but a fun diversion.


Wednesday, March 3, 2010

Tests to the rescue!

I learned more about Microsoft's Excel files today... and what I learned led to changes in a library and several programs. The change had to do with text strings and Excel's methods of encoding. (It's complicated, and I won't go into it here.)

I had made some assumptions about text strings, and it turns out that they were wrong. I corrected them today.

The corrections had some far-reaching effects. On a typical project, I would have to make a small set of changes and compromise the design of the software. But on my current project, I was able to make the proper changes (or at least "proper" as I think of "proper") and change multiple components.

These changes were possible with backup from our automated tests. As I made changes, I ran the tests and verified that my changes were affecting the right areas -- and nothing else.

Yay! for automated tests!


Tuesday, March 2, 2010

Modest success with XLS

Today, shortly after 14h (that's 2:00 PM in civilian notation) I had a modest success with spreadsheet utilities.

I've been working with spreadsheet files since October. With hard work, creative programming, documentation, and a few guesses, I have successfully put together programs to read Lotus 1-2-3 files (WK1, WK3, WK4, and 123 formats) and Microsoft Excel files (XLS and XLSX formats). The programs read the files directly, yank out the contained values, and write them to text.

I've also been working on programs to do the reverse: Start with text and create the spreadsheet file. This is a bit trickier, since spreadsheet files contain lots and lots of information, and when reading the file one can ignore most of it. The task of reading the file is a matter of finding the important bits, converting them to plain text, and skipping everything else.
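
The "find the important bits, skip everything else" approach can be sketched in a few lines. This is a minimal Python sketch, assuming a flat stream in which each record starts with a 2-byte type and a 2-byte length (little-endian) followed by its payload. The type codes `REC_NUMBER` and `REC_LABEL` and the helper names are hypothetical, chosen only for this illustration; they are not taken from the Lotus or Excel formats.

```python
import struct

# Hypothetical record type codes, chosen only for this illustration.
REC_NUMBER = 0x0001
REC_LABEL = 0x0002

def make_record(rec_type, payload):
    """Build one record: 2-byte type, 2-byte length, then the payload."""
    return struct.pack("<HH", rec_type, len(payload)) + payload

def iter_records(data):
    """Yield (type, payload) for each record in a flat record stream."""
    pos = 0
    while pos + 4 <= len(data):
        rec_type, length = struct.unpack_from("<HH", data, pos)
        pos += 4
        yield rec_type, data[pos:pos + length]
        pos += length

def extract_values(data):
    """Keep the records we understand; step over everything else."""
    values = []
    for rec_type, payload in iter_records(data):
        if rec_type == REC_NUMBER:
            values.append(struct.unpack("<d", payload)[0])
        elif rec_type == REC_LABEL:
            values.append(payload.decode("ascii"))
        # Any other type is ignored; its length field let us skip it.
    return values
```

Because every record carries its own length, the reader never has to understand a record in order to skip it. A writer has no such luxury.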

Creating a spreadsheet file requires generating all of the information for the file. You can't leave out records because you don't understand them. The file must contain enough information to be acceptable to the target program. To create an acceptable file, you have to understand a lot more about the file than when converting it to plain text.

In the past few weeks, I've successfully written Lotus 1-2-3 WK1 and WK3 files. Now I am working on Excel XLS files. (I'm not writing Lotus WK4 or 123 files because we don't have enough information about the file formats.)

Today I had a minor success. This afternoon I ran a series of programs that started with a text file and produced an .XLS file. I ran MS-Excel and opened the .XLS file, and it worked! MS-Excel displayed the file with the proper values.

I say "minor success" because the starting file (the text file) was very carefully crafted to result in a proper XLS file. In fact, our "source file" was derived from an XLS file created by MS-Excel. We took an existing file, converted it to text, and then converted it back to XLS. (The process was a little more complex than that, but the sentiment is correct.) One might say that this was cheating, or perhaps a simple matter of conversions between encodings.

Yet the work is not without value. We now have a process to take a text file and create an XLS file. With this process, we can experiment and learn more about the XLS format. The Microsoft documentation is detailed, but it is vague on several topics. With some experiments, we can get a reliable XLS file creator. This process lets us create files at a high level (higher than the bits-and-bytes of the Microsoft document) and quickly test our theories.

My success in this was due, as usual, to our test framework. I created programs, but before that, I created tests and expected results. Once I had those in place, it was easy to create a program that read the input and provided the output. (Okay, perhaps not "easy"... but at least I knew when changes got me closer to the goal.) Our framework is simple: a few scripts (batch files), some test data, and expected results. Simple, yet adequate.
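
A framework of this shape (test data plus expected results, compared after each run) can be sketched briefly. Our real framework is batch files; the Python below is only a stand-in, and the `.in` and `.expected` file extensions and the `run_tests` function are assumptions made for illustration.

```python
from pathlib import Path

def run_tests(test_dir, convert):
    """Run convert() on every *.in file and compare the result with the
    matching *.expected file. Returns the names of the failing cases.
    """
    failures = []
    for input_path in sorted(Path(test_dir).glob("*.in")):
        expected_path = input_path.with_suffix(".expected")
        actual = convert(input_path.read_bytes())
        if actual != expected_path.read_bytes():
            failures.append(input_path.stem)
    return failures
```

An empty list means every case matched its expected result. A non-empty list names the cases to look at, which is all the feedback needed to tell whether a change moved closer to the goal.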

So today was a pretty good day.



Sunday, February 28, 2010

The end of Windows 7 RC

Windows 7 gave me a message today: "This version of Windows expires in 1 days."

I'm using the release candidate of Windows 7, the free version that Microsoft distributed last year. I knew that it had an expiration date built into it; I thought it expired in June. Apparently, my memory is incorrect.

I won't say that I will miss Windows 7. It had a nice, shiny interface but it did little else above the old Windows XP.

This computer was the only computer with Windows. I have a MacBook and several PCs running Linux.

The question now is: Do I acquire a real copy of Windows 7? //syssiphus (the Windows 7 PC) came without an operating system, so the release candidate was a good match at the time.

The only thing that Windows 7 gives me is access to servers in the office. I have successfully set up the proper access accounts and tested access from //syssiphus. I tried access from Linux computers, but access is not complete. I can get so far but not to the virtual desktop.

I'm not sure that I want an operating system that uses the plural form when the singular form is required. The expiration warning message read "1 days", not "1 day". Displaying the proper form is not hard, but does take some effort. Another shortcut by Microsoft, I suppose.
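
For the record, here is how little effort the proper form takes. This is a minimal Python sketch with a hypothetical function name; it handles English only and says nothing about how Windows builds its messages.

```python
def expiry_message(days):
    """Build the warning text, choosing 'day' or 'days' to match the count."""
    unit = "day" if days == 1 else "days"
    return f"This version of Windows expires in {days} {unit}."
```

Calling `expiry_message(1)` yields "This version of Windows expires in 1 day."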


Tuesday, February 23, 2010

The possibility of web services

Yesterday, I spoke with other members of the team about the future of our applications. The agreement is that we want to move away from C++ and towards Java. We need to run on multiple platforms (Windows and Linux), and Java and C#/.NET/Mono are the two possibilities -- except that Mono is not allowed on our servers.

We also agree that web services are a good way to break the current application into smaller pieces. Instead of one PC-based app that reads a spreadsheet, performs calculations, and generates another spreadsheet, we can have three web services: read spreadsheet, perform calculations, create spreadsheet.

In other news, today I completed enough logic to convert a plain text file into a Lotus 1-2-3 WK3 file. (And have Lotus 1-2-3 read said file.) This is a big accomplishment, and helps us on our way to a redesigned system. Woo-hoo!