Sunday, March 28, 2010

My PC is not compatible

A long time ago, at the dawn of the PC era (1981), the fate of PC producers depended on compatibility with the IBM PC. While many companies made computers, Compaq was among the first to produce a "PC compatible", the original Compaq Portable. (It was a portable PC in the sense that it had a handle and could be moved from one location to another. It was not a laptop in any sense of the word.)

Compatibility with the original IBM PC (the model 5150) was important. Many folks wrote software that was specific to the IBM PC, and that software would not work on similar computers such as the DEC PRO-325 or the Zenith Z-100. Hardware designed for the IBM PC would not work in other brands of PCs, either.

Yet a funny thing has happened in the past twenty-nine years: The standard definition of a PC has changed.

The original IBM PC had an Intel 8088 CPU, 64K of memory (in 4116 DIP chips), ports for keyboard and cassette tape, and slots that could hold adapter cards. IBM provided adapters for monochrome video, color-graphics video, and floppy disks. The floppy disks were the 5.25 inch, 360K "true" floppy disks. (Or "mini-floppy" disks, as the 8 inch floppy disks from the IBM 3740 were never supported by IBM PCs -- of any flavor.)

The changes have been gradual. The first change occurred in the IBM PC XT (model 5160), which allowed for a hard drive. The IBM PC supported cassette tapes for storage, and floppy disks if you purchased the optional adapter card; the IBM PC XT dropped support for cassette tape and had a BIOS that supported floppy controller cards and hard disk controller cards. The hard disk was a then-whopping 10 MB.

The IBM PC AT came next; it supported 1.2 MB high-density floppy disks and had a different keyboard. It used different memory chips and had a wider bus, although it still accepted cards from the IBM PC and IBM PC XT. The IBM PC AT had a 20 MB hard drive. IBM introduced the Enhanced Graphics Adapter around the same time.

In 1987, IBM introduced the PS/2 (and later, with Microsoft, OS/2). With the PS/2, IBM changed the keyboard layout (again) and also the keyboard connector. IBM added the mouse port, and introduced the Video Graphics Array (VGA) which supported the (now puny) resolution of 640x480. IBM also changed the floppy disk from 5.25 inch to 3.5 inch with the hard plastic shell, and changed the bus to the Micro Channel Architecture (MCA) bus.

The PS/2 was received poorly in the market, especially by other manufacturers. They banded together briefly and formed a standards group that produced "EISA", a bus that accepted old PC cards but also accepted new, high-performance cards. The solution was a kludge and died a quick death. Yet the damage to IBM had been done, and the definition of a standard PC was no longer IBM's to make.

A few years later, Microsoft ended its agreement with IBM to develop OS/2 and put its weight behind Windows. Microsoft gained enough popularity and power to define the hardware platform. IBM lost its leadership role in the market, and eventually sold its PC business to Lenovo of China.

Looking at a PC today, one sees a modern machine that is nothing like the original IBM PC. Nothing from the original IBM PC works in a "modern" PC -- at least none of the hardware. The keyboard is different, the display monitor is different, and the floppy disks (if present at all; most current PCs don't have them) are different. Adapter cards from the original IBM PC will not work in today's PCI bus. Hard drives from the IBM PC XT or PC AT will not work with today's SATA ports. Memory chips will not fit.

Nothing -- again, in terms of hardware -- is compatible.

Software, on the other hand, tends to work. You can buy an IBM PC clone today and run DOS on it. (Well, a later version of DOS, one that understands CD drives. You can't boot DOS 1.0 from a 5.25 inch floppy disk.) I suspect that the original Lotus 1-2-3 program would run, if you could get past the copy protection built into the software. (The copy protection depended on various quirks of floppy disks, which don't exist in today's PCs.)

The entire set of hardware has changed, yet the software continues. Software has a longer life than hardware. (I'm not the first to recognize this phenomenon.)

But if software is so long-lived, how do we manage it? How should we design something that will outlive the hardware, and probably the initial designers? How do we ensure that it will continue to work on successive generations of hardware?


Wednesday, March 17, 2010

The magic twenty percent

The current project is maintenance on a library that reads and writes spreadsheet files (Lotus 1-2-3 and Microsoft Excel). The term 'maintenance' is a bit misleading, as the project is really a re-write of the existing code.

But here's the interesting thing: our new code is much smaller than the previous version. Much, much smaller. Not ten percent or twenty percent smaller, but eighty percent smaller. The original code was 98,000 lines (roughly). The new code is about 20,000 lines. The new code does everything that the old code does.

So the question is: how can I write code that is so small, so compact, so efficient? Am I that good of a programmer?

This is not the first time that I have replaced code with more efficient code. A previous project saw me re-writing a system and reducing the code from almost 30,000 lines to about 3,000 lines. On that project, I replaced Java code with C#. Since Java and C# are at about the same level, the reduction could not have been due to the change in language. I think the benefit came from the change to the data format. The original system read XML files and processed them as scripts, using a lot of auxiliary Java classes and a limited scripting language. The replacement system (in C#/.NET) used plain text scripts and required no auxiliary classes. This was due to the capabilities of the scripting language, not the capabilities of C# or limitations of Java. Changing the data (the scripts) to a simpler and more expressive form made the coding much easier.

The current assignment has a different twist. In this assignment, the data is not changing (it's all spreadsheet files for Lotus 1-2-3 or Excel) but the language is helping. The previous program used "straight" C++; the new program uses C++ with STL. Changing the language made the coding much easier.
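To give a sense of the difference (this is a sketch, not code from the project, and the cell structure is made up): with the STL, a sheet's cells can live in a std::map keyed by row and column, and insertion, lookup, and ordered traversal come for free. In "straight" C++, each of those would be a hand-written list or array with its own allocation and cleanup code.

    // Minimal sketch (not the project's actual code): with the STL, a sheet's
    // cells can be held in a std::map keyed by (row, column), so insertion,
    // lookup, and ordered traversal need no hand-written list or array code.
    #include <map>
    #include <string>
    #include <utility>
    #include <iostream>

    typedef std::pair<int, int> CellAddress;          // (row, column)
    typedef std::map<CellAddress, std::string> Sheet; // hypothetical cell store

    int main()
    {
        Sheet sheet;
        sheet[CellAddress(0, 0)] = "Name";
        sheet[CellAddress(0, 1)] = "Amount";
        sheet[CellAddress(1, 0)] = "Widgets";
        sheet[CellAddress(1, 1)] = "42";

        // Ordered (row-major) traversal comes for free from std::map.
        for (Sheet::const_iterator it = sheet.begin(); it != sheet.end(); ++it)
            std::cout << "R" << it->first.first << "C" << it->first.second
                      << ": " << it->second << "\n";
        return 0;
    }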

There are some other factors that contributed. On both assignments, I was the sole programmer. As the only person coding, I was able to make decisions quickly and without debate.

But the real factor here was the automated testing. Both projects had automated tests in place. With them, I was able to make sweeping changes as I needed. Most programmers want to make changes like these, but refrain because of the risk. Managers and code architects shy away from major changes, because major changes carry large risks.

Yet with automated tests, the risks were small. I could make large changes (I made some large changes today) and then run the tests to ensure that the system was working as expected. My changes were not always correct initially, and I made corrections. With automated tests I can make grand, sweeping changes that improve the code... and know that the changes are correct.

So with two data points (not a lot, I admit) I am looking at large reductions in code due to automated testing. I couple that with the conjecture that reduced code size leads to reduced development costs (fewer developers, fewer defects) and better software.


Wednesday, March 10, 2010

Modest success (again) with XLS

Last week I posted about modest success in the task of creating XLS files. At that time, we were able to take an XLS file, convert it to text, and then re-convert it to XLS format. The text format was very detailed, with specific information about each record in the starting XLS file.

Today I had more success. After a week of coding and testing, I took a plain text file, one with a minimal amount of information, and used it to generate an XLS file. With this program in our toolbox, we can start on the major task of updating several of our key applications.

The job is not complete. There is one item remaining: a fix-up for some records that point to later records in the file. Right now I handle this by adjusting the generated file. (A cumbersome process of exporting the XLS file to detail-heavy text, adjusting the specific records, and then re-importing the detail-heavy text file back into XLS format.) It's an interesting problem, but quite solvable. My experience with assembly language programming is helpful for this problem; the task is identical to the work performed by a linking loader.
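For the curious, the general shape of that fix-up looks something like the sketch below. (This is not the project's code; the names and the 4-byte little-endian offsets are assumptions for illustration.) Records that refer to later records get placeholder offsets; the writer remembers where each placeholder sits, and a second pass patches in the real offsets once they are known -- exactly the bookkeeping a linking loader does with forward references.

    // Sketch of a linking-loader style fix-up (hypothetical names, not the
    // project's code): forward references are written as placeholders, their
    // positions remembered, and patched once the target offset is known.
    #include <vector>
    #include <map>
    #include <string>
    #include <cstddef>

    typedef std::vector<unsigned char> Buffer;

    struct Fixups
    {
        std::map<std::string, std::vector<std::size_t> > pending; // label -> patch sites
        std::map<std::string, std::size_t> resolved;              // label -> final offset

        // Write a 4-byte little-endian placeholder and remember where it went.
        void writeReference(Buffer &buf, const std::string &label)
        {
            pending[label].push_back(buf.size());
            buf.insert(buf.end(), 4, (unsigned char)0);
        }

        // Record the real offset of a label (e.g. the start of a later record).
        void define(const Buffer &buf, const std::string &label)
        {
            resolved[label] = buf.size();
        }

        // Second pass: patch every placeholder with the now-known offset.
        void apply(Buffer &buf)
        {
            for (std::map<std::string, std::vector<std::size_t> >::const_iterator p =
                     pending.begin(); p != pending.end(); ++p)
            {
                std::size_t target = resolved[p->first];
                for (std::size_t i = 0; i < p->second.size(); ++i)
                {
                    std::size_t at = p->second[i];
                    buf[at]     = (unsigned char)(target & 0xFF);
                    buf[at + 1] = (unsigned char)((target >> 8) & 0xFF);
                    buf[at + 2] = (unsigned char)((target >> 16) & 0xFF);
                    buf[at + 3] = (unsigned char)((target >> 24) & 0xFF);
                }
            }
        }
    };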


Sunday, March 7, 2010

Getting away from the office

I've been posting about my trials (and successes) at the office. I have to remind myself that this blog is not solely about accomplishments at the office, but about my progress as a professional. Time for a step back, and another look at myself.

I've been working with the local ACM chapter to set up a 'career day' for students at a local school. This has been an interesting task, as it is not technical but managerial. (Collecting speakers, arranging for a meeting space, etc.) It's a nice break from the day-to-day office work, which is technical.

I'm reading Roszak's The Cult of Information, an interesting book from the mid-1980s. The author makes some good points about our enthusiasm for computers and our eagerness to teach programming to everyone. Data is not information, which is not knowledge, which in turn is not wisdom. I'm enjoying it.

This weekend I spent a little time playing with Ruby. I wrote some simple scripts to compare documents and create a commonality metric. I took the algorithm from O'Reilly's PDF Hacks book. The book had a routine in bash; writing it in Ruby was easy. A simple project, requiring little time, but a fun diversion.


Wednesday, March 3, 2010

Tests to the rescue!

I learned more about Microsoft's Excel files today... and what I learned led to changes in a library and several programs. The change had to do with text strings and Excel's methods of encoding them. (It's complicated, and I won't go into it here.)

I had made some assumptions about text strings, and it turns out that they were wrong. I corrected them today.

The corrections had some far-reaching effects. On a typical project, I would have had to make a small, local set of changes and compromise the design of the software. But on my current project, I was able to make the proper changes (or at least what I consider "proper") across multiple components.

These changes were possible with backing from our automated tests. As I made changes, I ran the tests and verified that my changes were affecting the right areas -- and nothing else.

Yay! for automated tests!


Tuesday, March 2, 2010

Modest success with XLS

Today, shortly after 14h (that's 2:00 PM in civilian notation) I had a modest success with spreadsheet utilities.

I've been working with spreadsheet files since October. With hard work, creative programming, documentation, and a few guesses, I have successfully put together programs to read Lotus 1-2-3 files (WK1, WK3, WK4, and 123 formats) and Microsoft Excel files (XLS and XLSX formats). The programs read the files directly, yank out the contained values, and write them to text.

I've also been working on programs to do the reverse: Start with text and create the spreadsheet file. This is a bit trickier, since spreadsheet files contain lots and lots of information, and when reading the file one can ignore most of it. The task of reading the file is a matter of finding the important bits, converting them to plain text, and skipping everything else.
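The reading loop itself is simple in shape. Both the Lotus and Excel binary formats are streams of records, each starting with a 16-bit record type and a 16-bit length, so a reader can dispatch on the handful of types it cares about and skip the rest. Here is a rough sketch (not the project's code; the record type values in the comments are placeholders):

    // Sketch of the "find the important bits, skip the rest" reading loop.
    // Both formats are streams of records framed by a 16-bit type and a
    // 16-bit length. (Record type values below are illustrative only.)
    #include <cstdio>
    #include <vector>

    static bool readWord(std::FILE *f, unsigned &value)   // little-endian 16-bit
    {
        int lo = std::fgetc(f), hi = std::fgetc(f);
        if (lo == EOF || hi == EOF) return false;
        value = (unsigned)lo | ((unsigned)hi << 8);
        return true;
    }

    int main(int argc, char *argv[])
    {
        if (argc < 2) return 1;
        std::FILE *f = std::fopen(argv[1], "rb");
        if (!f) return 1;

        unsigned type, length;
        while (readWord(f, type) && readWord(f, length))
        {
            std::vector<unsigned char> payload(length);
            if (length && std::fread(&payload[0], 1, length, f) != length) break;

            switch (type)
            {
            // case 0x0203: /* a numeric cell: decode row, column, value */ break;
            // case 0x0204: /* a text cell: decode row, column, string   */ break;
            default:
                break;          // unknown or uninteresting record: skip it
            }
        }
        std::fclose(f);
        return 0;
    }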

Creating a spreadsheet file requires generating all of the information for the file. You can't leave out records because you don't understand them. The file must contain enough information to be acceptable to the target program. To create an acceptable file, you have to understand a lot more about the file than when converting it to plain text.
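Writing is the mirror image, except that nothing can be skipped: every record the target program expects has to be generated, with correct lengths and in the correct order. A sketch of the emitting side (again, not the project's code; the record types shown are placeholders, and a real file needs many more records than this):

    // Sketch of the writing side: every record must be generated explicitly.
    // The helper frames a payload with the 16-bit type and length; which
    // records a valid file actually needs is format-specific and not shown.
    #include <cstdio>
    #include <vector>

    typedef std::vector<unsigned char> Bytes;

    static void putWord(Bytes &out, unsigned value)        // little-endian 16-bit
    {
        out.push_back((unsigned char)(value & 0xFF));
        out.push_back((unsigned char)((value >> 8) & 0xFF));
    }

    static void emitRecord(Bytes &out, unsigned type, const Bytes &payload)
    {
        putWord(out, type);
        putWord(out, (unsigned)payload.size());
        out.insert(out.end(), payload.begin(), payload.end());
    }

    int main()
    {
        Bytes file;
        emitRecord(file, /* beginning-of-file (placeholder id) */ 0x0000, Bytes());
        // ... one record per cell, plus every structural record the format requires ...
        emitRecord(file, /* end-of-file (placeholder id) */ 0x0001, Bytes());

        std::FILE *f = std::fopen("out.bin", "wb");
        if (f) { std::fwrite(&file[0], 1, file.size(), f); std::fclose(f); }
        return 0;
    }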

In the past few weeks, I've successfully written Lotus 1-2-3 WK1 and WK3 files. Now I am working on Excel XLS files. (I'm not writing Lotus WK4 or 123 files because we don't have enough information about the file formats.)

Today I had a minor success. This afternoon I ran a series of programs that started with a text file and produced an .XLS file. I ran MS Excel and opened the .XLS file, and it worked! MS-Excel displayed the file with the proper values.

I say "minor success" because the starting file (the text file) was very carefully crafted to result in a proper XLS file. In fact, our "source file" was derived from an XLS file created by MS-Excel. We took an existing file, converted it to text, and then converted it back to XLS. (The process was a little more complex than that, but the sentiment is correct.) One might say that this was cheating, or perhaps a simple matter of conversions between encoding.

Yet the work is not without value. We now have a process to take a text file and create an XLS file. With this process, we can experiment and learn more about the XLS format. The Microsoft documentation is detailed, but it is vague on several topics. With some experiments, we can get a reliable XLS file creator. This process lets us create files at a high level (higher than the bits-and-bytes of the Microsoft document) and quickly test our theories.

My success in this was due, as usual, to our test framework. I created programs, but before that, I created tests and expected results. Once I had those in place, it was easy to create a program that read the input and provided the output. (Okay, perhaps not "easy"... but at least I knew when changes got me closer to the goal.) Our framework is simple: a few scripts (batch files), some test data, and expected results. Simple, yet adequate.
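The framework really is that small. If I were to sketch its shape as a single program instead of batch files (the converter name and test file names below are made up), it would amount to little more than "run the converter, compare the output to the stored expected result":

    // Sketch of the regression-test idea (the real framework uses batch files;
    // names here are made up): run the converter on known input, then compare
    // its output byte-for-byte against a stored expected result.
    #include <cstdio>
    #include <cstdlib>
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <vector>

    static std::vector<char> slurp(const std::string &path)
    {
        std::ifstream in(path.c_str(), std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    }

    int main()
    {
        // Hypothetical converter and test files.
        int rc = std::system("xls2text.exe tests\\case1.xls tests\\case1.out");
        if (rc != 0) { std::printf("FAIL: converter returned %d\n", rc); return 1; }

        std::vector<char> actual   = slurp("tests\\case1.out");
        std::vector<char> expected = slurp("tests\\case1.expected");

        if (actual == expected) { std::printf("PASS: case1\n"); return 0; }
        std::printf("FAIL: case1 output differs from expected\n");
        return 1;
    }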

So today was a pretty good day.