Monday, May 31, 2010

A break from the routine

I took a break from the routine this weekend. I attended a science fiction convention.

The phrase "science fiction convention" often brings up images of nerdy people walking around in costumes from television shows. The picture is accurate, but by no means complete. Yes, there were people walking around in costumes. (There was a costume contest, with prizes for the most elaborate and most popular costumes.) There were speakers presenting on various topics, from science fiction to writing to hard science. There was a vendor room with people selling books, t-shirts, jewelry, and games. There were large events including awards presentations, author readings, and interviews.

Science fiction "cons" are a gathering of people interested in science fiction. Therefore, they are gatherings of people who are well-read, intelligent, and interested in many things. At a science fiction convention, you can find people who are knowledgeable in just about any topic, from Roman engineering to rainforest biology to socio-economic trends in ancient India. (If you find two people interested in your topic, they will probably disagree. But disagree politely and with facts and arguments to support their position.)

I myself met intelligent and interesting people. As I was walking from one session to another, I chatted first with the person running the Library of Congress digital archive project, and then discussed curriculum directions with a professor of computer science at a local college.

This was a weekend well spent. I learned a lot, and contributed to discussions (I hope) in meaningful ways.


Saturday, May 29, 2010

About an hour

Are compile times for projects constant? We like to think that a faster computer will give us better performance, including faster build times. (Those of us with the title "Buildmaster" are very conscious of performance, and we constantly tune the build environment.) Yet the evidence does not support the claim.

Many moons ago, I worked on a project that was built in C (yes, C and not C++) and ran under MS-DOS. The source was about 65,000 lines of code (I know because I measured it) and it took about an hour to build (I know because I ran the build). This was in 1992, and we used PCs with Intel 80386 processors running at 25 MHz. Keep that figure of "an hour" in mind.

A few moons ago (not so many as "many moons ago"), I worked on a project that was built in C++ and ran under Windows. The source was 750,000 lines of code and the build time was about an hour. This was in 2007; our PCs had Intel Pentium whatever processors running at 500 MHz.

Currently I am working on a project that builds in C++ and runs under Windows. The source is about 200,000 lines of code and the build time is ... about an hour. Our PCs are Intel Pentium whatevers running at 850 MHz. The performance "drop" from the previous project to this one may be due to two things: our use of the STL and our build environment. The STL and templates place an additional load on the compiler, which must expand the templates and generate code that is not visible to the programmers. And our environment uses servers for storage, not local disks, so reads and writes must travel "across the wire".
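
To make the template point concrete, here is a tiny illustration (made up for this post, not taken from the project) of the hidden work. Each distinct instantiation below makes the compiler generate a separate body of container code, none of which appears in the source:

    #include <map>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<int> counts;                         // instantiates vector<int>
        std::vector<std::string> names;                  // instantiates vector<string>
        std::map<std::string, std::vector<int> > index;  // instantiates map, pair,
                                                         // and the comparator
        counts.push_back(1);                             // ...plus every member
        names.push_back("build");                        // function actually used
        index["build"] = counts;
        return 0;
    }

Multiply that by a few hundred source files, each pulling in the same headers and expanding the same templates, and the extra compile time adds up.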

So across a span of almost twenty years, the hardware has improved but build times have remained roughly constant. How can this be?

Here are my thoughts:

Our systems are larger: Code size has increased. Larger code takes longer to compile. (And to write, and to debug, and to maintain... but that is another discussion.)

Our languages are more complex: The simple code of twenty years ago (mostly C) has been replaced with the more complex languages of C++, Java, and C#. More complex languages require more complex compilers.

Our environments are more complex: The (relatively) simple build environment of a PC running MS-DOS 5.0 and some NetWare drivers has morphed into the complex workstation environment of a PC running Windows, with its multitude of services and processes all competing for processing time. In addition, the authorization model has changed: current systems talk to a central controller to confirm access on each and every file operation.

One interesting thing is the hour time frame. There may be psychology involved here. Perhaps we accept an hour as a reasonable amount of time to build something of perceived value. We (as a group) will let build times increase until they reach the threshold of an hour. At that point, we look to improve the build process.

In the future, I expect the trends to continue. I expect our platforms to become more complex. I expect virtualization and the cloud to play a role in builds. (A few companies have had cloud-like build technologies for years.) People have developed new source-control systems that are distributed rather than centralized. We have new languages such as C# and Ruby. The Agile Development proponents have pushed for smaller systems and automated tests. All of these will affect the time it takes to build the system.

And I also expect build times to remain at... about an hour.


Friday, May 28, 2010

Minor success with Microsoft Excel XLSX files

Today, after much work, I had some success with Microsoft Excel and its XLSX format. For the first time, I successfully created an OOXML XLSX file that was acceptable to MS Excel.

I had tried this in September of last year. At the time, I had a file that could be read by OpenOffice but not by Microsoft Office.

Today I have a program that takes a plain text file, performs various manipulations, and produces a file that is usable by Microsoft Excel.

With this accomplishment, I now have libraries that can read and write spreadsheets for Lotus 1-2-3 and Excel. (OpenOffice is a possibility, but the client doesn't need that format. At least not yet.)

The work for OOXML was different from the work for the XLS format files. For the latter, I had documentation from Microsoft and a set of source code that produced the files; the work was mostly one of research and reading. For the XLSX files, the work was mostly reverse-engineering the OOXML files. Both approaches were helped by our test framework.


Saturday, May 22, 2010

OSCON 2010 - registration

I signed up for OSCON 2010 today. I've been to OSCONs in the past and find them very informative. The con is all about open source and might be the premier meeting for open source information. They have competent speakers and interesting topics.

The con is in Portland this year. Last year it was in San Jose, and I think a number of people complained. (It had been in Portland for a number of years.) I was one of the complainers. I complained about the food, which I find more appealing in Portland.

I also booked hotel rooms and air travel. I tacked on a side trip to Los Angeles to visit my friend KK. (Since I'm funding the trip, I can take all the side trips that I want.) I got some good deals for the air fare (Southwest) and some ok deals for the hotel rooms. (In previous years the hotels have been cheaper. But I should not grumble.)

The internet made the task possible. At least, possible without the assistance of a travel agent. The arrangements took longer than I wanted, but not too long. (About an hour, altogether. About the same amount of time I would spend telling a travel agent my preferences.)


Friday, May 14, 2010

Microsoft Excel is not quite as accepting

One of the rules of thumb of good programming is: be liberal in what you accept (in terms of input data) and be conservative in what you generate. In other words, don't be fussy with the data coming in because the poor folks who are preparing the data may not understand all of your validation rules. However, be very particular when creating output, because the poor folks consuming your output may not understand all of the possible combinations of your data.

Microsoft didn't follow this rule with their OOXML parsers in MS-Excel. At least, that's how it appears to me.

The OOXML formats are advertised as "open", meaning that the specifications are available and people are free to use the files outside of MS-Excel. The OOXML specification is not for the faint of heart, though. With determination and stamina, one learns that the MS-Excel .xlsx file is a ZIP file with several contained XML files.

Reading an .xlsx file is fairly straightforward. You unzip the container file and then pick through the unpacked XML files with whatever parser you like.
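
My own code for this is in Ruby, but the idea is language-neutral. Here is a rough sketch in C++ using the libzip library (my choice for illustration, not something the project uses): open the container, walk the entries, and hand each XML part to whatever parser you like.

    #include <zip.h>

    #include <cstdio>
    #include <vector>

    int main(int argc, char *argv[])
    {
        if (argc < 2) { std::fprintf(stderr, "usage: %s file.xlsx\n", argv[0]); return 1; }

        int zerr = 0;
        zip_t *archive = zip_open(argv[1], 0, &zerr);   // open the existing container
        if (archive == NULL) { std::fprintf(stderr, "cannot open archive\n"); return 1; }

        // The interesting entries are XML parts such as xl/workbook.xml
        // and xl/worksheets/sheet1.xml.
        zip_int64_t count = zip_get_num_entries(archive, 0);
        for (zip_int64_t i = 0; i < count; ++i) {
            zip_stat_t st;
            if (zip_stat_index(archive, i, 0, &st) != 0) continue;
            std::printf("%s (%llu bytes)\n", st.name, (unsigned long long)st.size);

            // Read the uncompressed bytes; pass them to any XML parser.
            zip_file_t *zf = zip_fopen_index(archive, i, 0);
            if (zf == NULL) continue;
            std::vector<char> xml(st.size);
            if (!xml.empty()) zip_fread(zf, &xml[0], xml.size());
            zip_fclose(zf);
        }

        zip_close(archive);
        return 0;
    }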

Creating an .xlsx file is a bit trickier. I ran into problems last year, when I was using Ruby to create such a file. I used an MS-Excel-generated file as a model, and created the necessary XML files with the content and cross-references within files. I then zipped the files together and handed the resulting file to MS-Excel. Excel wanted nothing to do with the file. (Yet Open Office would happily open the file and render the content properly.)

Even the simple act of unzipping an .xlsx file and then re-zipping it yields a file unacceptable to MS-Excel.

Eventually I was able to create an .xlsx file, by creating an empty ZIP file and adding files to it, one at a time.

The problem is not the contents of the XML files, but the contents of the ZIP file. ZIP files contain not only the (compressed) files but also directory entries. The popular WinZIP program, when zipping up a tree of files, records both the files and their directories, and each directory gets its own entry in the ZIP file.

MS-Excel is unprepared to handle directory entries in the .xlsx file. Creating the ZIP file "by hand" (and not with WinZIP) yields a file without directory entries.
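
To show what "by hand" means, here is a rough sketch, again in C++ with libzip rather than the Ruby I actually used, and with placeholder XML bodies. Each part is added under its full internal path, and no directory entries are ever created:

    #include <zip.h>

    #include <cstring>

    // Add one part under its full internal path, e.g. "xl/worksheets/sheet1.xml".
    // No separate entries are created for "xl/" or "xl/worksheets/".
    static bool add_part(zip_t *archive, const char *name, const char *xml)
    {
        zip_source_t *source = zip_source_buffer(archive, xml, std::strlen(xml), 0);
        if (source == NULL) return false;
        if (zip_file_add(archive, name, source, ZIP_FL_ENC_UTF_8) < 0) {
            zip_source_free(source);   // on success, the archive owns the source
            return false;
        }
        return true;
    }

    int main()
    {
        int zerr = 0;
        zip_t *archive = zip_open("out.xlsx", ZIP_CREATE | ZIP_TRUNCATE, &zerr);
        if (archive == NULL) return 1;

        // Placeholder content; a real workbook also needs the relationship
        // files and real sheet data.
        add_part(archive, "[Content_Types].xml", "<Types>...</Types>");
        add_part(archive, "xl/workbook.xml", "<workbook>...</workbook>");
        add_part(archive, "xl/worksheets/sheet1.xml", "<worksheet>...</worksheet>");

        zip_close(archive);   // entries are written out on close
        return 0;
    }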

The interesting part in all of this is the behavior of Open Office. It handled the "incorrect" file correctly. Open Office is more accepting of its input than Microsoft Excel. If I wanted to, I could create an .xlsx file that was readable by Open Office and not readable by Microsoft Office.


Thursday, May 13, 2010

A lesson from open source

Yesterday I attended the Columbia Area Linux User Group meeting. The speaker was MacKenzie Morgan, a contributor to the Ubuntu project. (Specifically, her title is "MOTU", for "Master of the Universe".)

The Ubuntu project is run much like a professional development project. With almost 150 contributors, they use tools and processes to coordinate the efforts of people. For tools, they use IRC, version control, build scripts, and wiki pages. For techniques, they have "freeze points" for several decisions, including tool sets, packages, text strings, UI components, and code. These notions parallel the typical commercial effort, although with perhaps more automation between tools and fewer meetings.

Two bug-fixing techniques are worth mentioning. First, bugs are fixed by a team but not assigned to specific individuals. Developers work on the bugs that they think they can fix. (In the commercial world, bugs are assigned to individuals based on a manager's assessment of individual capabilities.) The Ubuntu approach seems to involve less stress.

The second notion involves the code fix for bugs. In the Ubuntu project, defects are fixed in the version under development and then back-ported to prior releases, including the "current release". In commercial projects, the "current release" is fixed first and then the fix is migrated to the version in progress. (Usually. Sometimes the fix is not migrated into the next release.) The Ubuntu approach ensures that the next release has all of the fixes.

Commercial projects could learn a lot from the Ubuntu project, especially those commercial projects that are outsourcing large portions of their coding. Most of the Ubuntu project is aggregating packages for deployment and ensuring that the packages work together. With outsourcing, a commercial project is in a similar role, aggregating components and ensuring that they work together.

Oh, there is one other difference with the Ubuntu project: The work is done (mostly) by volunteers. This also changes the dynamics of the project. (You can't force someone to meet a schedule by threatening to fire them, for example.) Yet Ubuntu delivers releases twice a year, on time and with high quality.


Monday, May 10, 2010

Let OSX handle it

I spent a few days away from the office, visiting parents for birthdays and the Mother's Day holiday. Which means that I got to work with their computers, and not mine, for a while.

My parents have a collection of computers. Some are Macs running OSX, some are PCs running Linux, and one is a PC running Windows XP. Our goal was to share printers. Easier said than done, as the printer configurations eluded us. One printer on an Apple Airport Express would work from a Mac with "Leopard" but not on the Mac with "Snow Leopard". Three printers on the Linux PC would work from Linux but not from any of the Macs. (We didn't try anything from Windows.)

We did get things working. It seems that we were trying too hard. We were manually configuring printers on the Macs (and must have picked a wrong option somewhere). We removed our printer definitions, executed the line 'cupsctl BrowseRemoteProtocols="cups dnssd"', and the Macs then found the published printers and picked the right drivers automatically. No muss, no fuss!

Sometimes it's better to let the operating system handle things.

Tuesday, May 4, 2010

Perl made me a better C++ programmer

I continue to work in C++. The assignment these past few weeks has been made easier by my experience with Perl, specifically the lists and hashes that Perl offers. Now that I can use the STL collections in C++, my experience with Perl collections pays off. The challenge at the office was solved quite readily with a few STL collections. In the "bad old days", I would have created classes in C++. With the STL, I can use the vectors, deques, and maps with little effort.

I credit Perl for two reasons: it has these collections built in, and it takes effort to create your own classes in Perl -- more effort than in C++ (at least for me). When working with Perl, I was pretty much forced to learn the collection classes. They gave me a different perspective on programming and the organization of data. Those perspectives carry over to the C++/STL world.
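
As a small example of what I mean, here is the Perl-style "hash of lists" idiom in C++ with the STL. (The data is made up for this post, not from the office assignment.) A std::map keyed by string, with a std::vector for each value, replaces what would once have been a hand-written collection class:

    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main()
    {
        // Perl: push @{ $defects{$module} }, $id;
        typedef std::map<std::string, std::vector<int> > DefectMap;
        DefectMap defects_by_module;

        defects_by_module["parser"].push_back(101);   // operator[] creates the
        defects_by_module["parser"].push_back(107);   // vector on first use
        defects_by_module["report"].push_back(204);

        // Perl: foreach my $module (sort keys %defects) { ... }
        // A std::map iterates in key order, so the "sort keys" comes for free.
        for (DefectMap::const_iterator it = defects_by_module.begin();
             it != defects_by_module.end(); ++it) {
            std::cout << it->first << ":";
            for (std::size_t i = 0; i < it->second.size(); ++i)
                std::cout << " " << it->second[i];
            std::cout << "\n";
        }
        return 0;
    }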


Sunday, May 2, 2010

The uphill climb to Kubuntu 10.04

I spent a lot of this weekend attempting to update Kubuntu 9.10 to 10.04. I say "attempting" because after several tries, it still refuses to work.

Try number one ended abruptly, with the upgrade telling me that I needed more free space on / and then not quite restoring Kubuntu 9.10. The broken system would boot but not let me sign in -- X complained that it could not write to /tmp.

I re-installed Kubuntu 9.10 (a two-hour process by itself) and then re-attempted the update. This time the update told me I needed more free space on /usr. (OK, this failure was my fault. When I installed 9.10 I set up the partitions incorrectly.)

So I re-re-installed Kubuntu 9.10 and then attempted the update. This try worked better, in that it allowed the update, downloaded all of the files, and ran all of the installs. But in the end, Kubuntu 10.04 is broken. It boots and lets me sign on. It then gives me a wallpaper (and a pretty wallpaper it is) and nothing else. No menu, no icons, no desktop folder. Nada. The only way to log out is with the power switch.

It looks like I have two choices here: install Kubuntu 9.10 and never upgrade, or get a Kubuntu 10.04 install CD and install that (without using the upgrade).

Or perhaps I should consider another distro. Debian comes to mind, as this machine is really a server and does not need half the applications that Kubuntu installs.