Sunday, May 19, 2019

What I learned this week - National Geographic CD ROM set

National Geographic issued a CD ROM set in the late 1990s. It was an impressive collection, a box set of 30 discs with all of the National Geographic magazines issued to date.

The magazines offer a glimpse of the past, from the late 1800s to the late 1990s. The software is also a slice of the past, with special options to install internet link software.

It ran on Windows 3.1 and Windows 95, and also on Macintosh systems. Given those operating systems, the Windows version is clearly 16-bit software.

The 64-bit editions of Windows 10 do not run 16-bit software, and 64-bit Windows never has. (Only the 32-bit editions can still run it.)

But that doesn't mean that this CD collection is unusable. Only the software that comes with the CD set is unusable. (I have no desire to install Windows 95 and use the old software.) The data on the CDs is still usable.

Examining the CDs shows that each CD includes a copy of the SETUP.EXE program (which we'll ignore), a bunch of DLLs (which we'll also ignore), and a few directories. One of those directories is named 'images' and it contains usable data.

Each disc has a directory named 'images'. In those directories is a collection of oddly-named directories. Each name is a three-digit number followed by a single letter, starting with '188a' and working upwards. Each disc has a different set of directory names.

Those oddly named subdirectories contain .JPG files. Each file is a scan of a single page of an issue of the National Geographic magazine. Thus, we have the images of the pages, and we can still display them. (The .JPG format has remained constant, or at least compatible, over time.)

Examination of the directories shows that each directory contains a single issue of the magazine. Thus the odd name for the directory is really a year and a month, encoded as a three-digit number and a single letter. The first year, 1880, is encoded as 188; the first month, January, is encoded as 'a'. The directory 188a is for January 1880.

The pattern is easy to decode. The directory 189a is January of 1881, and so on. Not every year and month combination exists -- the National Geographic Society did not publish monthly at the beginning.
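The decoding can be sketched in a few lines of Go. This is only a sketch under the mapping stated above (code 188 is 1880, 'a' is January); the function name `decodeIssueDir`, the two constants, and the assumption that the letters 'a' through 'l' cover January through December are mine and should be checked against the actual discs.

```go
package main

import "fmt"

// Assumed mapping, taken from the description above: year code 188 is 1880.
// Adjust these two constants if the discs turn out to use a different base.
const (
	baseCode = 188
	baseYear = 1880
)

// decodeIssueDir turns a directory name such as "188a" into a year and a
// month number (1-12). It assumes 'a'..'l' stand for January..December.
func decodeIssueDir(name string) (year, month int, err error) {
	if len(name) != 4 {
		return 0, 0, fmt.Errorf("%q: expected three digits and a letter", name)
	}
	code := 0
	for i := 0; i < 3; i++ {
		c := name[i]
		if c < '0' || c > '9' {
			return 0, 0, fmt.Errorf("%q: year code must be three digits", name)
		}
		code = code*10 + int(c-'0')
	}
	letter := name[3]
	if letter >= 'A' && letter <= 'L' {
		letter += 'a' - 'A' // accept upper-case names as well
	}
	if letter < 'a' || letter > 'l' {
		return 0, 0, fmt.Errorf("%q: month letter must be a-l", name)
	}
	return baseYear + (code - baseCode), int(letter-'a') + 1, nil
}

func main() {
	for _, name := range []string{"188a", "189a", "18xa"} {
		year, month, err := decodeIssueDir(name)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%s -> %d-%02d\n", name, year, month)
	}
}
```

Names that do not fit the pattern are reported rather than guessed at, which is useful because not every year and month combination exists.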

So now the task becomes a simple matter of programming. Can we write a program to ask for a year and month (or better, present a list of year-month combinations and let the user select one), then display a window and let a user scroll through a series of JPG images? That seems quite possible.

Notice that we are not using the old 16-bit software from the CD ROMs, just the data. We're not reverse-engineering the code, or cracking encrypted data. We are simply reading data files.

I'm looking forward to this project.

Sunday, November 19, 2017

What I learned this week: blockchain and golang flag and WCF headers

Things I learned this week:

I learned how to set headers on a response in Microsoft's WCF. This helps by letting requests that result in web pages appear in browsers, instead of being downloaded as files. Most requests return data, so only a few request types need headers set.

I attended the PSU "distinguished speakers" lecture on Bitcoin and blockchain. I have a much better understanding of Bitcoin now. Some folks are concerned that Bitcoin will have a deflationary aspect because there is a finite number of coins. I'm not worried. If Bitcoin were the only currency, the concern would be valid. We can create other crypto-based currencies (we already have) so the economy is bigger than any one currency.

My progress with the Go programming language continues. This week I added the "flag" package to my virtual processor, to add a command-line option for tracing execution. I was a little disappointed that "flag" doesn't handle non-options (like Ruby's OptionParser does) but maybe I just need to learn "the Go way" of doing things.
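For reference, here is a minimal sketch of a tracing option with the standard "flag" package. The option name `-trace`, the program name `vproc`, and the helper `parseArgs` are invented for illustration and are not taken from the actual virtual processor. The package does expose non-option arguments through `Args()`, but it stops parsing at the first one, so options must come before file names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs handles a hypothetical -trace option and returns the remaining
// non-option arguments (for example, the name of a program to run).
func parseArgs(args []string) (trace bool, rest []string, err error) {
	fs := flag.NewFlagSet("vproc", flag.ContinueOnError)
	fs.BoolVar(&trace, "trace", false, "print each instruction as it executes")
	if err = fs.Parse(args); err != nil {
		return false, nil, err
	}
	// Args holds everything from the first non-flag argument onward.
	return trace, fs.Args(), nil
}

func main() {
	trace, files, err := parseArgs(os.Args[1:])
	if err != nil {
		return // the flag package has already printed the problem
	}
	fmt.Println("trace:", trace, "files:", files)
}
```

With this sketch, `vproc -trace prog.bin` turns tracing on, while `vproc prog.bin -trace` leaves it off and treats `-trace` as a second file name.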

Sunday, November 5, 2017

This week I learned about Go and Python and IDLE

This week I learned about the Go language ('golang' for search engines) as I worked on my assembler and virtual processor. I like the Go language; it is an improved version of C without the complexity of C++. I wonder what would have happened if Go had been introduced prior to C++.

I also learned about IDLE, the Python editor (and debugger!). I found debugging a little confusing, most likely because my long experience with debuggers is mostly with Visual Studio. (Although I did write an interactive debugger for 8080 assembly language programs, back when the Heathkit H-89 was popular. I called it 'BEDBUG', for "BEtter DeBUGer". The name was suggested by my father.)

And on a third project, I find I need an RS-232 breakout box. I used to have one, a Heathkit PMK-130 that I had assembled myself. I don't know where it is now; I suspect that I sold it some years ago. (And now breakout boxes are rarities, with only one model available on Amazon.com.)

Sunday, October 15, 2017

Today a BASIC interpreter, tomorrow a BASIC compiler

A side project is building a compiler for the BASIC programming language. (Why? Because it's fun!)

A previous fun project was a BASIC interpreter. (Two interpreters, actually.) For them, I used Ruby. The projects were a way for me to learn the Ruby language, and I succeeded.

For the compiler, I want a language different from Ruby. I find Ruby is an excellent language for high-level concepts; I don't know that it can read and write individual bytes and I don't know how it can convert integers and real numbers to bytes. (It probably can, perhaps with the assistance of a library.)

If the language will not be Ruby, what will it be? First, a few requirements:

It has to run under Linux. I use Linux for my development work and I don't see that changing.

Specifically, it has to run on my workstation, a venerable Apple PowerBook G4 which runs Ubuntu Server 16.04 LTS. It has to run in text mode (which just about every language does).

My plan is to create a compiler that generates bytecode, or p-code. I'm not going to target a real processor like the Intel 386 or an ARM chip; I will write bytecode and use a runtime system to execute it, much like Java and C#.

Some possible languages and thoughts:

C++: Available on Linux. But my day job sees me working with lots of C++ code, and I want something different.

C: Not C++, but not quite different enough. And I am comfortable in C; another language is another opportunity to learn.

C#: Available on Linux, but I'm not sure that the .NET environment is available for the PowerPC architecture. And I would rather have a compiled language, not a bytecode language. (My own bytecodes will be enough, thank you!)

VB.NET: Exact same problems as C#.

Java: Available, but it uses a bytecode arrangement like C#.

Python: If not Ruby, why not Python? I'm using it for a different project, so not for this one. And it is not a true compiled language.

Perl: Umm... no.

Go: Possible. There is some support for PowerPC, and it may run on my PowerBook.

Swift: Compiled. May be available for the PowerPC. My preference, knowing nothing about either language, is for Go rather than Swift.

Pascal: Compiled. Probably available for PowerPC. A language I know. (Or knew, a long time ago.)

Forth: Interpreted and not compiled. Probably available for the PowerBook. The Forth I learned was an operating system and took over the computer. Can modern Forth read and write files?

Objective C or Objective C++: I don't see a future for either of these languages. Learning them now would be... unprofitable.

Fortran: Really no.

Ada: More "no" than Fortran.

* * * * *

Update:

There is a Go compiler for Linux, and it does run on the PowerPC processor. I've installed it and written "Hello, world!".

There is a Pascal compiler for Linux, and it does run on the PowerPC. Almost. The "Free Pascal" package doesn't install on Ubuntu 16.04, and I'm not the only one to experience problems. It also appears that Free Pascal is the only Pascal compiler for Linux. Therefore, Pascal is out.

With no option for Pascal, I'm choosing Go as the language. (In full disclosure, I must admit that I tried very little to resolve the problem with Free Pascal. Maybe I wanted to use Go all along, and the install failure was enough to push me to Go.)
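The byte-level worry raised earlier (reading and writing individual bytes, and converting integers and real numbers to bytes) is covered by Go's standard library. A minimal sketch follows; the function names are mine, and the choice of little-endian byte order is an assumption, since nothing here fixes a byte order for the bytecode.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeInt32 returns the four bytes of v, least significant byte first.
func encodeInt32(v int32) []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, uint32(v))
	return buf
}

// encodeFloat64 returns the eight bytes of the IEEE 754 representation of v,
// least significant byte first.
func encodeFloat64(v float64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, math.Float64bits(v))
	return buf
}

// decodeFloat64 reverses encodeFloat64.
func decodeFloat64(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	fmt.Printf("% x\n", encodeInt32(258))           // prints "02 01 00 00"
	fmt.Printf("% x\n", encodeFloat64(1.5))         // prints "00 00 00 00 00 00 f8 3f"
	fmt.Println(decodeFloat64(encodeFloat64(1.5))) // prints "1.5"
}
```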

Wednesday, October 11, 2017

The BASIC that wouldn't die

I started a project several years ago: write a BASIC interpreter. Not Visual Basic, but the old-school interpreted Beginner's All-Purpose Symbolic Instruction Code, with '10 PRINT "HELLO"' BASIC.

My real purpose was to learn the Ruby language. It was a good choice: the BASIC interpreter was complex enough to provide a challenge, yet not so complicated as to be daunting. I learned a lot about Ruby, and I also learned a lot about BASIC.

Writing a BASIC interpreter is not the easiest thing in the world. Writing an interpreter for any (serious) language is a non-trivial task (and yes, I do consider BASIC to be a serious language). I used several vintage books to help me: Kemeny and Kurtz' "The BASIC Programming Language", Coan's "Basic BASIC", Tracton's "57 Practical Programs and Games in BASIC", and Dave Ahl's "101 BASIC Computer Games". All of these (except for Tracton) predate Microsoft's dominant variant of BASIC.

We think little now about language variation. Before Microsoft's BASIC became the standard, there were different implementations of BASIC from different companies. DEC, HP, GE, and others created BASIC for their hardware, and each was unique. Differences ranged from simple appearances (a LIST of a program would left-pad line numbers with zeros, or not) to syntax (subscripted variables use parentheses, unless they use the now more traditional square brackets) to execution features (in one version, FOR/NEXT loops leave the index set to one beyond the terminal value). Early versions of BASIC also had statements that Microsoft dropped, such as the MAT statements and multi-statement user-defined functions.

I added many of the variants into my interpreter, using command-line switches to enable or disable specific behaviors.

I also added to the language and the interpreter. I added a set of ARR statements ('ARR' for 'array') which correspond to the MAT statements for matrices. In the interpreter, I added a profiler and a cross-reference output. I have seen neither of these in BASIC, and I wanted them.

Now, with not one but two versions of the interpreter near completion, I find myself looking for a new project. I'm considering a compiler for BASIC. Seriously. My inspiration is the old CBASIC compiler and the UCSD p-System. Both compiled code to an intermediate byte-code (or p-code) that was executed by a runtime environment, much like Java and .NET programs of today.

My idea is a simple compiler, one that generates assembly language for a fictitious processor. An assembler will convert that code into bytecode, and a runtime system will execute the bytecode.

The compiler can be as simple as an extension to the existing interpreter. I should be able to run the interpreter, load a program, and then type a command (probably 'COMPILE') and write the pseudo assembly code to a file. After that, a separate program can assemble the code and create the bytecode file. Finally, a run-time system (or an interpreter) can execute the bytecode.
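To make the last stage concrete, here is a toy run-time system in Go. Everything in it is invented for illustration: the four opcodes, the one-byte operand for the push instruction, and the stack-machine design are assumptions, not the fictitious processor the project will actually use.

```go
package main

import "fmt"

// Hypothetical opcodes for an invented stack machine.
const (
	opPush  byte = iota // push the next byte as an integer
	opAdd               // pop two values, push their sum
	opPrint             // pop a value and append it to the output
	opHalt              // stop execution
)

// run executes bytecode and returns the values it printed.
func run(code []byte) ([]int, error) {
	var stack, out []int
	pop := func() (int, error) {
		if len(stack) == 0 {
			return 0, fmt.Errorf("stack underflow")
		}
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v, nil
	}
	for pc := 0; pc < len(code); {
		op := code[pc]
		pc++
		switch op {
		case opPush:
			if pc >= len(code) {
				return out, fmt.Errorf("missing operand at offset %d", pc)
			}
			stack = append(stack, int(code[pc]))
			pc++
		case opAdd:
			b, err := pop()
			if err != nil {
				return out, err
			}
			a, err := pop()
			if err != nil {
				return out, err
			}
			stack = append(stack, a+b)
		case opPrint:
			v, err := pop()
			if err != nil {
				return out, err
			}
			out = append(out, v)
		case opHalt:
			return out, nil
		default:
			return out, fmt.Errorf("unknown opcode %d at offset %d", op, pc-1)
		}
	}
	return out, nil
}

func main() {
	// Bytecode an assembler might emit for: 10 PRINT 2+3
	program := []byte{opPush, 2, opPush, 3, opAdd, opPrint, opHalt}
	out, err := run(program)
	fmt.Println(out, err) // prints "[5] <nil>"
}
```

A stack machine keeps the instruction format simple, which suits a first assembler; a register-based design would be an equally valid choice.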

So I have convinced myself that this new endeavor is worthwhile, and I will work on it as time permits.

BASIC just won't die!

Sunday, September 17, 2017

This Week I Learned...

This week I learned that Visual Studio 2015 provides timing information when stepping through code in the debugger. We previously used Visual Studio 2010, which did not provide such information.

This week I learned how to configure a Python-based web app service in Azure to handle multiple routes. (The right magic must be put in the web.3.4.config file.)

Wednesday, April 27, 2016

Data tools pay for themselves over time

Several years ago I worked on a project that required we pick apart Excel .XLS files at the byte level. We didn't use Microsoft's COM components to read the file. We didn't use a third-party library. We read the file (in C++ and in binary mode) and did exactly what Microsoft recommends we do not do: we parsed the file ourselves.

As part of this effort, I built a small utility to "dump" the contents of an Excel file. The common programs extract cell values, but I wanted something more detailed. I wanted a list of the various BIFF records in the file. (Microsoft Excel .XLS files consist of a series of binary records, each record describing some aspect of the contents. Each record contains a type field, a length field, and a set of fields that depends on the type of record, and some fields have a variable size. It's a compact and powerful format.)
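The record layout described above is enough to sketch the core loop of such a dump tool. The Go code below is not the original DumpXls (which was C++); it is a minimal illustration that assumes the BIFF stream has already been extracted from the .XLS container, with each record starting with a 2-byte type and a 2-byte length, both little-endian. The names `record` and `parseRecords` are mine, and the sample bytes in `main` are hand-made, not a valid workbook.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// record is one BIFF record: its type and its payload bytes.
type record struct {
	Type uint16
	Data []byte
}

// parseRecords splits a BIFF stream into records. Each record starts with a
// 2-byte type and a 2-byte payload length, both little-endian.
func parseRecords(stream []byte) ([]record, error) {
	var recs []record
	for pos := 0; pos < len(stream); {
		if len(stream)-pos < 4 {
			return recs, fmt.Errorf("truncated header at offset %d", pos)
		}
		typ := binary.LittleEndian.Uint16(stream[pos:])
		size := int(binary.LittleEndian.Uint16(stream[pos+2:]))
		pos += 4
		if len(stream)-pos < size {
			return recs, fmt.Errorf("truncated record 0x%04X at offset %d", typ, pos-4)
		}
		recs = append(recs, record{Type: typ, Data: stream[pos : pos+size]})
		pos += size
	}
	return recs, nil
}

func main() {
	// Hand-made stream: a BOF record (0x0809) with a shortened 2-byte
	// payload, followed by an EOF record (0x000A) with no payload.
	stream := []byte{0x09, 0x08, 0x02, 0x00, 0x00, 0x06, 0x0A, 0x00, 0x00, 0x00}
	recs, err := parseRecords(stream)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Printf("type=0x%04X length=%d\n", r.Type, len(r.Data))
	}
}
```

Decoding the fields inside each payload is where the real work lies, since the layout depends on the record type.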

The utility program (called "DumpXls", cleverly enough) was useful to identify the different records needed to construct a proper .XLS file. It took some time to create, time that turned out to be an investment.

It was a useful investment, because this past week I had a problem with the Python 'xlwt' library. (The 'xlwt' library lets Python create an .XLS file.) My old "DumpXls" utility helped me diagnose the problem and find a solution... and quickly. The time spent creating that program years ago was more than made up by the time saved in finding this week's problem.

The "dump" program was simple, yet it required a good understanding of the file format. When working with a new file format, building such a "dump" program helps one understand it. The program is useful, as is the knowledge of the format.