Sunday, October 15, 2017

Today a BASIC interpreter, tomorrow a BASIC compiler

A side project is building a compiler for the BASIC programming language. (Why? Because it's fun!)

A previous fun project was a BASIC interpreter. (Two interpreters, actually.) For them, I used Ruby. The projects were a way for me to learn the Ruby language, and I succeeded.

For the compiler, I want a language different from Ruby. I find Ruby is an excellent language for high-level concepts; I don't know that it can read and write individual bytes and I don't know how it can convert integers and real numbers to bytes. (It probably can, perhaps with the assistance of a library.)

If the language will not be Ruby, what will it be? First, a few requirements:

It has to run under Linux. I use Linux for my development work and I don't see that changing.

Specifically, it has to run on my workstation, a venerable Apple PowerBook G4, which runs Ubuntu Server 16.04 LTS. It has to run in text mode (which just about every language does).

My plan is to create a compiler that generates bytecode, or p-code. I'm not going to target a real processor like an Intel 386 or an ARM chip; I will write bytecode and use a runtime system to execute it, much like Java and C#.

Some possible languages and thoughts:

C++: Available on Linux. But my day job sees me working with lots of C++ code, and I want something different.

C: Not C++, but not quite different enough. And I am comfortable in C; another language is another opportunity to learn.

C#: Available on Linux, but I'm not sure that the .NET environment is available for the PowerPC architecture. And I would rather have a compiled language, not a bytecode language. (My own bytecodes will be enough, thank you!)

VB.NET: Exact same problems as C#.

Java: Available, but it uses a bytecode arrangement, as C# does.

Python: If not Ruby, why not Python? I'm using it for a different project, so not for this one. And it is not a true compiled language.

Perl: Umm... no.

Go: Possible. There is some support for PowerPC, and it may run on my PowerBook.

Swift: Compiled. May be available for the PowerPC. My preference, knowing nothing about either language, is for Go rather than Swift.

Pascal: Compiled. Probably available for PowerPC. A language I know. (Or knew, a long time ago.)

Forth: Interpreted and not compiled. Probably available for the PowerBook. The Forth I learned was an operating system and took over the computer. Can modern Forth read and write files?

Objective-C or Objective-C++: I don't see a future for either of these languages. Learning them now would be... unprofitable.

Fortran: Really no.

Ada: More "no" than Fortran.

* * * * *

Update:

There is a Go compiler for Linux, and it does run on the PowerPC processor. I've installed it and written "Hello, world!".

There is a Pascal compiler for Linux, and it does run on the PowerPC. Almost. The "Free Pascal" package doesn't install on Ubuntu 16.04, and I'm not the only one to experience problems. It also appears that Free Pascal is the only Pascal compiler for Linux. Therefore, Pascal is out.

With no option for Pascal, I'm choosing Go as the language. (In full disclosure, I must admit that I tried very little to resolve the problem with Free Pascal. Maybe I wanted to use Go all along, and the install failure was enough to push me to Go.)
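
As an illustration of the byte-level work mentioned earlier (converting integers and real numbers to bytes), here is a minimal sketch in Go using the standard encoding/binary and math packages. The function names, the 32-bit and 64-bit widths, and the big-endian byte order are assumptions made for the example, not decisions about the actual bytecode format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeInt32 returns the 4-byte big-endian form of v.
// (Hypothetical helper; the byte order is an arbitrary choice.)
func encodeInt32(v int32) []byte {
	buf := make([]byte, 4)
	binary.BigEndian.PutUint32(buf, uint32(v))
	return buf
}

// decodeInt32 reverses encodeInt32.
func decodeInt32(buf []byte) int32 {
	return int32(binary.BigEndian.Uint32(buf))
}

// encodeFloat64 returns the 8-byte big-endian IEEE 754 form of v.
func encodeFloat64(v float64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, math.Float64bits(v))
	return buf
}

// decodeFloat64 reverses encodeFloat64.
func decodeFloat64(buf []byte) float64 {
	return math.Float64frombits(binary.BigEndian.Uint64(buf))
}

func main() {
	fmt.Printf("% x\n", encodeInt32(258))   // prints: 00 00 01 02
	fmt.Printf("% x\n", encodeFloat64(1.5)) // prints: 3f f8 00 00 00 00 00 00
}
```

Writing the resulting slices to a file would then be a matter of passing them to an ordinary file writer.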

Wednesday, October 11, 2017

The BASIC that wouldn't die

I started a project several years ago: write a BASIC interpreter. Not Visual Basic, but the old-school interpreted Beginner's All-Purpose Symbolic Instruction Code, the '10 PRINT "HELLO"' kind of BASIC.

My real purpose was to learn the Ruby language. It was a good choice: the BASIC interpreter was complex enough to provide a challenge, yet not so complicated as to be daunting. I learned a lot about Ruby, and I also learned a lot about BASIC.

Writing a BASIC interpreter is not the easiest thing in the world. Writing an interpreter for any (serious) language is a non-trivial task (and yes, I do consider BASIC to be a serious language). I used several vintage books to help me: Kemeny and Kurtz' "The BASIC Programming Language", Coan's "Basic BASIC", Tracton's "57 Practical Programs and Games in BASIC", and Dave Ahl's "101 BASIC Computer Games". All of these (except for Tracton) predate Microsoft's dominant variant of BASIC.

We think little now about language variation. Before Microsoft's BASIC became the standard, there were different implementations of BASIC from different companies. DEC, HP, GE, and others created BASIC for their hardware, and each was unique. Differences ranged from simple appearances (a LIST of a program would left-pad line numbers with zeros, or not) to syntax (subscripted variables use parentheses, unless they use the now more traditional square brackets) to execution features (in one version, FOR/NEXT loops leave the index set to one beyond the terminal value). Early versions of BASIC also had statements that Microsoft dropped, such as the MAT statements and multi-statement user-defined functions.

I added many of the variants into my interpreter, using command-line switches to enable or disable specific behaviors.
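
To show the general idea of switch-controlled variants, here is a small sketch. It is written in Go rather than the Ruby of the actual interpreter, and the flag names, the Options type, and both helper functions are hypothetical. It also assumes that the alternative to "one beyond the terminal value" is leaving the index at the last value used, and that padding means five digits.

```go
package main

import (
	"flag"
	"fmt"
)

// Options collects dialect switches. Both field names are hypothetical.
type Options struct {
	PadLineNumbers bool // LIST left-pads line numbers with zeros
	IndexPastLimit bool // FOR/NEXT leaves the index one beyond the terminal value
}

// formatLineNumber renders a line number the way LIST might.
// The five-digit width is an assumption for the example.
func formatLineNumber(n int, opts Options) string {
	if opts.PadLineNumbers {
		return fmt.Sprintf("%05d", n)
	}
	return fmt.Sprintf("%d", n)
}

// finalIndex simulates an empty "FOR I = start TO limit" loop with step 1
// and returns the value of the index after the loop ends.
func finalIndex(start, limit int, opts Options) int {
	i := start
	last := start
	for ; i <= limit; i++ {
		last = i
	}
	if opts.IndexPastLimit {
		return i
	}
	return last // assumed alternative: the last value the loop body saw
}

func main() {
	pad := flag.Bool("pad-line-numbers", false, "left-pad line numbers with zeros in LIST")
	past := flag.Bool("index-past-limit", false, "leave the FOR index one beyond the terminal value")
	flag.Parse()

	opts := Options{PadLineNumbers: *pad, IndexPastLimit: *past}
	fmt.Println(formatLineNumber(10, opts), finalIndex(1, 3, opts))
}
```

Run with no switches, this prints "10 3"; with both switches enabled, it prints "00010 4".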

I also added to the language and the interpreter. I added a set of ARR statements ('ARR' for 'array') which correspond to the MAT statements for matrices. In the interpreter, I added a profiler and a cross-reference output. I have seen neither of these in BASIC, and I wanted them.

Now, with not one but two versions of the interpreter near completion, I find myself looking for a new project. I'm considering a compiler for BASIC. Seriously. My inspiration is the old CBASIC compiler and the UCSD p-System. Both compiled code to an intermediate bytecode (or p-code) that was executed by a runtime environment, much like Java and .NET programs of today.

My idea is a simple compiler, one that generates assembly language for a fictitious processor. An assembler will convert that code into bytecode, and a runtime system will execute the bytecode.

The compiler can be as simple as an extension to the existing interpreter. I should be able to run the interpreter, load a program, and then type a command (probably 'COMPILE') and write the pseudo assembly code to a file. After that, a separate program can assemble the code and create the bytecode file. Finally, a run-time system (or an interpreter) can execute the bytecode.
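
The last two stages of that pipeline (assembler and runtime) can be sketched in a few dozen lines. This is only an illustration under stated assumptions: the fictitious processor here is a stack machine, the four mnemonics and their opcode values are invented for the example, operands are limited to a single byte, and the code is in Go rather than anything the project has committed to.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Opcodes for a made-up stack machine. All of these are hypothetical.
const (
	opPush  byte = iota // PUSH n: push a one-byte operand
	opAdd               // ADD: pop two values, push their sum
	opPrint             // PRINT: pop a value and record it as output
	opHalt              // HALT: stop execution
)

var mnemonics = map[string]byte{"PUSH": opPush, "ADD": opAdd, "PRINT": opPrint, "HALT": opHalt}

// assemble converts pseudo-assembly text (one instruction per line) to bytecode.
func assemble(src string) ([]byte, error) {
	var code []byte
	for _, line := range strings.Split(src, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		op, ok := mnemonics[fields[0]]
		if !ok {
			return nil, fmt.Errorf("unknown mnemonic %q", fields[0])
		}
		code = append(code, op)
		if op == opPush {
			if len(fields) != 2 {
				return nil, fmt.Errorf("PUSH needs exactly one operand")
			}
			n, err := strconv.ParseUint(fields[1], 10, 8)
			if err != nil {
				return nil, err
			}
			code = append(code, byte(n))
		}
	}
	return code, nil
}

// run executes bytecode and returns the values that PRINT produced.
func run(code []byte) ([]int, error) {
	var stack, out []int
	for pc := 0; pc < len(code); {
		op := code[pc]
		pc++
		switch op {
		case opPush:
			if pc >= len(code) {
				return nil, fmt.Errorf("truncated PUSH")
			}
			stack = append(stack, int(code[pc]))
			pc++
		case opAdd:
			n := len(stack)
			if n < 2 {
				return nil, fmt.Errorf("stack underflow in ADD")
			}
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		case opPrint:
			n := len(stack)
			if n < 1 {
				return nil, fmt.Errorf("stack underflow in PRINT")
			}
			out = append(out, stack[n-1])
			stack = stack[:n-1]
		case opHalt:
			return out, nil
		default:
			return nil, fmt.Errorf("bad opcode %d", op)
		}
	}
	return out, nil
}

func main() {
	// Roughly what a line such as 10 PRINT 2+3 might compile to.
	code, err := assemble("PUSH 2\nPUSH 3\nADD\nPRINT\nHALT")
	if err != nil {
		fmt.Println(err)
		return
	}
	out, err := run(code)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // prints: [5]
}
```

Keeping the assembly text as a separate, human-readable step makes the compiler's output easy to inspect before it becomes bytecode.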

So I have convinced myself that this new endeavor is worthwhile, and I will work on it as time permits.

BASIC just won't die!