2008-11-23
Bootstrapping Pachuco
If you want to build a self-hosting compiler for a new programming language, you hit an interesting problem: If the compiler is the first, and therefore the only, implementation of the language, and the compiler is written in the same language, how do you compile the compiler so that you can run it?
To break this circularity, you have to come up with a second implementation of your language, just for the purpose of running the compiler. This bootstrap implementation must be written in another language, in order to escape the circularity that affected the compiler. Once you have a bootstrap implementation, you can run the compiler under it. Now you can pass the source code from the compiler as the input to the running compiler, so that it compiles itself, resulting is an executable for the compiler. That compiler executable can now be run directly by the computer, so you no longer need the bootstrap implementation. You can now use the compiler executable to compile the compiler, and to compile later enhanced versions of the compiler; the compiler has become self-sustaining. This process is called boostrapping a compiler.
So how do you obtain that bootstrap implementation? Assuming the compiler is your main goal, you want to divert as little effort as possible to the bootstrap implementation. And because it has to be written in some other programming language, it's also a in the sense that you have to work in some boring established language rather than your shiny new language. So a classic approach is to build an interpreter to act as the bootstrap implementation, interpreters being generally shorter and simpler than compilers. Interpreters are slow (particularly simple ones), but in this case, that shouldn't matter: it only needs to run once to bootstrap the compiler.
That is not the course I followed with Pachuco. The problem with it is that the compiler doesn't just spring into existence, ready to run under the interpreter. Like any non-trivial program, the compiler gets written incrementally, satisfying a gradually growing set of requirements. So the relative slowness of the interpreter (a couple of orders of magnitude for a trivial metacircular interpreter in Lisp) does matter — it makes the compiler development process cumbersome. Furthermore, the compiler will contain bugs as its development proceeds, and these bugs have to be found and fixed. A trivial interpreter lacks good error reporting and debugging facilities that make for a comfortable development environment.
So instead, I took advantage of the similarity between Pachuco and Common Lisp, and also the powerful macro system present in both languages. The compiler is written in a hybrid language that can be read, via a layer of macros, as either Common Lisp or Pachuco. This allowed me to develop the compiler under sbcl, but compile it with itself when it became sufficiently complete and robust. Since sbcl is a compiling implementation of Common Lisp, the Pachuco compiler always ran quickly, and sbcl's REPL loop and debugging facilities helped things along.