Pachuco Interpreter Performance

Last time, I talked about the performance drawbacks of bootstrapping a compiler using an interpreter, and how this was one of the factors that led me to bootstrap via SBCL. But Pachuco does contain a simple metacircular interpreter, used to support its Common-Lisp-style macro system. So I thought it would be interesting to run the compiler under the interpreter, and see how long it actually took.

The changes needed to support this are modest; they mainly add support for passing command-line arguments down into the program running under the interpreter. But this isn't of much general use, so the changes live in a branch.

In the Pachuco build process, stage1 bootstraps the compiler by running it under SBCL. Here is a timed run of stage1:

$ rm build/stage1.s
$ time make build/stage1.s
mkdir -p build
scripts/compile -C scripts/sbcl-wrapper -s -o build/stage1 language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco

real 0m6.736s
user 0m5.604s
sys 0m1.106s

Now here is the equivalent running under the interpreter — the compiler running under the interpreter running under SBCL.

$ time make build/stage1i.s
mkdir -p build
scripts/sbcl-wrapper interpret runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco -- compile runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco >build/stage1i.s

real 14m9.997s
user 13m30.174s
sys 0m12.979s

So the compiler runs roughly 150 times more slowly under the interpreter.

We can do the same experiment, but using Pachuco rather than SBCL. stage2 builds the compiler using the result of stage1, and stage3 repeats the process using the stage2 compiler, i.e. building Pachuco with itself. Here is a timed run of stage3:

$ rm build/stage3.s
$ time make build/stage3.s
mkdir -p build
scripts/compile -C build/stage2 -s -o build/stage3 language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco

real 0m2.034s
user 0m1.633s
sys 0m0.362s

And now the equivalent with the interpreter in the mix. That is, the Pachuco-compiled interpreter running the compiler compiling itself. Got that?

$ time make build/stage2i.s
mkdir -p build
build/stage1 interpret runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco -- compile runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco >build/stage2i.s

real 5m2.325s
user 5m0.765s
sys 0m0.926s

Almost exactly the same slowdown as under SBCL: 150 times.
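As a sanity check, the slowdown factors can be recomputed from the wall-clock times in the transcripts above. (A quick arithmetic sketch: the exact figure depends on whether you compare real or user time, and the interpreted runs also read the runtime sources, so these are ballpark numbers rather than precise measurements.)

```python
# Wall-clock ("real") times from the transcripts above, in seconds.
sbcl_compiled = 6.736                     # stage1: compiler under SBCL
sbcl_interpreted = 14 * 60 + 9.997        # stage1i: compiler under interpreter under SBCL

pachuco_compiled = 2.034                  # stage3: compiler compiled by Pachuco
pachuco_interpreted = 5 * 60 + 2.325      # stage2i: compiler under Pachuco-compiled interpreter

print(round(sbcl_interpreted / sbcl_compiled))       # 126 (user times give ~145)
print(round(pachuco_interpreted / pachuco_compiled)) # 149
```

Either way, the slowdown comes out at around two orders of magnitude in both configurations.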

(You might also notice that the Pachuco-compiled cases are about 3 times as fast as the SBCL ones. But this is not an apples-to-apples comparison. First, SBCL reads Lisp source code and has to compile it before running it, whereas the Pachuco cases run native executables already produced by the Pachuco compiler (though this start-up cost for SBCL should be negligible in the long-running cases). Second, I have SBCL configured to prioritize safety and debuggability over the performance of the code it compiles. Third, Pachuco currently cuts corners for the sake of simplicity in ways that help its performance.)

So the interpreter does indeed introduce a slowdown of a couple of orders of magnitude. The interpreter could quite easily be made more efficient, and such improvements could be valuable: the interpreter's performance is of practical importance due to its role in the implementation of the macro system. Because Pachuco features a small core language, on top of which many control structures and language features are implemented using macros, a significant fraction of the time taken to compile a program is actually spent performing macro expansion, and thus in the interpreter. So some low-hanging fruit for speeding up compilation could well be in the interpreter! But at this stage it seems unlikely to be worth sacrificing the interpreter's simplicity and brevity.
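To see why compile time ends up in the interpreter, here is a toy sketch of macro expansion over s-expression-like forms. It is in Python rather than Pachuco, and every name in it is invented for illustration; in the real system the macro bodies are Pachuco code run by the (slow) metacircular interpreter, whereas here plain Python functions stand in for them. The point is structural: every use of a derived form like `when` triggers a round trip through macro code at compile time.

```python
# Toy model: macros are functions from forms to forms, and the expander
# invokes them at compile time. In Pachuco the macro bodies would be run
# by the interpreter; here Python itself plays that role.

macros = {}

def defmacro(name):
    """Register a function as the expander for the named macro."""
    def register(fn):
        macros[name] = fn
        return fn
    return register

@defmacro("when")
def expand_when(form):
    # (when cond body...) => (if cond (begin body...) false)
    _, cond, *body = form
    return ["if", cond, ["begin", *body], "false"]

def macroexpand(form):
    """Recursively expand macros; each hit runs macro code at compile time."""
    if isinstance(form, list) and form and isinstance(form[0], str) and form[0] in macros:
        return macroexpand(macros[form[0]](form))
    if isinstance(form, list):
        return [macroexpand(sub) for sub in form]
    return form

print(macroexpand(["when", "ready", ["launch"], ["cleanup"]]))
# => ['if', 'ready', ['begin', ['launch'], ['cleanup']], 'false']
```

With many language features defined this way, a compiler pass over a large program performs thousands of such expansions, which is where the interpreter's speed starts to matter.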

Comment from Anonymous

This reminds me of Niklaus Wirth, who, according to his former student and co-worker Michael Franz (see http://www.ics.uci.edu/~franz/Site/pubs-pdf/BC03.pdf), used to consider a change to his compiler good if it reduced the time the compiler took to compile itself. Thus, an increase in compiler complexity could be justified only if the speedup it brought was enough to compile the more complex code faster.

Eugene.

Comment from David

Thanks for the link. I wasn't aware of Wirth's metric, but it has a lot in common with my rules for deciding whether to introduce a change to Pachuco. I have a benchmarking script which goes through the history in the Pachuco git repository, making a number of measurements for each version, and one of the measurements is how long the compiler takes to compile itself. I hope to get around to writing about this later on.