Last time, I talked about the performance drawbacks of bootstrapping a
compiler using an interpreter, and how this was one of the factors
that led me to bootstrap via SBCL. But Pachuco does contain a simple
metacircular interpreter, used to support its Common-Lisp-style macro
system.
So I thought it would be interesting to run the compiler under the
interpreter, and see how long it actually took.
The changes to support this are modest; they mainly add support for
passing command line arguments down into the program running under the
interpreter. But this isn't of much general use, so the changes are
off in a branch.
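To make the argument-passing idea concrete, here is a minimal sketch in Python rather than Pachuco. It assumes, based only on the command lines shown later in this post, that a `--` separates the source files the interpreter loads from the arguments handed to the interpreted program. The helper name `split_interpreter_args` is hypothetical and is not taken from the Pachuco sources.

```python
def split_interpreter_args(argv):
    """Split argv at the first "--".

    Everything before the separator is treated as source files for the
    interpreter to load; everything after it becomes the command line of
    the interpreted program. (Hypothetical helper, not Pachuco code.)
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    # No separator: the interpreted program gets no arguments of its own.
    return list(argv), []


files, program_args = split_interpreter_args(
    ["compiler/driver.pco", "compiler/drivermain.pco",
     "--", "compile", "language/util.pco"]
)
print(files)         # files the interpreter would load
print(program_args)  # arguments seen by the interpreted compiler
```

How the real branch threads these arguments through the interpreter may well differ; the sketch only shows the split itself.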
In the Pachuco build process, stage1 is the process of
bootstrapping the compiler under SBCL, i.e. running the compiler under
SBCL. Here is a timed run of stage1:
$ rm build/stage1.s
$ time make build/stage1.s
mkdir -p build
scripts/compile -C scripts/sbcl-wrapper -s -o build/stage1 language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco
real 0m6.736s
user 0m5.604s
sys 0m1.106s
Now here is the equivalent running under the interpreter — the
compiler running under the interpreter running under SBCL.
$ time make build/stage1i.s
mkdir -p build
scripts/sbcl-wrapper interpret runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco -- compile runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco >build/stage1i.s
real 14m9.997s
user 13m30.174s
sys 0m12.979s
So the compiler runs roughly 125 to 150 times more slowly under the
interpreter (about 126 times by wall-clock time, 145 times by user time).
We can do the same experiment, but using Pachuco rather than
SBCL. stage2 is the process of building the compiler using
the result of stage1, i.e. building Pachuco with itself. The timed
run here is of stage3, which repeats that step using the result of stage2:
$ rm build/stage3.s
$ time make build/stage3.s
mkdir -p build
scripts/compile -C build/stage2 -s -o build/stage3 language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco
real 0m2.034s
user 0m1.633s
sys 0m0.362s
And now the equivalent with the interpreter in the mix. That is, the Pachuco-compiled interpreter running the compiler compiling itself. Got that?
$ time make build/stage2i.s
mkdir -p build
build/stage1 interpret runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco -- compile runtime/runtime.pco runtime/runtime2.pco runtime/gc.pco language/util.pco language/expander.pco language/interpreter.pco compiler/mach.pco compiler/mach-x86_64.pco compiler/compiler.pco compiler/codegen.pco compiler/stack-traditional.pco compiler/stack-no-fp.pco compiler/codegen-x86_64.pco compiler/driver.pco compiler/drivermain.pco >build/stage2i.s
real 5m2.325s
user 5m0.765s
sys 0m0.926s
A very similar slowdown to the SBCL case: about 150 times by wall-clock time.
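For anyone who wants to check the arithmetic, here is a small Python sketch that turns the `time` figures quoted above into slowdown ratios. The timings are copied from the runs in this post; the function names `parse_time` and `slowdown` are just illustrative.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as "14m9.997s" to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def slowdown(interpreted, direct):
    """Ratio of the interpreted run's duration to the direct run's."""
    return parse_time(interpreted) / parse_time(direct)


# (interpreted, direct) timings copied from the runs above
runs = {
    "SBCL real": ("14m9.997s", "0m6.736s"),
    "SBCL user": ("13m30.174s", "0m5.604s"),
    "Pachuco real": ("5m2.325s", "0m2.034s"),
    "Pachuco user": ("5m0.765s", "0m1.633s"),
}
for label, (slow, fast) in runs.items():
    print(f"{label}: {slowdown(slow, fast):.0f}x")
```

By wall-clock time this gives roughly 126x under SBCL and 149x under Pachuco; the user-time ratios come out somewhat higher. These are single runs, so the figures are only indicative.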
(You might also notice that the Pachuco-compiled cases are about 3
times as fast as under SBCL. But this is not an apples-to-apples
comparison. First, SBCL reads Lisp source code and has to compile it
before running it, whereas the Pachuco cases involve the native
executables produced by the Pachuco compiler (though this start-up
cost for SBCL should be negligible in the long-running cases).
Second, I have SBCL configured to prioritize safety and
debuggability rather than optimizing performance during compilation.
Third, Pachuco currently cuts corners for the sake of simplicity in
ways that help its performance.)
So the interpreter does indeed introduce a slowdown of a couple of
orders of magnitude. The interpreter could quite easily be made more
efficient, and performance improvements could be valuable: the
performance of the interpreter is of practical importance due to its role
in the implementation of the macro system. Because Pachuco
features a small core language, on top of which many control structures and
language features are implemented using macros, a significant fraction
of the time taken to compile a program is actually
spent performing macro expansion, and thus in the interpreter. So
some low-hanging fruit for speeding up compilation could be in the
interpreter! But at this stage it seems unlikely to be worth
sacrificing the simplicity and brevity of the interpreter.
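To illustrate why macro expansion ends up running in the interpreter, here is a toy sketch in Python, not Pachuco, and not a description of Pachuco's actual expander. S-expressions are modelled as nested Python lists, symbols as strings, and the names `evaluate`, `macroexpand`, `BUILTINS` and `MACROS` are all made up for the example. The point is only that each macro use calls interpreted code to build the replacement form.

```python
def evaluate(expr, env):
    """A deliberately tiny interpreter: symbols, quote, if, lambda, calls."""
    if isinstance(expr, str):          # symbol lookup
        return env[expr]
    if not isinstance(expr, list):     # self-evaluating literal, e.g. a number
        return expr
    head = expr[0]
    if head == "quote":
        return expr[1]
    if head == "if":
        branch = expr[2] if evaluate(expr[1], env) else expr[3]
        return evaluate(branch, env)
    if head == "lambda":
        params, body = expr[1], expr[2]
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in expr[1:]])


BUILTINS = {"list": lambda *xs: list(xs)}

# A macro body is ordinary interpreted code that builds a new form.
# (when test body)  =>  (if test body (quote ()))
MACROS = {
    "when": evaluate(
        ["lambda", ["test", "body"],
         ["list", ["quote", "if"], "test", "body",
          ["list", ["quote", "quote"], ["quote", []]]]],
        BUILTINS,
    ),
}


def macroexpand(form, macros):
    """Expand macros in `form`, running each macro body in the interpreter.

    Naive on purpose: every non-quoted list is treated as a form, so it
    would also try to expand things like lambda parameter lists.
    """
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if head == "quote":
        return form
    if isinstance(head, str) and head in macros:
        # The macro receives its argument forms unevaluated.
        return macroexpand(macros[head](*form[1:]), macros)
    return [macroexpand(sub, macros) for sub in form]


expanded = macroexpand(["when", 1, ["when", 2, 42]], MACROS)
print(expanded)
print(evaluate(expanded, BUILTINS))
```

In a language where most control structures are macros over a small core, nearly every form in a program triggers a call like `macros[head](...)`, which is why interpreter speed shows up in compile times.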