Pachuco ported to ARM

I had a few days of vacation to use up recently, and I spent some of the time working on pachuco. The main achievement was to port it to ARM, so the compiler now supports x86, x86-64, and ARM. The code is on GitHub.

My main motivation for this project was to learn ARM machine code. The only general-purpose ISAs with a healthy future seem to be x86, x86-64, and ARM, but until now I hadn't done any low-level development on ARM. The port also shows that pachuco isn't tied to x86/x86-64: it didn't require any significant changes to the core of the compiler, though a lot of code got moved around to separate the target-specific parts from the target-independent parts.

The ARM machine I used for development is an NSLU2. It has a 266MHz XScale-IXP42x chip implementing the ARMv5 architecture, and 32MB of memory. The chip supports the THUMB instruction set, but I only used the main ARM instruction set.

A couple of things that arose in the process of developing the port strike me as worthy of note:

The first relates to the bootstrapping process. Although pachuco can compile itself, I still tend to develop under sbcl, because it makes identifying the causes of bugs much easier. But sbcl hasn't been ported to ARM, so I couldn't follow exactly the same process I followed on x86. The traditional way to port a self-hosting compiler to a new platform is to cross-compile: run the compiler on a supported machine, but have it generate code for ARM; then copy the results across to the ARM machine to run them, or, more realistically, to find out how they fail to run. But following this process literally would introduce cumbersome steps into the edit-test cycle.

What I did instead was to substitute sbcl with a wrapper script that runs sbcl on a remote x86 system via ssh. The script automatically copies the necessary files back and forth. This is still cross-compiling, but that fact is hidden from everything but the wrapper script. This required almost no changes to the main Makefile and build scripts, and allowed me to maintain a simple and rapid edit-test cycle.
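A wrapper along these lines is easy to sketch. The Python below is a hypothetical illustration, not the script used for this port: it assumes rsync and passwordless ssh access to the x86 host, it mirrors the whole working tree rather than copying only the necessary files, and it assumes the compiler's outputs land under build/. The environment variable names are borrowed from the transcript later in this post; what they mean in the real scripts is an assumption here, as are the names wrapper_commands and REMOTE_DIR.

```python
import os
import shlex
import subprocess

# Hypothetical scratch directory on the remote x86 machine.
REMOTE_DIR = "/tmp/pachuco-remote"


def wrapper_commands(host, remote_compiler, args, remote_dir=REMOTE_DIR, out_dir="build"):
    """Return the commands (as argv lists) a remote-compiler wrapper would run:
    push the source tree, run the real compiler remotely, pull the results back."""
    remote_cmd = "cd {} && {} {}".format(
        shlex.quote(remote_dir),
        shlex.quote(remote_compiler),
        " ".join(shlex.quote(a) for a in args),
    )
    return [
        # Mirror the local tree (sources plus any earlier build outputs).
        ["rsync", "-a", "./", f"{host}:{remote_dir}/"],
        # Run the real compiler on the remote machine with our arguments.
        ["ssh", host, remote_cmd],
        # Copy the generated files back into the local build directory.
        ["rsync", "-a", f"{host}:{remote_dir}/{out_dir}/", f"{out_dir}/"],
    ]


def main(argv):
    # Assumption: these variables name the remote host and the remote compiler.
    host = os.environ["BOOTSTRAP_HOST"]
    compiler = os.environ["BOOTSTRAP_COMPILER_REMOTE"]
    for cmd in wrapper_commands(host, compiler, argv):
        subprocess.run(cmd, check=True)  # stop at the first failing step

# A real wrapper script would finish with: main(sys.argv[1:])
```

Because the wrapper accepts the same arguments as the compiler it stands in for, nothing else in the build needs to know that compilation happens on another machine.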

The second interesting obstacle became evident as I got close to completing the bootstrap process. It turned out that the bootstrap process would take almost an hour, rather than the one or two minutes I was expecting. The cause was the assembler. The pachuco compiler produces assembly code, and uses the system assembler (specifically gas) to turn that into an executable. The assembly file produced when pachuco compiles itself is about 130k lines, and with 32MB of memory, gas swaps a lot while processing that file. I can't see a good reason for gas to use so much memory (more than the pachuco compiler uses to hold the program), except that it is most often used in conjunction with gcc, and C source files tend to be limited in size.

The solution was to split the output of the pachuco compiler into many smaller 10k-line files. gas can assemble these without swapping, and the linker connects the program back together to make the executable. Achieving this involved shuffling the order of the generated assembly code, and using global rather than local assembly labels in the appropriate places.
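The splitting step might look something like the following Python sketch. It is not pachuco's own code (pachuco's compiler is written in pachuco itself), and it rests on several assumptions: every top-level definition begins with a label in column 0, gas local labels (names starting with `.L`, or numeric labels) never need to be referenced from another chunk, and exporting every top-level label with `.globl` is acceptable. Section directives and the reordering of generated code mentioned above are not handled; the function names are hypothetical.

```python
import re

# Assumption: a top-level definition starts with a column-0 label whose name
# is not a gas local label (".L..." names or numeric labels like "1:").
TOP_LEVEL_LABEL = re.compile(r"^([A-Za-z_][\w.$]*):")


def definitions(lines):
    """Group assembly lines into definitions, each starting at a top-level label.
    Any lines before the first label form a group of their own."""
    defs, current = [], []
    for line in lines:
        if TOP_LEVEL_LABEL.match(line) and current:
            defs.append(current)
            current = []
        current.append(line)
    if current:
        defs.append(current)
    return defs


def split_assembly(lines, max_lines=10_000):
    """Pack whole definitions into chunks of at most max_lines original lines
    (an oversized definition gets a chunk to itself), then prefix each chunk
    with .globl directives so other chunks can refer to its labels."""
    chunks, current = [], []
    for d in definitions(lines):
        if current and len(current) + len(d) > max_lines:
            chunks.append(current)
            current = []
        current.extend(d)
    if current:
        chunks.append(current)
    result = []
    for chunk in chunks:
        names = [m.group(1) for m in map(TOP_LEVEL_LABEL.match, chunk) if m]
        result.append([f"\t.globl {n}" for n in names] + chunk)
    return result
```

Each chunk would then be written to its own .s file and assembled separately, leaving the linker to resolve references between the resulting object files. Keeping each definition whole is what allows its local labels to stay local.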

Pachuco on ARM now bootstraps for me in a couple of minutes (compared to 20 seconds on my Core2 laptop). It's necessary to set several environment variables and makefile variables to get there, but most of those should go away as I refine the port.

dwragg@bb5a:/tmp/pachuco$ make clean ; HEAP_SIZE=8 BOOTSTRAP_HOST=192.168.1.65 BOOTSTRAP_COMPILER_REMOTE=/home/dwragg/work/pachuco/scripts/sbcl-wrapper time make BOOTSTRAP_COMPILER=scripts/remote-bootstrap CODEGEN=simple COMPILEOPTS="-S -s" 
rm -rf build
mkdir -p build
scripts/compile -C scripts/remote-bootstrap -S -s -o build/stage0-test test/test.pco
build/stage0-test
Tests done
mkdir -p build
scripts/compile -C scripts/remote-bootstrap -S -s -o build/stage0-gc-test test/gc-test.pco
build/stage0-gc-test
GC tests done
mkdir -p build
scripts/compile -C scripts/remote-bootstrap -S -s -o build/stage1 language/util.pco language/expander.pco language/interpreter.pco compiler/walker.pco compiler/mach.pco compiler/mach-32bit.pco compiler/mach-arm.pco compiler/compiler.pco compiler/codegen-simple.pco compiler/codegen-generic.pco compiler/codegen-arm.pco compiler/driver.pco compiler/drivermain.pco
mkdir -p build
scripts/compile -C build/stage1 -S -s -o build/stage2 language/util.pco language/expander.pco language/interpreter.pco compiler/walker.pco compiler/mach.pco compiler/mach-32bit.pco compiler/mach-arm.pco compiler/compiler.pco compiler/codegen-simple.pco compiler/codegen-generic.pco compiler/codegen-arm.pco compiler/driver.pco compiler/drivermain.pco
mkdir -p build
scripts/compile -C build/stage2 -S -s -o build/stage3 language/util.pco language/expander.pco language/interpreter.pco compiler/walker.pco compiler/mach.pco compiler/mach-32bit.pco compiler/mach-arm.pco compiler/compiler.pco compiler/codegen-simple.pco compiler/codegen-generic.pco compiler/codegen-arm.pco compiler/driver.pco compiler/drivermain.pco
cmp -s build/stage2.s build/stage3.s
114.33user 11.71system 2:37.19elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
109088inputs+42832outputs (526major+137598minor)pagefaults 0swaps

And I wore an onion on my belt, which was the style at the time

A local exploit in the Linux kernel, involving access to a NULL pointer, was publicized recently and got a lot of attention. Jonathan Corbet provided a detailed two-part writeup on LWN.net (part 1, part 2).

The key to the vulnerability is this: if the kernel tries to dereference a NULL pointer, i.e. tries to access memory at address 0 or nearby, it actually accesses the virtual memory space of the current user process (since Linux on 32-bit x86 gives the bottom 3GB of the virtual memory space to the user process and reserves the top 1GB for the kernel). And user processes can arrange for memory to be mapped at address 0, at least under certain circumstances. So a NULL pointer dereference in the kernel might not fail with a page-fault exception, as would usually be expected, but might instead return data that is controlled by the user process. This allows an exploit to be crafted.
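To make the mechanism concrete, here is a toy Python model (not kernel code) of a single 32-bit address space that the kernel and a user process both see unchanged. AddressSpace, user_map_page, kernel_read_u32 and FIELD_OFFSET are hypothetical names; page permissions and any restrictions on mapping low addresses are deliberately left out.

```python
import struct

PAGE_SIZE = 4096
KERNEL_BASE = 0xC0000000  # 32-bit x86 Linux: user space below 3GB, kernel above


class PageFault(Exception):
    """Raised when the toy model touches an unmapped page."""


class AddressSpace:
    """Toy model of one 32-bit virtual address space, seen the same way
    by a user process and by the kernel."""

    def __init__(self):
        self.pages = {}  # page number -> bytearray of PAGE_SIZE bytes

    def user_map_page(self, addr, data=b""):
        """Model a user process mapping one page at a fixed user-space address."""
        if addr % PAGE_SIZE or addr >= KERNEL_BASE or len(data) > PAGE_SIZE:
            raise ValueError("need a page-aligned user address and at most one page of data")
        page = bytearray(PAGE_SIZE)
        page[: len(data)] = data
        self.pages[addr // PAGE_SIZE] = page

    def kernel_read_u32(self, addr):
        """Model a kernel load of a little-endian 32-bit word. The address is
        used as-is, so low addresses hit whatever the user process mapped."""
        page = self.pages.get(addr // PAGE_SIZE)
        if page is None:
            raise PageFault(f"unmapped address {addr:#x}")
        offset = addr % PAGE_SIZE
        if offset > PAGE_SIZE - 4:
            raise ValueError("toy model does not handle reads that cross a page")
        return struct.unpack_from("<I", page, offset)[0]


NULL = 0
FIELD_OFFSET = 8  # hypothetical offset of a field in some kernel structure

space = AddressSpace()
try:
    space.kernel_read_u32(NULL + FIELD_OFFSET)  # nothing is mapped at 0 yet
except PageFault as e:
    print("oops:", e)  # the usual outcome of a NULL dereference

# The user process maps page 0 and plants a value where the kernel will look.
space.user_map_page(0, bytes(FIELD_OFFSET) + struct.pack("<I", 0xDEADBEEF))
print(hex(space.kernel_read_u32(NULL + FIELD_OFFSET)))  # 0xdeadbeef
```

The same kernel load that faulted before the mapping now quietly returns a value chosen by the user process.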

One thing that I haven't seen mentioned in the ensuing coverage is the fact that, once upon a time, Linux excluded such vulnerabilities by design. Actually, I hadn't noticed, or had managed to forget, that Linux changed its design, so I was a bit surprised to learn that it is vulnerable to such exploits.

Back at the dawn of time, Linux on x86 used the same 3GB/1GB virtual memory split that it does today. But it also used segments to prevent unintended access from kernel code to user-space memory. The segments used when executing kernel code covered only the top 1GB of the linear memory space, so it was impossible to accidentally access user-space addresses. Address 0 at the bottom of the kernel's segment actually referred to linear address 0xc0000000, inside the kernel's address space and beyond the control of user processes. When kernel code really wanted to read or write the memory of a user process, it had to call special functions to do so, which used non-default segment registers.

This changed in linux-2.1.0, back in 1996. In fact, this was the major change separating 2.1.x from the 2.0.x series, and Linus devoted his pre-2.1.0 release note to this topic. Since then, Linux on x86 has used a “flat” segment model for the kernel: the segments simply cover the whole of the linear address space, from 0 to 4GB. Linus didn't highlight one unfortunate consequence of this change: user processes can control whether NULL pointer dereferences from inside the kernel succeed, and what data they yield.
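The difference between the two segment models comes down to one addition: on x86, a linear address is the segment base plus the offset, and offsets beyond the segment limit fault. The Python sketch below models only that arithmetic. Segment, GeneralProtectionFault and the segment names are hypothetical labels, the kernel segment values follow the description above, and treating the user-access segment as covering exactly the bottom 3GB is an assumption.

```python
from dataclasses import dataclass

GB = 1 << 30


class GeneralProtectionFault(Exception):
    """Stands in for the fault the CPU raises on a segment-limit violation."""


@dataclass(frozen=True)
class Segment:
    base: int   # linear address where the segment starts
    limit: int  # largest valid offset within the segment

    def linear(self, offset):
        """Translate a segment-relative offset into a linear address."""
        if not 0 <= offset <= self.limit:
            raise GeneralProtectionFault(f"offset {offset:#x} is beyond the segment limit")
        return self.base + offset


# Before 2.1.0: the kernel's data segment covers only the top 1GB.
OLD_KERNEL_DS = Segment(base=3 * GB, limit=GB - 1)
# Assumption: the segment used for deliberate user access covers the bottom 3GB.
USER_DS = Segment(base=0, limit=3 * GB - 1)
# From 2.1.0: a flat kernel segment spanning the whole 4GB linear space.
FLAT_KERNEL_DS = Segment(base=0, limit=4 * GB - 1)

NULL = 0
print(hex(OLD_KERNEL_DS.linear(NULL)))   # 0xc0000000: inside kernel memory
print(hex(FLAT_KERNEL_DS.linear(NULL)))  # 0x0: memory the user process controls
print(hex(USER_DS.linear(0x1000)))       # 0x1000: an explicit access to user memory
```

With the old kernel segment, no offset can reach below 0xc0000000, so the bottom of the user's address space is simply not addressable by ordinary kernel loads; with the flat segment, NULL is linear address 0.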

I'm not aware of anything in principle that would prevent this change from being reverted on x86. But the x86-64 architecture more or less disables segmentation, so it can't support a rigid distinction between the kernel and user address spaces in the same way (in fact, I can't think of any practical way to achieve something like that on x86-64). In contrast, the RISC architectures supported by Linux tend to include some notion of address space identifiers, which are used to distinguish the user memory space from the kernel memory space, so they do not have such vulnerabilities. Certainly SPARC does this, and even ARM seems to include appropriate facilities.

All of this is probably only of historical interest. But I do find the present solutions to this vulnerability, which restrict how a user process can arrange its address space, regrettable. There is a pleasing purity in the idea that user processes should be able to arrange their address space however they like, and the kernel likewise, without interactions between them.