And I wore an onion on my belt, which was the style at the time

A local exploit in the Linux kernel, involving a NULL pointer dereference, was publicized recently and got a lot of attention. Jonathan Corbet provided a detailed two-part writeup on LWN.net (part 1, part 2).

The key to the vulnerability is this: If the kernel tries to dereference a NULL pointer, i.e. tries to access memory at address 0 or nearby, it actually accesses the virtual memory space of the current user process (since Linux on x86 gives the bottom 3GB of the virtual memory space to the user process and reserves the top 1GB for the kernel). And user processes can arrange for memory to be mapped at address 0 (at least under certain circumstances). So it is possible that a NULL pointer dereference in the kernel will not fail with a page-fault exception, as would usually be expected, but will actually return data that is controlled by the user process. This allows an exploit to be crafted.
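To make the "mapped at address 0" part concrete, here is a minimal sketch of how a user process might ask for a page at address 0. The helper name try_map_page_zero is made up for illustration. Whether the request succeeds depends on the system: on current Linux kernels it is usually refused for unprivileged processes (the vm.mmap_min_addr setting is one such restriction), so treat this as a probe rather than something guaranteed to work.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel for one anonymous page at address 0.
 * Returns 0 if the mapping was created (it is removed again before
 * returning), or the errno value reported by mmap() otherwise.
 */
int try_map_page_zero(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;

    /* MAP_FIXED makes address 0 a demand rather than a hint. */
    void *p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    /* On success p is address 0, which compares equal to NULL. */
    munmap(p, (size_t)page);
    return 0;
}
```

A return value of 0 means the process was allowed to place memory at address 0. Any other value is the error code from mmap, which on a system with the restrictions in place will typically indicate a permission problem.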

One thing that I haven't seen mentioned in the ensuing coverage is the fact that, once upon a time, Linux excluded such vulnerabilities by design. Actually, I hadn't noticed, or have managed to forget, that Linux changed its design, so I was a bit surprised to learn that it is vulnerable to such exploits.

Back at the dawn of time, Linux on x86 used the same 3GB/1GB virtual memory split that it does today. But it also used segments to prevent unintended access from kernel code to user-space memory. The segments used when executing kernel code covered only the top 1GB of the linear memory space, so that it was impossible to accidentally access user-space addresses. Address 0 at the bottom of the kernel's segment actually referred to linear address 0xc0000000, inside the kernel's address space, away from the control of user processes. When kernel code really wanted to read or write the memory of a user process, it had to call special functions to do so, which used non-default segment registers.

This changed in linux-2.1.0, back in 1996. In fact, this was the major change separating 2.1.x from the 2.0.x series, and Linus devoted his pre-2.1.0 release note to this topic. Since then, Linux on x86 has used a “flat” segment model for the kernel: the segments simply cover the whole of the linear address space, from 0 to 4GB. Linus didn't highlight one unfortunate consequence of this change: user processes can control whether NULL pointer dereferences from inside the kernel succeed, and what data they yield.
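The difference between the two segment models comes down to simple arithmetic: the processor adds the segment base to the offset used by the code, after checking the offset against the segment limit. The following is a rough model of that calculation, not kernel code. The struct, the constant names and seg_translate are all invented for illustration, and the base and limit values are just the ones implied by the description above (a 1GB segment starting at 0xc0000000 versus a 4GB segment starting at 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of an x86 data segment (illustrative only). */
struct segment {
    uint32_t base;  /* linear address where the segment starts */
    uint32_t limit; /* highest valid offset within the segment */
};

/* Kernel data segment before 2.1.0, as described above: top 1GB only. */
static const struct segment OLD_KERNEL_DS = { 0xC0000000u, 0x3FFFFFFFu };

/* Flat kernel data segment since 2.1.0: the whole 4GB linear space. */
static const struct segment FLAT_KERNEL_DS = { 0x00000000u, 0xFFFFFFFFu };

/*
 * Translate a segment offset to a linear address.
 * Returns false where real hardware would raise a protection fault
 * because the offset lies beyond the segment limit.
 */
bool seg_translate(const struct segment *seg, uint32_t offset,
                   uint32_t *linear)
{
    if (offset > seg->limit)
        return false;
    *linear = seg->base + offset;
    return true;
}
```

With OLD_KERNEL_DS, offset 0 (a NULL pointer) translates to linear address 0xc0000000, which is kernel memory, and no valid offset can reach the bottom 3GB. With FLAT_KERNEL_DS, offset 0 translates to linear address 0, which lies in the range that belongs to the user process.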

I'm not aware of anything in principle that would prevent this change from being reverted on x86. But the x86-64 architecture more or less disables segmentation, so it can't support a rigid distinction between the kernel and user address spaces in the same way (in fact, I can't think of any practical way to achieve something like that on x86-64). In contrast, the RISC architectures supported by Linux tend to include some notion of address space identifiers, which are used to distinguish the user memory space from the kernel memory space, so they do not have such vulnerabilities. Certainly SPARC does this, and even ARM seems to include appropriate facilities.

All of this is probably only of historical interest. But I do find the present solutions to this vulnerability, which restrict how a user process can arrange its address space, to be regrettable. There is a pleasing purity in the idea that user processes should be able to arrange their address space however they like, and the same for the kernel, without interactions between them.