A few months ago, Imagination Technologies announced that they were giving
away MIPS Creator CI20 boards to open-source developers (Imagination
acquired MIPS Technologies last year). The CI20 is a development board
based on the Ingenic JZ4780 SoC, which includes a 1.2GHz dual-core MIPS32
processor. Developers had to request a board by submitting a
short proposal: I proposed to port my Pachuco Lisp dialect to MIPS,
expecting that there was little chance of actually getting a
board. So I was surprised when I got an email telling me that
they had sent one to me. Now I am following through on my
proposal, and I thought I'd write about the process as I
go.
There are other hands-on
reviews of the CI20, so I won't say much about the board
itself, except to mention one nice feature: It has an 8GB
flash chip on the board and comes loaded with Debian Wheezy,
so I was able to get started with it straight away without
going through the process of downloading an image and writing
an SD card. But it's hard to recommend the board given that
it is not actually available for purchase. I'm not sure what
further plans Imagination has for these boards.
Pachuco
Pachuco started out as a minimal compiler targeting only
x86-64/i386 (the differences between the two are minor).
Later on I ported it to ARM. To give an idea of the work
involved in targeting a new architecture, there are 750
non-whitespace lines in the two ARM-specific files. ARM and
MIPS are both RISC instruction sets, so it seems like it
should be fairly straightforward to add support for MIPS code
generation.
Pachuco was originally an exercise in strict minimalism:
How simple can a Lisp compiler be and still be able to compile
itself? But that goal is quite limited. For instance,
Pachuco was able to bootstrap itself before it had a garbage
collector. As no heap space was ever re-used, the heap would
grow to several hundred MB, but that's hardly a lot of memory
by today's standards. And the code generated by those early
minimal versions of Pachuco had many obvious inefficiencies.
So I succumbed to the temptation to elaborate it. The goal
changed to making the compiler able to compile itself in as
little time as possible. This is still a fairly minimalist
approach. I've tried some optimizations only to throw the
work away because they don't pay for themselves: they
cost more cycles when compiling than they save through
improvements to the generated code. But other enhancements
are consistent with this goal, and today Pachuco has many of
the features of a real Lisp system, including a GC and proper
tail calls, and it implements classic techniques for efficient
function calls and variable access.
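To see why bootstrapping without a garbage collector is workable at all: if heap space is never re-used, allocation reduces to advancing a pointer. The C sketch below is purely illustrative and is not Pachuco's allocator (Pachuco's runtime is written in Pachuco); the names `bump_alloc`, `bump_used`, `heap` and `HEAP_SIZE`, the fixed heap size, and the 8-byte alignment are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical fixed-size heap for illustration; not taken from Pachuco. */
#define HEAP_SIZE (1u << 20)

/* _Alignas is C11; it keeps the start of the heap 8-byte aligned. */
static _Alignas(8) unsigned char heap[HEAP_SIZE];
static size_t heap_used = 0;

/* Allocate `size` bytes, rounded up to a multiple of 8.
   Nothing is ever freed, so heap_used only grows.
   Returns NULL once the heap is exhausted. */
void *bump_alloc(size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u;

    /* The first test catches overflow in the rounding above. */
    if (rounded < size || rounded > HEAP_SIZE - heap_used)
        return NULL;

    void *p = &heap[heap_used];
    heap_used += rounded;
    return p;
}

/* Number of bytes handed out so far. */
size_t bump_used(void)
{
    return heap_used;
}
```

With an allocator like this, memory use is simply the total of everything ever allocated, which matches the picture of a heap that grows steadily over the course of a compile.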
The MIPS instruction set architecture
In order to re-target a compiler, you need to understand
the target machine code well enough to know how to produce
efficient sequences of instructions for the primitives
generated by the machine-independent portion of the compiler.
The only time I've done any development on a MIPS machine
before this was a few hours on an SGI Indigo
back in the mid-90s, and that didn't involve any low-level
work. But I've been exposed to MIPS machine code through
papers and books: As one of the purest of the RISC
architectures that dominated the Unix workstation market in
the 80s and 90s, it is widely used as a case study. So I
already felt familiar with the outlines of the ISA; I just
needed a detailed reference manual. The JZ4780 implements
MIPS32 revision 2 (MIPS32 is based on the 32-bit MIPS ISA from
the R3000 with various extensions, some borrowed from the
64-bit line of MIPS processors that began with the R4000). I
downloaded the documents from
the MIPS32 page on the Imagination site. It's a minor
annoyance, but you have to register on the site to download
the PDFs. (If I remember correctly, ARM also requires
registration. Intel gets a bonus point for making the x86
manuals available for unrestricted download.) Then I spent a
bit of time browsing the instruction set manual to get my
bearings, looking particularly for areas that wouldn't
correspond closely to the existing x86 and ARM code
generators.
Assembly syntax issues
The Pachuco compiler generates an assembly file rather than
producing an executable directly. That assembly file gets
passed to gcc, together with one small C file, to produce the
executable. That's the only C involved; the rest of Pachuco,
including the GC and runtime, is written in Pachuco. (It
would be nice to support a truly standalone bootstrap process
without relying on an external assembler or any non-Pachuco
code. But the ELF executable format is intricate, and writing
Pachuco code to generate it directly seems like a distraction
from the main goals of the project.)
So it's not quite enough to know the instruction set. I
also needed to know the assembly syntax. That would be easy
if the assembly file contained only instructions, but it also
contains directives that describe, well, everything else:
- the program's static data
- what goes in which sections
- fine tuning of the layout of data and instructions in
memory
- debug information (i.e. if you compiled with -g)
- various non-debug meta-information
- other miscellaneous assembler settings
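To make these categories concrete, consider a small C file like the sketch below. It is hypothetical (not part of Pachuco) and all the names in it are made up for illustration. On a typical gcc/ELF toolchain, each definition calls for a different kind of directive in the generated assembly: a section switch, an alignment request, a symbol type and size, and so on. The section names in the comments are the usual ELF defaults and should be read as an assumption; the actual choice can vary with the target and the compiler options.

```c
#include <stddef.h>

/* Initialized, writable data: typically placed in .data */
int counter = 42;

/* Zero-initialized data: typically placed in .bss
   (or emitted as a common symbol, depending on compiler options) */
int scratch[16];

/* Read-only data: typically placed in .rodata */
const char greeting[] = "hello";

/* Layout fine-tuning: request 16-byte alignment (GCC/Clang attribute syntax) */
int table[4] __attribute__((aligned(16))) = {1, 2, 3, 4};

/* Code: typically placed in .text */
int sum_table(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        total += table[i];
    return total + counter + scratch[0];
}
```

Even this handful of definitions touches most of the items in the list, which is why the directives matter as much as the instructions.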
Although the gas assembler is ubiquitous on Linux, for a
particular target it often conforms to conventions established
by a once-dominant Unix (for MIPS, I think that means IRIX).
The instruction syntax is usually standardized and
well-documented (x86 is an exception, with two different
syntaxes in use). But the directives are not: there is a lot
of variation for different targets, and if they were ever
documented, that documentation is not easily available today.
The Machine Dependencies section of the gas manual
tends to be rudimentary and incomplete.
So the easiest way to discover the necessary directive
syntax is to look at the assembly files generated by gcc with
the -S option. By crafting appropriate C programs and
running them through gcc -S, you get to see what
instructions are used, and more importantly, what directives
are involved. For example, here's a simple C program:
    extern int var;

    int foo(int x)
    {
        var = x;
        return 123456789;
    }
And here's the MIPS code when compiled with gcc -S foo.c:
        .file 1 "foo.c"
        .section .mdebug.abi32
        .previous
        .gnu_attribute 4, 1
        .abicalls
        .option pic0
        .text
        .align 2
        .globl foo
        .set nomips16
        .ent foo
        .type foo, @function
foo:
        .frame $fp,8,$31 # vars= 0, regs= 1/0, args= 0, gp= 0
        .mask 0x40000000,-4
        .fmask 0x00000000,0
        .set noreorder
        .set nomacro
        addiu $sp,$sp,-8
        sw $fp,4($sp)
        move $fp,$sp
        sw $4,8($fp)
        lui $2,%hi(var)
        lw $3,8($fp)
        sw $3,%lo(var)($2)
        li $2,123404288 # 0x75b0000
        ori $2,$2,0xcd15
        move $sp,$fp
        lw $fp,4($sp)
        addiu $sp,$sp,8
        j $31
        nop
        .set macro
        .set reorder
        .end foo
        .size foo, .-foo
        .ident "GCC: (Debian 4.6.3-14) 4.6.3"
As you can see, there can be a lot of directives! With
some experimentation, it's possible to get an idea of what the
directives do and which ones are really needed in the assembly
generated by Pachuco.
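The instructions in that listing are worth a closer look too. MIPS32 instructions carry only 16-bit immediates, so gcc builds the constant 123456789 in two halves (the li loads a value whose low 16 bits are zero, then ori fills in the low half) and reaches var through the %hi/%lo operators. The C sketch below reproduces that arithmetic. The function names are hypothetical, and the rounding in addr_hi reflects my understanding of the usual MIPS convention: the offset in lw/sw is sign-extended, so the upper half has to compensate, whereas ori zero-extends its immediate and needs no adjustment.

```c
#include <stdint.h>

/* Constant split for a lui/ori-style pair: ori zero-extends its 16-bit
   immediate, so the two halves are a plain mask of the value. */
uint32_t const_hi(uint32_t value) { return value & 0xffff0000u; }
uint32_t const_lo(uint32_t value) { return value & 0x0000ffffu; }

/* Address split in the style of %hi/%lo: the low half is used as a
   sign-extended 16-bit load/store offset, so the high half is rounded
   up whenever the low half would be read as negative. */
uint32_t addr_hi(uint32_t addr) { return (addr + 0x8000u) >> 16; }

int32_t addr_lo(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xffffu);
    if (lo >= 0x8000)
        lo -= 0x10000;      /* interpret as a signed 16-bit value */
    return lo;
}

/* Recombine the halves the way lui plus a load/store offset would. */
uint32_t addr_join(uint32_t hi, int32_t lo)
{
    return (hi << 16) + (uint32_t)lo;
}
```

For the constant in the listing, const_hi(123456789) is 0x75b0000 and const_lo(123456789) is 0xcd15, matching the li/ori pair. For an address such as 0x1000fff0, addr_hi gives 0x1001 and addr_lo gives -16, and addr_join puts them back together.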
OK, that's enough for one post, even if it was all
preliminaries. More soon.