bild
School of
Computer Science
and Communication

Project in DD2488, Compiler Construction (komp14)

Write your own complete compiler for MiniJava!

Requirements

You are allowed and infact encouraged to work in pairs on the project. The project must be completed before you take the theory exam. You will be graded individually, from the project result and from the exam result.

Students that choose to carry out the project alone, will be paired with another student to form a feedback group. These students should read a late version each other's compiler sources and provide critique. In the final report, include about half a page where the critique is summarised. I imagine that any feedback work will take place just before the oral project presentation, i.e, in May. [If you are in a project group, nothing in this parapraph is relevant.]

You are allowed to discuss the project with anybody taking the course this year. You are of course not allowed to copy any code, only discuss general ideas.

Copying code from other students or from the Net is considered to be in breach of the course rules and therefore the Code of Honour. If you're not familiar with the Code of Honour, please follow the link on the left.

Project summary:

  • You must implement a correctly functioning and well-designed solution for all features you want to be part of your graded compiler.
  • You must document your solution with a short report in pdf, a few A4 pages. See the examination page [later] for more information. The report should also contain instructions for how a user can build and run your compiler.
  • In addition to the compiler itself, you need to write tests as defined in the section "Testing" below.
  • You need to submit the compiler to Tigris and get both compilation and execution accepted.
  • You need to book a meeting for presenting and demonstrating the project. I understand that if you have worked two on the project, you might understand some parts better than other parts. But if you have worked like that, you nevertheless need to make sure you understand the entire program well, even the parts you have not written. We intend to make the presentation sessions at least 30 minutes, since this discussion is an important part of the course.

Minijava

We will use the Minijava language almost as described in the course book [Appel]. You should not support nested comments, which do not exist in the Java programming language, and hence Minijava programs will be a true subset of Java programs (almost true*).

Due to some quirks in the book grammar, please use this grammar instead.

(*) Actually, you do not need to check that variables are initialised, so some legal Minijava programs with undefined behaviour will in fact not be legal Java programs.

While it is possible to do the project in other languages, using Java is recommended as the resources provided contain partial Java implementations of compiler modules. If you want to use a different implementation language, you unconditionally need to talk to me before you start.

Levels

The compiler project will be graded according to a point system. Furthermore, a basic compiler plus one backend are mandatory, yielding a minimum of 125p with the report. Here is a table over the planned points:

points     extension code                     feature
100 syntax checking compiler for the basic language
20 ISC inheritance syntax checks
10/30 ICG inheritance code generation (10 with JVM, 30 with other backends)
30 JVM JVM backend (using the jasmin "assembler")
70 ARM ARM CPU backend with register allocation
70 X86 X86-64 CPU backend with register allocation
70 MIPS MIPS CPU backend with register allocation
70 PPC PPC CPU backend with register allocation
70 SPARC SPARC32 CPU backend with register allocation
30 SPILL register spilling (depends on non-JVM backend)
15 LONG add long and long[] [see grammar]
15 IWE 'if' statements with or without 'else' [see grammar]
20 NBD nested blocks with new variable declarations [see grammar] [N.B. SEE NOTE BELOW]
0/10 ABC array bounds checks (0 for JVM, 10 for other backends)
1 CLE CGT CGE per comparison operator ('<=', '>', '>=') [see grammar]
3 CEQ CNE per comparison operator ('==', '!=') [see grammar]
3 BDJ logical connective '||' (binding looser than '&&') [see grammar]
15 INT32 int is really 32 bits for the x86-64 backend
x student suggested extension

Furthermore, we will award points for the report, 10 for a good report, 5 for an acceptable report, and 15 for an extraordinary report.

You are welcome to implement all back-ends, but we will then not grade more than two of them, since once you have shown that you can re-target your compiler, we think you will learn more by trying some of the other optional features.

Please annotate your compiler with all applicable extension codes. This is done using the DESC file, see Tigris. These codes will be used to guide the test system. (See also the section Testing below.)

The project grade limits are 125 ≤ E < 140 ≤ D < 160 ≤ C < 190 ≤ B < 220 ≤ A. For project grade A or B, a CPU backend is required. (With theory exam grade A you can get at most B for the entire course without a CPU backend.)

Note the "with register allocation" clause for the CPU backend extensions. You need to allocate every (used) scalar variable to a register during its lifetime. Variables may not live in the stack frame (except with SPILL, but that extension does not preclude a well working register allocator). All CPUs' ABIs define that the first k method parameters are passed in registers, and that the remaining ones are passed on the stack; you should copy the latter into 'TEMPs' in the function feed-in code and then use your register allocator also for these.

NOTE about SPILL: You either need to generate code using a frame pointer or make preliminary stack offsets for references to the k+1 and later incoming parameters (k is defined in the previous paragraph). The reason is that the spill area's size is unknown before register allocation, and so is the spill frame area. The result is that the distance from the stack pointer will be indeterminate until after register allocation/spill handling. This is not hard, but can be confusing if forgotten.

NOTE about NBD: Java does not allow proper variable scoping, e.g., {int i; ... {int i; ... } } is invalid. You should adhere to this Java limitation.

NOTE about LONG: See note about INT32 and TEMP typing below.

NOTE about int sizes and INT32:

With a 64-bit backend such as X86-64, references will need to be 64 bits large while the int type in MiniJava is 32-bits.

To make this work properly, one needs to add an operation width to TEMPs, and have the code generator handle that. One would probably not want to allow binops with different type operands, and instead enforce explicit sign extensions from 32-bit to 64-bit.

Surely, several variations would be possible, at least one group have added types at the binops instead.

It will be allowed to let int be 64 bits wide. Test cases that rely on exact 32-bit behaviour (i.e., depend on 32-bit overflow) will be annotated so that those can be omitted for INT64 compilers. You will have to annotate your compilers too; the "extension code" will be INT64. This will not add or remove any points from your compiler.

Note that allocating space with the runtime allocation function is done by passing the number of 32-bit ints. When using 64-bit ints, you will need to multiply this by two. (FIXME: This will change to number of bytes.)

If you implement proper 32-bit ints for the x86_64 backend, you will be awarded 15 extra points (as per the INT32 extension). For other backends, the points-deserving problems will not come up since references/pointers there are 32 bits, like Minijava ints.

Output

Clearly, the output of a full blown compiler is runnable code, but to assist in debugging, you might want to output some representation of the results of each step in the process, e.g., syntax trees, symbol tables and interference graphs.

Resources

The course directory /info/komp14 includes tools, examples and code skeletons.

Some useful documentation:

Testing

Additional information about the test cases can be found at the testcases page.

We have a set of test cases already, some of which we publish (see Resources above), and some which are hidden. Your compiler needs to pass all tests to be awarded its points. You will not get partial points for a compiler that passes a partial set of tests.

Any compiler writer needs to write her/his own tests. You need to write your own tests for this project too. You need to write at least these tests:

  1. 5 bogus tests that contains some lexical error, syntax error, or type error. These tests don't need to be large, but please try to be tricky. Binary dumps, jibberish, or anything that is not even close to MiniJava is not OK.
  2. One test that compiles and executes, using just the base language.
  3. One test that compiles and executes, using the fully extended language you chose to implement. If you only implemented the base language, this will be a different test than the the previous one.

As a guideline, the tests should contain about 1000 tokens in total. You may add poor programming constructs, such as easily foldable expressions (e.g., a = 3*5-i+(i-i)*43) but please don't fill out a significant part of the 1000 tokens with such trivialities.

The tests 2 & 3 should execute for < 1 second using the Java compiler on a modern computer, and it should output something useful, which can be used to check that compiler have generated correct code for the test. The tests should not use too much memory, over 100 MB is not good.

Computing primes, factoring numbers in a range, sorting an array of numbers, solving boolean equations, are some examples of what your tests might do. Your tests need to be well-defined, but do not need to win any beauty contests.

Your tests will be used by the test system to test not only your compiler, but the compilers of all other students. There will be a deadline for when the tests have to be ready, so as to avoid creating a "moving target" for compiler acceptance. The deadline is 2014-04-25 2359.

Please annotate your tests with the implemented extensions from the table above.

Qemu images

In order to test the code generated by your compiler, you might want to get convenient access to a supported target. We assume that you have a machine that runs Java and thus can execute jvm code, if needed. You'll then also need the jasmin assembler, as mentioned above. If your target is X86-64, then you'll need a PC running GNU/Linux, BSD, or Solaris. Neither Windows nor Mac OS will work, since they use private ABIs; the Windows calling convention is actually completely different.

Other CPU targets might be harder to get by. Your cellphone (whether smart or dumb) almost certainly has an ARM CPU, but programming for it is either blocked and awkward (Apple) or just awkward (Android).

Instead, we suggest that you use our qemu images, or make similar images yourself. If you have a Raspberry Pi or some such system, that might also be convenient.

These images should run under any reasonably recent qemu. They were created under qemu 1.7.0.

Image download links:

Qemu arm32 bz2 Qemu arm32 lz
Qemu mips32 bz2 Qemu mips32 lz
Qemu sparc32 bz2 Qemu sparc32 lz

There is a boot.sh script in each directory. This can be used to boot the image, which will create some sort of boot console (as described in each header comment) as well as start tap networking and an sshd daemon.

In order for the image to get an IP address, an dhcp (or bootp) server needs to be within reach.

How to do tap networking depends on your host. There are more guides about this on the Net than I can count.

The accounts are described in the boot.sh scripts.

Published by: Torbjörn Granlund <infomaster@csc.kth.se>
Updated 2014-05-30