PxdScript language tutorial series.

Chapter 9 - Emitting Assembly Code:

by Kasper Fauerby


Hi again and welcome to an ultra-short chapter in which we'll emit the assembly code we calculated in chapter 8 to a text file for closer inspection. Also we'll use this module to calculate something I briefly mentioned in chapter 8 - the maximum stack increase a function or program can lead to. This number allows us to decide whether or not there is enough free space on the stack before calling a program or function. This number depends on the opcodes we use in the assembly representation of the program so it is something we could not calculate before the code generation has been done.  

Calculation of the stack change

To calculate the maximum stack increase a block of code (function or program) can lead to we must check every possible path through the code and for each location in the code keep track of the stack height at that point. The stack invariant tells us that such a stack height is unique for a particular place in the code no matter how we reached the opcode (by different paths through the code). When we have visited every opcode in a block of code we know that the maximum stack height we recorded during our walk through it will be the maximum change in stack height during any execution of the code block.

So how do we walk through all the opcodes? That is easy enough. Remember that we keep the code as a linked list of CODE structs. So we simply start with the first opcode and record how that affects the stack, then continue to the next one in the list. For every opcode that might branch (a conditional jump opcode) to a label we recurs on the jump (i.e.. follows the label) and when that call returns we continue on the branching opcodes 'next' pointer. This is in essence the same as first following the path the VM would take if the condition evaluates to true and afterwards following the path the VM would take if it does not branch.

To make sure we don't recurs forever we'll only look at a CODE struct if it has not already been visited (we keep track of this using the 'visited' field in the CODE struct).

In code it looks like this:

int limitCODE(CODE *c) {
   maxStackSize = 0;
   findLimit(c, 0);
   return maxStackSize;

where the function 'findLimit' is a recursive function that does what I described above - keeping track of the change in stack height. Here it is (partially, too long to list it all):

static void findLimit(CODE *c, int currentSize) {

    /* only adjust & recurse if we haven't already visited this CODE node. */

    if (!c->visited){
        c->visited = 1;

            case nopCK:
                /* no changes */
            case iremCK:
                currentSize -= 1;                                             /* irem pops two values and pushes one. So total stack change is -1 */
            /* an example of a branching opcode */
            case ifeqCK:
                currentSize -= 1;                                             /* no matter if it branches or not it'll pop a value from the stack */
                findLimit(emitLabels[c->val.ifeqC].position, currentSize);    /*recurs to the point in the code where the jump would go  */
            /* an example of an opcode that increases stack */
            case ldc_intCK:
                currentSize += 1;                                             /* ldc_int loads an integer to the stack */
                maxStackSize = max(currentSize, maxStackSize);                /* check if the current size is the max so far */
            /* the call is a bit special */
            case callCK:
                currentSize += signature2StackChange(c->val.callC);
                maxStackSize = max(currentSize, maxStackSize);
        /* continue along the CODEs next pointer */
        if(c->next != NULL)
            findLimit(c->next, currentSize);

Notice the check on a 'call' opcode. The way it affects the stack depends on how many parameters it has and if it has a return type. Before the call opcode all parameters are build as expressions on the stack and the function call will pop these when it returns. Also if the function is non-void it'll leave a return value on the stack.

Remember in chapter 8 we calculated the functions signature which was simply a string describing how many parameters the function took and if it had any return values. The function 'signature2StackChange' will parse that string to figure out how the function call affects the stack height. See the source code for more details on this.

Emitting the code to a text file

I'll not say much about this because it's really easy to see what's going on by looking at the source.

Basically there is a function 'void emitCODE(CODE *c)' which will print the right text symbol to the output file based on which opcode 'c' is. The rest is simply a run through the top-levels of the AST (the toplevels are the function and program nodes). For each toplevel node the stack change is calculated as described above, some sort of header is written to the output file and finally its linked list of CODE structs are dumped as the byte code in symbolic form.

And that's really all there is too it - go look at the source now!


Final words

This was a very short chapter - a nice relaxing read after several complicated chapters I should think. In a sense this actually completes our compiler - the emitted assembly could be read from the text-file into the VM and be assembled at load time. We chose to pre-assemble the code though and save the final output of the compiler as binary file containing the byte code for the programs. This is what we'll do in the next chapter where we'll build an assembler which reads in the emitted file from this chapter and assembles it to the binary format we want our VM to support.

For now try to play around with the compiler. Write some small programs for it, compile them and check the assembly code. What is good about the code? What is bad? Try looking for places where the compiler really does a bad job of generating code. Try writing some really wacky programs using complex features of the language (I've tried doing that in the small example I put in the zip package). I think you'll be impressed with how powerful and robust the compiler actually is! It produces correct code for even the most complicated lines of code which is something you would have a hard time doing if you had used a more "hacked" approach to writing a script language.

Important: If you find some mistakes in the code generations or make the compiler crash (that can happen :) please either fix the problem and tell me about it or send the source code that made the compiler crash :)

Keep on codin'

  Kasper Fauerby