Hi again and welcome to an ultra-short chapter in which we'll emit the assembly code we calculated in chapter 8 to a text file for closer inspection. Also we'll use this module to calculate something I briefly mentioned in chapter 8 - the maximum stack increase a function or program can lead to. This number allows us to decide whether or not there is enough free space on the stack before calling a program or function. This number depends on the opcodes we use in the assembly representation of the program so it is something we could not calculate before the code generation has been done.
To calculate the maximum stack increase a block of code (function or program) can
lead to we must check every possible path through the code and for each location in
the code keep track of the stack height at that point. The stack invariant tells us
that such a stack height is unique for a particular place in the code no matter how
we reached the opcode (by different paths through the code). When we have visited
every opcode in a block of code we know that the maximum stack height we recorded
during our walk through it will be the maximum change in stack height during any execution
of the code block.
So how do we walk through all the opcodes? That is easy enough. Remember that we keep
the code as a linked list of CODE structs. So we simply start with the first opcode
and record how that affects the stack, then continue to the next one in the list.
For every opcode that might branch (a conditional jump opcode) to a label we recurs
on the jump (i.e.. follows the label) and when that call returns we continue on the
branching opcodes 'next' pointer. This is in essence the same as first following
the path the VM would take if the condition evaluates to true and afterwards following
the path the VM would take if it does not branch.
To make sure we don't recurs forever we'll only look at a CODE struct if it has not
already been visited (we keep track of this using the 'visited' field in the CODE
struct).
In code it looks like this:
int limitCODE(CODE *c) {
maxStackSize = 0;
findLimit(c, 0);
return maxStackSize;
}
where the function 'findLimit' is a recursive function that does what I described above - keeping track of the change in stack height. Here it is (partially, too long to list it all):
static void findLimit(CODE *c, int currentSize) {
/* only adjust & recurse if we haven't already visited this
CODE node. */
if (!c->visited){
c->visited = 1;
switch(c->kind){
case nopCK:
/* no changes */
break;
.
.
case iremCK:
currentSize -= 1; /*
irem pops two values and pushes one. So total stack change is -1 */
break;
.
.
/* an example of
a branching opcode */
case ifeqCK:
currentSize -= 1; /*
no matter if it branches or not it'll pop a value from the stack */
findLimit(emitLabels[c->val.ifeqC].position, currentSize); /*recurs
to the point in the code where the jump would go */
break;
.
.
/* an example of
an opcode that increases stack */
case ldc_intCK:
currentSize += 1; /*
ldc_int loads an integer to the stack */
maxStackSize = max(currentSize, maxStackSize); /*
check if the current size is the max so far */
break;
.
.
.
/* the call is
a bit special */
case callCK:
currentSize += signature2StackChange(c->val.callC);
maxStackSize = max(currentSize, maxStackSize);
break;
}
/* continue along the CODEs next pointer
*/
if(c->next != NULL)
findLimit(c->next,
currentSize);
}
}
Notice the check on a 'call' opcode. The way it affects the stack depends on how many
parameters it has and if it has a return type. Before the call opcode all parameters
are build as expressions on the stack and the function call will pop these when it
returns. Also if the function is non-void it'll leave a return value on the stack.
Remember in chapter 8 we calculated the functions signature which was
simply a string describing how many parameters the function took and if it had any
return values. The function 'signature2StackChange' will parse that string to figure
out how the function call affects the stack height. See the source code for more details
on this.
I'll not say much about this because it's really easy to see what's going on by looking
at the source.
Basically there is a function 'void emitCODE(CODE *c)' which will print the right
text symbol to the output file based on which opcode 'c' is. The rest is simply a
run through the top-levels of the AST (the toplevels are the function and program
nodes). For each toplevel node the stack change is calculated as described above,
some sort of header is written to the output file and finally its linked list of CODE
structs are dumped as the byte code in symbolic form.
And that's really all there is too it - go look at the source now!
This was a very short chapter - a nice relaxing read after several complicated chapters
I should think. In a sense this actually completes our compiler - the emitted assembly
could be read from the text-file into the VM and be assembled at load time. We chose
to pre-assemble the code though and save the final output of the compiler as binary
file containing the byte code for the programs. This is what we'll do in the next
chapter where we'll build an assembler which reads in the emitted file from this chapter
and assembles it to the binary format we want our VM to support.
For now try to play around with the compiler. Write some small programs for it, compile
them and check the assembly code. What is good about the code? What is bad? Try looking
for places where the compiler really does a bad job of generating code. Try writing
some really wacky programs using complex features of the language (I've tried doing
that in the small example I put in the zip package). I think you'll be impressed with
how powerful and robust the compiler actually is! It produces correct code for even
the most complicated lines of code which is something you would have a hard time doing
if you had used a more "hacked" approach to writing a script language.
Important: If you find some mistakes in the code generations or make
the compiler crash (that can happen :) please either fix the problem
and tell me about it or send the source code that made the compiler crash :)
Keep on codin'
Kasper Fauerby
www.peroxide.dk