Hi again guys.
Today we'll take a look at the first module in our analyzing phase - namely the weeding module. So what does this module really do? It does exactly what the name implies - it looks through the code and weeds out programs which contains certain 'bugs' which we didn't catch in our parsing phase. Weeding is very easy to do and it uses the same technique as the pretty-printing module which I gave you in the last chapter to traverse recursively through the AST. So let's take a look at which problems we wish to catch during our weeding phase.
The first and most important thing we want to check is that every non-void function
actually returns a value - always! This might sound like a very trivial thing but
try to construct a context free grammar which only allows non-void functions if they
contains a return statement in every possible branching through the function body!
If you can do this then I would be very interested in seeing it - and
so would a lot of other people ;)
But even if you could write such a grammar it would be very big and
hard to read and understand - the added feature of the language would have too high
a cost. So we choose to create a grammar which accepts a slightly larger language
than the one we want and weed out the bad AST's during this weeding phase. So how
do we do this? We run through the AST and for every non-void function we check the
STM nodes which makes out its body to see if every pass through it contains a return
statement. I wrote this function (which can be found in weed.c in the zip):
bool weedSTMCheckReturns(STM *s)
{
switch(s->kind){
case returnK:
return true;
case ifelseK:
return (weedSTMCheckReturns(s->val.ifelseS.thenpart)
&&
weedSTMCheckReturns(s->val.ifelseS.elsepart));
case sequenceK:
if (weedSTMCheckReturns(s->val.sequenceS.first)
== true) {
reportWarning(s->lineno, "Unreachable code");
return true;
} else
return weedSTMCheckReturns(s->val.sequenceS.second);
case scopeK:
return weedSTMCheckReturns(s->val.scopeS.stm);
default:
return false;
}
return false;
}
and that's all!
So let's see that this really does what we want. As you can see it recurses through
the STM nodes of a function body. Remember how the statements are build up in our
AST. They are not linked lists of STM nodes - if a function or program body has more
than one statement then the single STM node in the FUNCTION or PROGRAM node is expanded
using the sequence STM node.
Now there are only very few interesting cases: If a return STM is reached then obviously
the function returns and the recursion is ended and 'true' is returned. If we reach
a if-then-else STM then we recurse on both the 'if' and the 'else' part and requires
that both of these paths contains a return statement. If we reach a sequence STM we
just checks both statements in the sequence. Notice how we get the 'unreachable code'
check for free here - if the first statement in a sequence actually is a return STM
(which is the only way it can return true) then we will never execute the second statement
in the sequence (think about it).
I have decided that unreachable code isn't an error as such so I just report
a warning if I find it - if you want a more strict language you could report an error
which would halt the compilation.Every other statement is covered by the default case
in the switch and the default answer to whether or not a statement is a return is
false.
Notice our conservative choice regarding if sentences (not if-then-else, but those
only containing an if part). Without even checking the body of the if sentence we
return false when we reach it. That means that we will reject functions such as:
int test() {
if (true) {
return 5;
}
}
even though we can see that every possible path through this function will contain a return. The reason is that the boolean condition in the if sentence could be anything, it's only because we can see that it is constant 'true' that we accept this function as valid. The compiler is more general but of course you can always add special code for checking these things and thus allowing a function like the one above. This is called static analysis and is not something I'll cover in this series.
And how do we use the function above? Well, we start our recursion through the AST and when we reach a function node we do the following:
void weedFUNCTION(FUNCTION *f)
{
if(f->formal != NULL)
weedDECL(f->formals);
if(f->stms!= NULL) {
weedSTM(f->stms);
if (weedSTMCheckReturns(f->stms) == false && f->type->kind
!= voidK )
reportError(f->lineno,"Function must
return a value");
}
}
There is another small thing that I remove during the weeding phase and that is the appearance of double scopes. They appear because of my general handling of the places where a new scope might be introduced - for example at a for-sentence. Consider the following two for-sentences:
for (int i=0; i<5; i=i+1)
{ // here I explicitly declare a new scope
a = a * 2;
}
for (int i=0; i<5; i=i+1)
a = a * 2;
for (int i=0; i<5; i=i+1)
{ // automatically added scope
{ // users scope from above
a = a * 2;
}
}
for (int i=0; i<5; i=i+1)
{ // automatically added scope
a = a * 2;
}
Actually it wouldn't be a problem with the extra scope but I don't think it looks
very nice so we might as well weed it out.
This is done in the weedSTM function in weed.c file and the part where
it happens looks like this:
case scopeK:
/* weed 'double scopes' */
if(s->val.scopeS.stm->kind == scopeK){
s->val.scopeS.stm = s->val.scopeS.stm->val.scopeS.stm;
}
weedSTM(s->val.scopeS.stm);
break;
so if we reach a scope STM and the stm pointer in the scope points at another scope node then we simply link the pointer to the second scopes statements.
Urgh, you don't really like the sound of the heading for this section do you? ;)
Well, this chapter has been a bit too easy don't you think? Well, now it is time for
you to mess around with the code on your own for the first time. As you can see I
have actually only analyzed a small part of the AST in this chapter - namely the STM
nodes. Yet I have left code to recurse on a large part of the rest of the AST - for
example expressions and declarations. How come? Well, there are more things you could
weed out in this module and you are the one who is going to do it. I'll give you one
thing to weed out and then you can spend some time thinking about other things you
could catch. The one thing I'll suggest is weeding of division by zero. You decide
how to do it but in its simplest form (namely weeding out division by constant zero)
it should be easy to do. Good luck fooling around with the code...
So, this chapter was fairly easy huh? So what is our current status? Well, about the
same as in chapter 3 except that we fixed a few flaws introduced by our context free
grammar. Another thing, I've put up a small html page on my site where I'll write
small notes now and then when I discover errors in the previous chapters or in the
source code. This saves you from having to re-read the chapters just because of small
changes and provides a fast way to maybe get an answer to a question you might have
if you think you have spotted an error somewhere. If you are reading this you should
know but the address of my homepage is: www.peroxide.dk
Stay tuned for chapter 5 soon..
Keep on codin'
Kasper Fauerby
www.peroxide.dk