PxdScript language tutorial series.

Chapter 1 - Introduction:

by Kasper Fauerby

Introduction

Hi there guys. This is the first chapter in a series of tutorials with the goal of creating a nice scripting language for your computer game engine. The language I have designed for this series goes under the codename PxdScript and is a subset of the language I use for my main programming project: the Peroxide 3D game engine. During this series we will write two tools: a compiler for the language and a VM (Virtual Machine), which will be used to run the compiled script code inside your own engine.

So why have I chosen to split this up in a whole series of tutorials? Couldn't I have made just one? Well, the reason is that it's the easiest way both for you and me to have it split up into chapters. Designing a programming language, writing a compiler for it and writing a VM to run the compiled code is a huge task and it involves a lot of theory and code (I have all the code ready and it's about 300kb of source code). If I were to include all of this in one single tutorial it would be so huge that I would probably never finish writing it. Besides it's also a good idea for you to read it as a series of tutorials. That way you can read about and understand each step in the process before moving on to the next and maybe the whole thing won't look as overwhelming.

The Language

Before you start reading you might be interested in knowing what kind of language you'll end up with when all this is done. What can it do and what can't it do?

Basically PxdScript is a subset of C with some borrowings from C++ to make it more pleasant to work with. Besides this, PxdScript tries to provide you with an interface between the script programs you'll write and the variables you want to manipulate in your game engine. Interfacing a script language with the surrounding game engine is a whole topic in itself, so to keep it simple I'll present a pretty dumb and inflexible way of manipulating certain types of variables in your game engine. The idea is that after you finish reading this series you'll understand compilers and virtual machines well enough to implement your own (better) way of interfacing with your game engine - and to provide you with some inspiration I'll describe my own way of handling this problem a little later in this chapter (after I've shown you the dumb way first).

Anyway, let's take a look at which bells and whistles PxdScript features:

And finally PxdScript suggests a bunch of "trigger types" which define when a program should be activated: on scene initialization, when clicking on an object, on collision with something etc. Again, this is not really dependent on the language - it's just one of many ways you could possibly handle the triggering of the scripts. You should modify this to suit your needs in your own engine. The Peroxide 3D engine uses a more direct calling scheme and does not have any triggering information attached to its scripts - they are simply 'called by name' by the game engine.

PxdScript works like this: A script file can define a number of functions and programs. The programs are the actual scripts that you need in your game world. The programs can call the functions if they need to but a program cannot call and activate another program!
Each program has a trigger attached to it along with a name. The trigger type specifies which events in the game will start the program. Once a program is started it'll run until it either reaches the end of its code block or it is put to sleep. A program can be put to sleep with the built-in function 'sleep' which takes either one integer parameter or no parameters. If sleep is called with a parameter, that parameter indicates how many ms the program should sleep before it continues its execution. If no parameter is given the program sleeps until it is triggered again!
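To make the two flavours of 'sleep' a bit more concrete, here is a minimal C++ sketch of how a VM might track the state of one program. Everything in it (the RunState enum, the ProgramState struct and its member names) is made up for illustration and is not the VM code from later chapters. It also assumes that a trigger arriving during a timed sleep is simply ignored, which the text above does not specify.

```cpp
#include <cstdint>

// Hypothetical run states for one script program inside the VM.
enum class RunState { Idle, Running, SleepingTimed, SleepingUntilTrigger };

struct ProgramState {
    RunState state = RunState::Idle;
    std::int64_t wakeAtMs = 0;  // only meaningful while SleepingTimed

    // sleep(ms): resume once the clock reaches nowMs + ms.
    void sleepFor(std::int64_t nowMs, std::int64_t ms) {
        state = RunState::SleepingTimed;
        wakeAtMs = nowMs + ms;
    }

    // sleep(): resume only when the trigger fires again.
    void sleepUntilTriggered() { state = RunState::SleepingUntilTrigger; }

    // Called by the engine when the program's trigger event occurs.
    // Assumption: a trigger during a timed sleep is ignored.
    void onTrigger() {
        if (state == RunState::Idle || state == RunState::SleepingUntilTrigger)
            state = RunState::Running;
    }

    // Called once per frame; wakes a timed sleeper whose deadline has passed.
    void tick(std::int64_t nowMs) {
        if (state == RunState::SleepingTimed && nowMs >= wakeAtMs)
            state = RunState::Running;
    }
};
```

The engine would call tick() with its own clock every frame and onTrigger() whenever the matching game event happens; the VM only executes byte code for programs whose state is Running.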

A Script Collection is the topmost abstraction in the language and covers what we understand as a bunch of programs and functions. Each compiled PxdScript file contains exactly one script collection and the VM will be able to load and contain exactly one script collection at a time. Here is a small example of a Script Collection:

/* *********** GAME INTERFACE FUNCTIONS ******************* */
void SetPlayerHp(int amount) {
    setint(TYPE_PLY, "", 0, amount);
}

void SetPlayerScore(int amount) {
    setint(TYPE_PLY, "", 1, amount);
}

void AddScore(int amount) {
    setint(TYPE_PLY, "", 2, amount);
}

/* ************** UTILITY FUNCTIONS ***************** */
// Fibonacci numbers. Test recursion
int fib(int a) {
    if (a==0 || a==1)
        return 1;
    else
        return (fib(a-2) + fib(a-1));
}

/* ************ PROGRAMS ************************ */

/* A script program running in your game */
program addscore trigger_on_pickup("present.mdl") {
    while (true) {

        // Ok, so it's strange to represent a score as fib(8) ;)
        AddScore( fib(8));

        // Sleep until retriggered
        sleep();
    }
}

/* Scene constructor */
program init trigger_on_init {
    const int PLAYER_HP = 100;

    SetPlayerHp(PLAYER_HP);
    SetPlayerScore(0);
}

So this script collection contains 4 functions and 2 programs. The first 3 functions are what I call "Game interface functions". They are not different at all from the ordinary functions except that they all just have a single call to 'setint' which is the suggestion I'm going to present for interfacing with some game variables (but more on this in a while). Then follows a recursive function 'fib' which calculates Fibonacci numbers. Its use is totally meaningless - it's just there to illustrate that the language supports recursion.

The 2 programs are different because of their trigger types. The first is a program that is triggered when someone picks up a game entity named "present.mdl". The idea is simply that the player should be awarded some points when he picks up a present. The program uses a while(true) language construct to stay alive forever but notice how it uses 'sleep' without parameters to put the program to sleep until it is triggered again (when someone picks up another present). Think of an arcade game like Pac-man - there would be two small programs like this one: one that would add 10 points when the player moves Pac-man over a normal seed in the labyrinth. The other would add 100 points and trigger the game mode where Pac-man can eat the ghosts. Then it would sleep for 30 seconds and set the mode back to normal again. Why script this? Well, of course Pac-man is used here as a small simple example but let's think big - we have a 50-man team on this game! Some do the programming (and by definition the engine programmers know nothing about game balance, right? :) ) and others tweak the game logic (and also by definition such people can't code). By putting the game logic in a script the 'tweakers' can fiddle around with the parameters (10 points, 100 points and 30 seconds) until Pac-man plays well - without recompiling the huge game engine.

The last program's trigger type is "trigger_on_init". This is my suggestion for how to initialize your scenes through the script language. The trigger type indicates that the program will start running as soon as it is loaded.

The 'setint' language construct is a dumb way of interfacing with integer variables in the game engine. As you can see setint takes 4 parameters. The first is a type which is used to do a first sorting on where to find the integer you wish to modify. In PxdScript valid types are:

The next parameter is the name of the specific entity you wish to modify. In the example above the name is set to "" because the type is TYPE_PLY and I'm assuming only one player (and thus there is no need to assign a name to him). Had the type been TYPE_LIGHT the name could have been something like "garden_light01", "cellar_light09" or something similar. Those names would have been set in your level editor.

The third parameter is an enumeration of the integers you can modify on that certain entity type. In PxdScript this is currently an integer, which isn't very nice! It would be nicer to be able to use defines such as

#define HITPOINTS 0

and then rewrite the setint instruction in the first function to :

setint(TYPE_PLY, "", HITPOINTS, amount);

but we'll not do this now. Maybe I'll add this in the pre-processing chapter if I decide to write it (if this series turns out a success). For now you'll have to accept this as it is. It's not THAT bad however as you can define functions to interface with the specific integers like in the example above. Then you only have to write the cryptic 'setint' calls once and for all and let your scripts use the interface functions.
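To give an idea of what the engine has to do when a script executes 'setint', here is a minimal C++ sketch of the receiving side. It is only an illustration under a few assumptions: the GameWorld and Entity structs are hypothetical, only the two entity types mentioned in this chapter (TYPE_PLY and TYPE_LIGHT) are included, and the numeric values given to them are made up.

```cpp
#include <map>
#include <string>

// Only the two entity types named in the text; the numeric values are assumptions.
enum EntityType { TYPE_PLY = 0, TYPE_LIGHT = 1 };

// Hypothetical engine-side storage: each entity holds integers addressed by index.
struct Entity {
    std::map<int, int> ints;
};

struct GameWorld {
    Entity player;                         // a single player, so no name is needed
    std::map<std::string, Entity> lights;  // names come from the level editor

    // Mirrors setint(type, name, index, value).
    // Returns false if the named entity cannot be found.
    bool setint(EntityType type, const std::string& name, int index, int value) {
        switch (type) {
        case TYPE_PLY:
            player.ints[index] = value;    // the name is ignored for the player
            return true;
        case TYPE_LIGHT: {
            auto it = lights.find(name);
            if (it == lights.end()) return false;
            it->second.ints[index] = value;
            return true;
        }
        }
        return false;                      // unknown entity type
    }
};
```

With this in place the script call setint(TYPE_PLY, "", 0, amount) would end up as world.setint(TYPE_PLY, "", 0, amount) inside the VM. Notice how the first parameter does the coarse sorting, the second picks the entity and the third picks the integer on that entity, just as described above.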

Now, as promised above I'll give you a little inspiration for how to improve the interfacing part of PxdScript later. You're not supposed to understand all that I say below (until the next section that is) but read it now and then come back here later when you've read and understood the other chapters and hopefully things will be more clear. First of all - why have I presented such a dumb way of interfacing with the game engine at all? Couldn't I have done just a little better than that without making things too complicated? Well, maybe - but if so I didn't think of it so tough luck :) The good thing about the 'setint' idea is that it can be implemented as a simple built-in function. It always takes 4 parameters as described above and the types of those are also fixed so we'll only need this one built-in function. It'll plug in nicely in the language without requiring any complex modifications to the grammar of the language or any serious changes in the components of the compiler. In fact, from the language's point of view the 'setint' construct is very similar to the 'sleep' one for example - which we'll need in any case.

In the Peroxide 3D engine I've chosen to implement something I call external function declarations which is similar to what you can do with the 'extern' modifier in C/C++. It is a promise to the script collection that the functions I declare as external can be found in the game engine and the script compiler will then, under this assumption, allow calls to these external functions from the script programs and functions without having any code for their bodies. These calls are type checked and everything - in fact the only difference between external functions and normal functions in the script is that the external ones have no byte codes. When the script collection is loaded into the VM it'll check if all the declared external functions have actually been exported from the game engine. If it cannot find an external function in the engine it'll pop up a runtime error and kill the scripts that use it. Here is how the script collection from above would look with this modification:

/* *********** GAME INTERFACE FUNCTIONS ******************* */
external void SetPlayerHp(int);
external void SetPlayerScore(int);
external void AddScore(int);

/* ************** UTILITY FUNCTIONS ***************** */
// Fibonacci numbers. Test recursion
int fib(int a) {
    if (a==0 || a==1)
        return 1;
    else
        return (fib(a-2) + fib(a-1));
}

/* ************ PROGRAMS ************************ */

/* A script program running in your game */
program addscore {
    while (true) {

        // Ok, so it's strange to represent a score as fib(8) ;)
        AddScore( fib(8));

        // Sleep until retriggered
        sleep();
    }
}

/* Scene constructor */
constructor {
    const int PLAYER_HP = 100;

    SetPlayerHp(PLAYER_HP);
    SetPlayerScore(0);
}



In the game engine I would then export the three required functions by adding them to a list of exported functions. The VM would look the correct C++ function up, retrieve the arguments to it from the script and finally execute it with those arguments.
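As a rough illustration of such a list of exported functions, here is a minimal C++ sketch. It is not the Peroxide implementation: the ExportTable class, its member names and the choice to pass all arguments as a vector of integers are assumptions made to keep the example short.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature: every exported function receives its integer
// arguments in a vector. A real VM would need a richer value type.
using ExportedFn = std::function<void(const std::vector<int>&)>;

class ExportTable {
public:
    // Called by the game engine to export a function under a script-visible name.
    void add(const std::string& name, ExportedFn fn) { table_[name] = std::move(fn); }

    // Load-time check: which names declared 'external' in the script
    // were never exported by the engine?
    std::vector<std::string> missing(const std::vector<std::string>& declared) const {
        std::vector<std::string> out;
        for (const auto& name : declared)
            if (table_.count(name) == 0) out.push_back(name);
        return out;
    }

    // Run-time call: look the function up and invoke it with the script's arguments.
    // Returns false if no function with that name was exported.
    bool call(const std::string& name, const std::vector<int>& args) const {
        auto it = table_.find(name);
        if (it == table_.end()) return false;
        it->second(args);
        return true;
    }

private:
    std::map<std::string, ExportedFn> table_;
};
```

The engine would fill the table once at startup, for example with table.add("SetPlayerHp", ...) wrapping the real C++ function. When a script collection is loaded, a non-empty result from missing() is the situation where the VM reports a runtime error and kills the affected scripts.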

How is this done? Well, after reading through this series you should have a pretty good idea but here's a small hint. You'll have to introduce a new type of toplevel to the grammar of the language called 'external' or something. This toplevel should be almost identical to the 'function' one that I've shown you - except you'll have to remove the bytecode parts.

The really cool thing about this solution is that your script language won't contain any engine-specific opcodes. If you need to use the script language for another engine it's easy to hook it up by just defining a new list of external functions that can be found in this particular game engine.

The code for the compiler

We're going to write the compiler in C while we'll do the VM in C++. This is partly because I have reused some code from an older project and partly because we are going to use two tools that will provide us with C code we'll need in our compiler project. Those tools are 'Flex' and 'Bison', which are standard tools used in compiler creation to "automatically" generate C code for some of the steps in the compilation. Both are free tools and can be downloaded from the Internet (search for them).

For those with a good Internet connection there is an easier (and better) way to get the tools - you should download and install 'Cygwin' which provides you with a "Linux" interface on top of Windows. Don't worry, it won't change the appearance of Windows or anything - it works just like any other windows program. You click an icon and a window with a Linux-style 'bash' pops up (a Linux command line prompt). Included with the Cygwin package are also all the tools we'll need; Flex, Bison, gcc (a free C compiler) and Make along with a huge bunch of other useful Linux tools (an extensive C reference manual for example).

Anyway, for the remainder of this series I'll assume that you have Cygwin installed. It is NOT required, you can just download Flex and Bison and use your favorite compiler but it is highly recommended.

So, how do you install and use Cygwin? Installation could not be easier! Follow these simple steps and you should be set up in no time (or as long as it takes to download approx. 70MB as of this writing ;):

Now a quick Linux crash course so you'll be able to navigate in the Cygwin bash.

When you first start Cygwin you'll be at your 'home directory'. Linux is a multi-user OS and each user is given a home directory, which serves as the root directory from where he operates. Let's see which files are currently in your home directory. Type 'ls' and hit enter. Well - fairly empty eh? Let's create a new directory for our compiler project. Type 'mkdir pxdscript' and hit enter. Now try typing 'ls' again to see the result of your actions. We want to change the current directory to the newly created one so we type 'cd pxdscript'. To back up one dir you type 'cd ..' (remember the space). Now for a neat feature of Linux - the auto completing of filenames! If you have written a part of a filename and hit the tab key Linux will try to finish the word for you. To test this try to change into the new dir one more time but this time just type 'cd pxd' and hit tab. See how it completes the directory name for you? Neat huh?

Another tool, which is quite nice, is the 'man' tool (short for manual - Linux nerds don't like names longer than 3 letters ;). From the bash you can type 'man' followed by ANY word you want help on. And this help system isn't like the Windows one - it actually has useful information. For example you can look up most (if not all) C commands using man. If you want to be really well prepared for chapter 2 of this series you could type 'man flex' and read a little about the tool we're going to use in that chapter. You quit the man tool by pressing 'q'.

Finally notice how you can scroll through the last commands you wrote to the bash by pressing the up and down arrows.

You can access your Cygwin homedir from outside Cygwin (from a Windows explorer for example) to copy files in and out of it. With my installation the path to my homedir is

c:\cygwin\home\Telemachos

Description of a compiler

Ok, so how does one program a compiler after all? To many it might sound like a quite overwhelming task. Is it hard? Well, if hard means that the code is hard to understand then writing a compiler isn't hard. The code is for the most part very simple and tends to invite a lot of cut'n'paste. Writing a compiler for the first time is quite a task though, mainly if your goal is to understand what's going on (and I assume it is as you are reading this text). The first compiler I wrote took me and two other guys 3 months to complete, the next (which was the u1script one) took me 3 days! One reason was that I could re-use a lot of the code from the first one. There is a lot of code that appears in just about every compiler for C-style languages, and through this series you'll learn how to write that code. So why do I mention this? Well, what I'm trying to do is to motivate you to REALLY understand every step in this series BEFORE moving on to the next! You can easily just rip the code I give you but in the end you'll thank yourself for spending the time needed to understand what's going on because this will probably not be the last compiler you'll have to write.

Well, enough of that - let's take a look at what a compiler actually does under the hood. A compiler is traditionally put together as a bunch of modules that the source code passes through before it is emitted in its compiled form. These modules are split up in two parts: a 'front-end' and a 'back-end'.

The front-end is the part where the source is analyzed to make sure it does not contain any errors. The first step in this analysis is to convert the source code from a text file into a tree-like structure where we can modify and analyze it better than we can when it is in its text form. Such a tree is called an AST - Abstract Syntax Tree - and the process of converting a program from text form into such an AST is what we'll do in chapters 2 and 3. The rest of the front-end consists of separate modules analyzing this AST, each module focusing on catching a certain type of error. These modules include weeding, symbol checking and type checking and I'll cover each of these steps in chapters 4 to 6.
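If you have never seen an AST before, the following small C++ sketch may help. It is purely illustrative: the Expr struct and the helper functions are hypothetical and are not the node layout used later in this series (our compiler will be written in C). It represents a condition like the one in 'fib' as a tree and prints it back as fully parenthesised text.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node; not the node layout used later in the series.
struct Expr {
    enum class Kind { IntLiteral, Variable, Binary };
    Kind kind = Kind::IntLiteral;
    int value = 0;                   // used by IntLiteral
    std::string text;                // variable name, or the operator for Binary
    std::unique_ptr<Expr> lhs, rhs;  // children, used by Binary only
};

std::unique_ptr<Expr> literal(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::IntLiteral;
    e->value = v;
    return e;
}

std::unique_ptr<Expr> variable(const std::string& name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    e->text = name;
    return e;
}

std::unique_ptr<Expr> binary(const std::string& op,
                             std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Binary;
    e->text = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Walks the tree recursively and prints it with explicit parentheses.
std::string toString(const Expr& e) {
    switch (e.kind) {
    case Expr::Kind::IntLiteral: return std::to_string(e.value);
    case Expr::Kind::Variable:   return e.text;
    case Expr::Kind::Binary:
        return "(" + toString(*e.lhs) + " " + e.text + " " + toString(*e.rhs) + ")";
    }
    return "";
}
```

Building the tree for a==0 || a==1 would be binary("||", binary("==", variable("a"), literal(0)), binary("==", variable("a"), literal(1))). The later analysis modules are all recursive walks over the tree in the same style as toString, each one checking something different at every node.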

When the compiler reaches its 'back-end' it knows that the program is correct. There are no syntax errors, no type errors, every symbol (variable or function) referenced can be found etc. Now it concentrates on converting the AST into an assembly representation of the program, which it can then emit to a file. The assembler will then read this file and assemble the code into its final representation - the byte code for our VM. This is covered in chapters 7 to 10.

All in all PxdScript should cover pretty much every common feature from a real C-style programming language. You are encouraged to add even more features as you go along. It will be a pretty good exercise though to add support for floating point variables as you read the chapters. If you can do that for each component of the compiler you've probably understood that part of the compiler pretty well. On the other hand - if you have no idea how to do this you should probably read the chapter again :)

Checklist

So as described above a compiler takes the program source through a lot of separate steps before it emits the compiled code. Based on this I'll write a list of the chapters this series will consist of eventually:

Final words

Well, you should now be well prepared for the task at hand. You have the tools installed, you have an idea of how this is going to happen and you are probably now only MORE anxious to get started on the actual code. Next time I promise that there will be less talk and more code but I felt these things had to be said before we started for real.

Anyway, as you can see there is a lot of work to be done before you guys have a running script compiler and virtual machine. I hope most of you will jump on this series and read the chapters as they come out because that'll really be the best way for you to learn the stuff.

Also be sure to give me some feedback after each chapter: What did I do well and what did I explain badly? If I hear these things BEFORE writing the next chapter then I can use it to make the rest better. If I hear them after the series is completed it's probably just too bad ;)

Keep on codin'
  Kasper Fauerby


www.peroxide.dk