Hi there guys. This is the first chapter in a series of tutorials with the goal of
creating a nice scripting language for your computer game engine. The language I have
designed for this series goes under the codename PxdScript and is a subset of the
language I use for my main programming project: the Peroxide 3D game engine. During this
series we will write two tools: a compiler for the language and a VM (Virtual Machine),
which will be used to run the compiled script code inside your own engine.
So why have I chosen to split this up in a whole series of tutorials? Couldn't I have
made just one? Well, the reason is that it's the easiest way both for you and me to
have it split up into chapters. Designing a programming language, writing a compiler
for it and writing a VM to run the compiled code is a huge task and it involves a
lot of theory and code (I have all the code ready and it's about 300kb of source code).
If I were to include all of this in one single tutorial it would be so huge that I
would probably never finish writing it. Besides it's also a good idea for you to read
it as a series of tutorials. That way you can read about and understand each step
in the process before moving on to the next and maybe the whole thing won't look as
overwhelming.
Before you start reading you might be interested in knowing what kind of language
you'll end up with when all this is done. What can it do and what can't it do?
Basically PxdScript is a subset of C with some borrowings from C++ to make it more
pleasant to work with. Besides this PxdScript is trying to provide you with an interface
between the script programs you'll write and the variables you want to manipulate
in your game engine. Interfacing a script language with the surrounding game engine
is a whole topic in itself so to keep it simple I'll present a pretty dumb and inflexible
way of manipulating certain types of variables in your game engine. The idea is that
after you finish reading this series you'll understand compilers and virtual machines
well enough to implement your own (better) way of interfacing your game engine - and
to provide you with some inspiration I'll describe my own way of handling this problem
a little later in this chapter (after I've shown you the dumb way first).
Anyway, let's take a look at which bells and whistles PxdScript features:
And finally PxdScript suggests a bunch of "trigger types" which define when a program
should be activated: on scene initialization, when clicking on an object, on collision
with something etc. Again, this is not really dependent on the language - it's just
one of many ways you could possibly handle the triggering of the scripts. You should
modify this to suit your needs in your own engine. The Peroxide 3D engine uses a more
direct calling scheme and does not have any triggering information attached to its scripts
- they are simply 'called by name' by the game engine.
PxdScript works like this: A script file can define a number of functions and programs.
The programs are the actual scripts that you need in your game world. The programs
can call the functions if they need to but a program cannot call and activate another
program!
Each program has a trigger attached to it along with a name. The trigger type specifies
which events in the game will start the program. Once a program is started it'll
run until it either reaches the end of its code block or it is put to sleep. A program
can be put to sleep with the built-in function 'sleep' which takes either one integer
parameter or no parameters. If sleep is called with a parameter that parameter indicates
how many ms the program should sleep before it continues its execution. If no parameter
is given the program sleeps until it is triggered again!
A Script Collection is the topmost abstraction in the language and covers
what we understand as a bunch of programs and functions. Each compiled PxdScript file
contains exactly one script collection and the VM will be able to load and contain
exactly one script collection at a time. Here is a small example of a Script Collection:
/* *********** GAME INTERFACE FUNCTIONS ******************* */
void SetPlayerHp(int amount) {
    setint(TYPE_PLY, "", 0, amount);
}
void SetPlayerScore(int amount) {
    setint(TYPE_PLY, "", 1, amount);
}
void AddScore(int amount) {
    setint(TYPE_PLY, "", 2, amount);
}
/* ************** UTILITY FUNCTIONS ***************** */
// Fibonacci numbers. Test recursion
int fib(int a) {
    if (a==0 || a==1)
        return 1;
    else
        return (fib(a-2) + fib(a-1));
}
/* ************ PROGRAMS ************************ */
/* A script program running in your game */
program addscore trigger_on_pickup("present.mdl") {
    while (true) {
        // Ok, so it's strange to represent a score as fib(8) ;)
        AddScore( fib(8));
        // Sleep until retriggered
        sleep();
    }
}
/* Scene constructor */
program init trigger_on_init {
    const int PLAYER_HP = 100;
    SetPlayerHp(PLAYER_HP);
    SetPlayerScore(0);
}
So this script collection contains 4 functions and 2 programs. The first 3 functions
are what I call "Game interface functions". They are not different at all from the
ordinary functions except that they all just have a single call to 'setint' which
is the suggestion I'm going to present for interfacing with some game variables (but
more on this in a while). Then follows a recursive function 'fib' which calculates
Fibonacci numbers. Its use is totally meaningless - it's just there to illustrate
that the language supports recursion.
The 2 programs are different because of their trigger types. The first is a program
that is triggered when someone picks up a game entity named "present.mdl". The idea
is simply that the player should be awarded some points when he picks up a present.
The program uses a while(true) language construct to stay alive forever but notice
how it uses 'sleep' without parameters to put the program to sleep until it is
triggered again (when someone picks up another present). Think of an arcade game like
Pac-man - there would be two small programs like this one: one that would add 10 points
when the player moves Pac-man over a normal seed in the labyrinth. The other would
add 100 points and trigger the game mode where Pac-man can eat the ghosts. Then it
would sleep for 30 seconds and set the mode back to normal again. Why script this?
Well, of course Pac-man is used here as a small simple example but let's think big
- we have a 50 man team on this game! Some do the programming (and by definition the
engine programmers know nothing about game balance right? :) ) and others tweak the
game logic (and also by definition such people can't code). By putting the game logic
in a script the 'tweakers' can fiddle around with the parameters (10 points, 100 points
and 30 seconds) until Pac-man plays well - without recompiling the huge game engine.
The last program's trigger type is "trigger_on_init". This is my suggestion for how
to initialize your scenes through the script language. The trigger type indicates
that the program will start running as soon as it is loaded.
The 'setint' language construct is a dumb way of interfacing with integer variables
in the game engine. As you can see setint takes 4 parameters. The first is a type
which is used to do a first sorting on where to find the integer you wish to modify.
In PxdScript valid types are:
The next parameter is the name of the specific entity you wish to modify. In the example
above the name is set to "" because the type is TYPE_PLY and I'm assuming only one
player (and thus there is no need to assign a name to him). Had the type been TYPE_LIGHT
the name could have been something like "garden_light01", "cellar_light09" or something
similar. Those names would have been set in your level editor.
The third parameter is an enumeration of the integers you can modify on that certain
entity type. In PxdScript this is currently an integer, which isn't very nice! It
would be nicer to be able to use defines such as
#define HITPOINTS 0
and then rewrite the setint instruction in the first function to:
setint(TYPE_PLY, "", HITPOINTS, amount);
but we won't do this now. Maybe I'll add this in the pre-processing chapter
if I decide to write it (if this series turns out to be a success). For now you'll have
to accept this as it is. It's not THAT bad however as you can define functions to
interface with the specific integers like in the example above. Then you only have
to write the cryptic 'setint' calls once and for all and let your scripts use the
interface functions.
Now, as promised above, I'll give you a little inspiration for how to improve the interfacing
part of PxdScript later. You're not supposed to understand everything I say below (until
the next section, that is) but read it now and then come back here later when you've
read and understood the other chapters and hopefully things will be clearer. First
of all - why have I presented such a dumb way of interfacing with the game engine
at all? Couldn't I have done just a little better than that without making things too
complicated? Well, maybe - but if so I didn't think of it so tough luck :) The good
thing about the 'setint' idea is that it can be implemented as a simple built-in function.
It always takes 4 parameters as described above and the types of those are also fixed
so we'll only need this one built-in function. It'll plug in nicely in the language
without requiring any complex modifications to the grammar of the language or any
serious changes in the components of the compiler. In fact, from the language's point
of view the 'setint' construct is very similar to the 'sleep' one for example - which
we'll need in any case.
In the Peroxide 3D engine I've chosen to implement something I call external function
declarations which is similar to what you can do with the 'extern' modifier in C/C++.
It is a promise to the script collection that the functions I declare as external
can be found in the game engine and the script compiler will then, under this assumption,
allow calls to these external functions from the script programs and functions without
having any code for their bodies. These calls are type checked and everything - in
fact the only difference between external functions and normal functions in the script
is that the external ones have no byte codes. When the script collection is loaded
into the VM it'll check if all the declared external functions have actually been exported
from the game engine. If it cannot find an external function in the engine it'll pop
up a runtime error and kill the scripts that use it. Here is how the script collection
from above would look with this modification:
/* *********** GAME INTERFACE FUNCTIONS ******************* */
external void SetPlayerHp(int);
external void SetPlayerScore(int);
external void AddScore(int);
/* ************** UTILITY FUNCTIONS ***************** */
// Fibonacci numbers. Test recursion
int fib(int a) {
    if (a==0 || a==1)
        return 1;
    else
        return (fib(a-2) + fib(a-1));
}
/* ************ PROGRAMS ************************ */
/* A script program running in your game */
program addscore {
    while (true) {
        // Ok, so it's strange to represent a score as fib(8) ;)
        AddScore( fib(8));
        // Sleep until retriggered
        sleep();
    }
}
/* Scene constructor */
constructor {
    const int PLAYER_HP = 100;
    SetPlayerHp(PLAYER_HP);
    SetPlayerScore(0);
}
We're going to write the compiler in C while we'll do the VM in C++. This is partly
because I have reused some code from an older project and partly because we are going
to use two tools that generate C code which we'll need in our compiler project.
Those tools are 'Flex' and 'Bison' which are standard tools used in compiler creation
to "automatically" generate C code for some of the steps in the compilation. Both
are free tools and can be downloaded from the Internet (search for them).
For those with a good Internet connection there is an easier (and better) way to get
the tools - you should download and install 'Cygwin' which provides you with a "Linux"
interface on top of Windows. Don't worry, it won't change the appearance of Windows
or anything - it works just like any other Windows program. You click an icon and
a window with a Linux-style 'bash' pops up (a Linux command line prompt). Included
with the Cygwin package are also all the tools we'll need: Flex, Bison, gcc (a free
C compiler) and Make along with a huge bunch of other useful Linux tools (an extensive
C reference manual for example).
Anyway, for the remainder of this series I'll assume that you have Cygwin installed.
It is NOT required, you can just download Flex and Bison and use your favorite compiler
but it is highly recommended.
So, how do you install and use Cygwin? Installation could not be easier! Follow these
simple steps and you should be set up in no time (or as long as it takes to download
approx. 70MB as of this writing ;):
Now a quick Linux crash course so you'll be able to navigate in the Cygwin bash.
When you first start Cygwin you'll be at your 'home directory'. Linux is a multi-user
OS and each user is given a home directory, which serves as the root directory from
where he operates. Let's see which files are currently in your home directory. Type
'ls' and hit enter. Well - fairly empty eh? Let's create a new directory for our compiler
project. Type 'mkdir pxdscript' and hit enter. Now try typing 'ls' again to see the
result of your actions. We want to change the current directory to the newly created
one so we type 'cd pxdscript'. To back up one dir you type 'cd ..' (remember the space).
Now for a neat feature of Linux - the auto completing of filenames! If you have written
a part of a filename and hit the tab key Linux will try to finish the word for you.
To test this try to change into the new dir one more time but this time just type
'cd pxd' and hit tab. See how it completes the directory name for you? Neat huh?
Another tool, which is quite nice, is the 'man' tool (short for manual - Linux nerds
don't like names longer than 3 letters ;). From the bash you can type 'man' followed
by ANY word you want help on. And this help system isn't like the Windows one - it
actually has useful information. For example you can look up most (if not all) C commands
using man. If you want to be really well prepared for chapter 2 of this series you
could type 'man flex' and read a little about the tool we're going to use in that
chapter. You quit the man tool by pressing 'q'.
Finally notice how you can scroll through the last commands you wrote to the bash
by pressing the up and down arrows.
You can access your Cygwin homedir from outside Cygwin (from a Windows explorer for
example) to copy files in and out of it. With my installation the path to my homedir
is
c:\cygwin\home\Telemachos
Ok, so how does one program a compiler after all? To many it might sound like a quite
overwhelming task. Is it hard? Well, if hard means that the code is hard to understand
then writing a compiler isn't hard. The code is for the most part very simple and
tends to invite a lot of cut'n'paste. Writing a compiler for the first time is
quite a task though, especially if your goal is to understand what's going on (and I assume
it is as you are reading this text). The first compiler I wrote took me and two other
guys 3 months to complete, the next (which was the u1script one) took me 3 days! One
reason was that I could re-use a lot of the code from the first one. There is a lot
of code that appears in just about every compiler for C-style languages, and through
this series you'll learn how to write that code. So why do I mention this? Well, what
I'm trying to do is to motivate you to REALLY understand every step in this series
BEFORE moving on to the next! You can easily just rip the code I give you but in the
end you'll thank yourself for spending the time needed to understand what's going
on because this will probably not be the last compiler you'll have to write.
Well, enough of that - let's take a look at what a compiler actually does under the
hood. A compiler is traditionally put together as a bunch of modules that the source
code passes through before it is emitted in its compiled form. These modules are split
up in two parts: a 'front-end' and a 'back-end'.
The front-end is the part where the source is analyzed to make sure it does not contain
any errors. The first step in this analysis is to convert the source code from a text
file into a tree-like structure where we can modify and analyze it better than we
can when it is in its text form. Such a tree is called an AST - Abstract Syntax Tree
- and the process of converting a program from text form into such an AST is what
we'll do in chapters 2 and 3. The rest of the front-end consists of separate modules
analyzing this AST, each module focusing on catching a certain type of error. These
modules include weeding, symbol checking and type checking and I'll cover each of
these steps in chapters 4 to 6.
When the compiler reaches its 'back-end' it knows that the program is correct. There
are no syntax errors, no type errors, every symbol (variable or function) referenced
can be found etc. Now it concentrates on converting the AST into an assembly representation
of the program, which it can then emit to a file. The assembler will then read this
file and assemble the code into its final representation - the byte code for our VM.
This is covered in chapters 7 to 10.
All in all PxdScript should cover pretty much every common feature from a real C-style
programming language. You are encouraged to add even more features as you go along.
It will be a pretty good exercise though to add support for floating point variables
as you read the chapters. If you can do that for each component of the compiler you've
probably understood that part of the compiler pretty well. On the other hand - if
you have no idea how to do this you should probably read the chapter again :)
So as described above a compiler takes the program source through a lot of separate steps before it emits the compiled code. Based on this I'll write a list of the chapters this series will consist of eventually:
Well, you should now be well prepared for the task at hand. You have the tools installed,
you have an idea of how this is going to happen and you are probably now only MORE
anxious to get started on the actual code. Next time I promise that there will be less
talk and more code but I felt these things had to be said before we started for real.
Anyway, as you can see there is a lot of work to be done before you guys have a running
script compiler and virtual machine. I hope most of you will jump on this series and
read the chapters as they come out because that'll really be the best way for you
to learn the stuff.
Also be sure to give me some feedback after each chapter: What did I do well and what
did I explain badly? If I hear these things BEFORE writing the next chapter then I
can use it to make the rest better. If I hear them after the series is completed it's
probably just too bad ;)
Keep on codin'
Kasper Fauerby
www.peroxide.dk