The latest beta of Construct 3 includes a major new optimisation for the C3 runtime: it now compiles expressions to JavaScript. This improves even further the already impressive performance gains of the new engine. Here's a little history, how the new expression compiler works, and some benchmark results towards the end, such as a 34% improvement on the Bunnymark benchmark.
Expressions in the C2 runtime
In the old C2 runtime, expressions were represented as an Abstract Syntax Tree (AST). This is basically a tree representation of expressions. For example myVar + 1
will be represented as an "add" node with "myVar" and "1" as the children, which can be drawn as a diagram like this:
The expression can then be calculated by starting at the top (root) node, and doing the sum. The "add" node will calculate its left value (retrieving myVar
), calculate its right value (retrieving 1
), and then add them together.
More complex equations build up a bigger tree with more nodes. For example myVar * 2 + 1
can be represented like this:
This time when the "add" node calculates its left value, it first calculates the "multiply" node, which retrieves myVar
and 2
, multiplies them together, and then passes the result back to the "add" node to add 1
. Note how this respects the order of operations: the addition is done last, since it's at the top. Ultimately any kind of equation can be represented this way.
In the C2 runtime, expressions were evaluated this way - using the tree method. Each node in the tree is represented by a JavaScript object, and the arrows are dynamic function calls to say "get the value". The function calls are dynamic since "get the value" does something different for each kind of node. A downside of dynamic calls is they have a higher performance overhead, since the function that is actually called is not known ahead of time. This also makes them harder to optimise, since dynamic calls are difficult to inline (a kind of optimisation that essentially copy-pastes the contents of a function to the place it is called).
The C2 runtime also returned values from nodes as part of a "result" JavaScript object, which included a tag for the type of value (e.g. number or string), as well as the value itself. So there's a lot of JavaScript objects involved, and a lot of dynamic calls happening!
For many projects, this approach works fine. It's not until you start doing complex calculations with lots of loops or function calls, or handling thousands of instances all calculating expressions individually, where the overhead of this approach starts to become significant.
The initial C3 runtime approach
We first released an alpha of the C3 runtime around a month ago in r95. In that release the C3 runtime used much the same approach as the C2 runtime - it was more or less a direct port of the C2 code.
However there was one minor optimisation: instead of returning values from nodes as part of a "result" JavaScript object, it just returns the value directly. For example the "1" node in the previous example would have returned a JavaScript object that tags a value as a "number" and includes 1
as the value. However in JavaScript this is pretty redundant! JavaScript has its own type system, such as using the typeof
operator to check if a value is a number or a string. So in the C3 runtime the node just returns a JavaScript value of 1
directly. This eliminates all the "result" JavaScript objects entirely and removes their overhead from calculating expressions, which is a nice performance bonus.
Similarities between expressions and JavaScript
Once real values are being passed around like this, it suddenly becomes clear that the way it's calculating expressions isn't too far off what JavaScript itself does. For example while Construct's expressions aren't actually JavaScript code, myVar + 1
still looks like JavaScript code, which calculates the expression in a similar way: retrieve the values on both sides, and then add them. However there are two big advantages to writing JavaScript code like that:
- There are no dynamic function calls. The values on either side are accessed directly, which is faster.
- JavaScript engines have incredibly sophisticated optimisers in their JIT compilers. For code like that, assuming both sides really are numbers, they can usually compile it to a native CPU instruction for addition. This is not possible with the tree approach, since it's difficult for the optimiser to see past all the dynamic function calls and work out what the types really are.
Rewriting Construct's expressions this way eliminates the entire AST and all its JavaScript objects for nodes, and eliminates the dynamic function calls between them. It moves the calculation directly in to JavaScript code where it can be fully optimised by the JavaScript engine, often down to optimal type-specific CPU instructions. In fact the AST approach is how many programming languages like JavaScript work internally, so having an AST for expressions is arguably another level of redundancy. Instead of having Construct's AST on top of JavaScript's AST, it could just use JavaScript's AST directly. However there are difficulties involved in this too.
Differences between expressions and JavaScript
As mentioned, Construct uses its own expression syntax - it's not actually JavaScript code.
One key difference is operators are not fully compatible. For example the JavaScript +
operator actually does all sorts of things, depending on if there are numbers, strings or even objects being added, often with odd results. It's only guaranteed to do numerical addition if both sides of the addition are numbers. Construct's +
operator only does numerical addition (the &
operator does string concatenation). Further, Construct does have dynamic typing in a few places, where it's not known in advance if you'll be adding numbers or something else. If you try to add things that aren't numbers, Construct treats a + b
as just returning a
. This is a notable divergence from how JavaScript works. Construct and Javascript only do the same thing with +
when both sides are definitely numbers - but in that case they then work identically.
Another significant difference is that an expression like Sprite.X
doesn't actually just return one value. If you run an action affecting 100 instances of Sprite, then an expression including Sprite.X
will be evaluated 100 times, each time returning a different Sprite's X co-ordinate. It's more like a function being called on multiple different instances. So while Sprite.X
does look like valid JavaScript syntax, what it does in Construct is completely different to JavaScript. Similarly, behavior expressions and instance variable expressions aren't directly translatable to JavaScript. System expressions and event variable references are more similar, but still aren't directly translatable. These kinds of expressions are probably the most tricky aspect of compiling expressions to JavaScript: they need to be written with significantly different JavaScript code.
Motivation
The AST approach is a good place to start. It matches how the editor handles expressions, making it straightforward, and it is also easy to customise it to work how we want.
A wide range of optimisations had already been implemented for the C3 runtime, including across the entire event engine. However expression evaluation itself so far only had a minor improvement to remove the "result" objects. If the rest of the engine speeds up more than expression evaluation, then it makes expression evaluation slower relative to the rest of the engine. In other words, it becomes a more significant bottleneck.
Consequently we'd seen expression evaluation ranking as relatively more time-consuming in performance benchmarks since the release of the C3 runtime - even if it was running faster overall. For example the "bunnymark" test ends up bottlenecked on how fast it can calculate expressions for the positions of thousands of instances. Some of our event-based performance benchmarks from our original C3 runtime announcement also spend relatively more time evaluating expressions, such as the calculation to find prime numbers in the primefind benchmark.
In the Chowdren forum thread a "fake 3D" project also turned up that was bottlenecked on expression evaluation while rotating the camera. It was quite inefficiently designed, repeating the same complex calculation many times only to produce a Y offset, but it's interesting to see this kind of thing - it shows it's a bottleneck "in the wild" in real projects, and the project provides a good independently-made sample to benchmark any optimisations with. To give you an idea of the complexity, it calculates positions according to X = pOBJ_Car.X + (Self.SliceFrame + (pOBJ_Car.ZScale * ZScale)) * cos(LayoutAngle - 90)
and Y = pOBJ_Car.Y + ((Self.SliceFrame * pOBJ_Car.ZScale) * ZScale) * sin(LayoutAngle - 90)
repeatedly for hundreds of instances.
So we'd essentially run in to the performance limitations of the AST approach. In one case I had seen a profile with a multiply node taking up 5% of overall execution time - just to multiply two numbers. After a few office conversations about compilers, and thinking carefully about how to overcome the differences to JavaScript, we set about building an all-new expression-to-JavaScript compiler for Construct. This is a pretty radical change: we ended up deleting almost all of the code for the tree-based approach, and writing entirely new code in its place!
Compiling expressions to JavaScript
So as of the latest betas (r101+), Construct now compiles expressions to JavaScript - and it brings some big performance gains. But first - how does it work?
Efficient and compatible operators
First let's look at how operators like addition are compiled efficiently and 100% compatible to the old approach. The obvious solution is to use functions instead of operators. The function can then implement addition exactly like Construct does. For example myVar + 1
can be compiled as JavaScript code add(getMyVar(), 1)
. (Note the JavaScript code examples I show in this blog aren't exactly what Construct outputs, but they illustrate the point.) Then we write the add()
function in JavaScript to do exactly what the "add" node would have done previously - which makes it 100% compatible. This is important to make sure the new approach doesn't break any existing games, which people rightly get upset about.
However the add()
function still does extra work other than addition - it needs to check both values really are numbers. Also if it's not inlined, there's still the overhead of a function call. Can we use the JavaScript +
operator instead?
It turns out - yes, in many cases we can! Construct already has a way of evaluating expressions only looking at the type. It uses this to validate parameters, such as to ensure you put a number (or something that could be a number) for a number parameter like an X co-ordinate. However this same approach can be used to look at the expression and say: yes, both sides are definitely a number. (In this case we're assuming myVar is a number event variable.)
So in these cases - which is most of the time - when Construct can prove both sides of +
are a number, it compiles directly to getMyVar() + 1
. That's very close to hand-written JavaScript. There is zero overhead from function calls to calculate addition, and the JIT compiler will see addition with two numbers, and likely compile that to the optimal CPU instruction. It's pretty much as optimal as it can be.
If Construct can't prove both sides are a number, which is relatively rare, it falls back to the add()
function. This is still faster than the tree approach though, since it's a static rather than dynamic function call. The end result is the best of both worlds: most of the time expressions compile down to the equivalent JavaScript operators, which is optimal, and the rest of the time it gets a 100% backwards compatible implementation - which is still faster!
Compiling system expressions
System expressions basically compile down to function calls. For example the dt
expression can compile to getDt()
. To make sure the expression method is run correctly, we bind the expression method that returns dt to the system plugin, so we don't need to do something like system.getDt()
. It provides a single standalone function to directly call and get the expression result. We even use the same approach for "single global" plugins like Function, where there is guaranteed to only ever be one instance of the plugin that expressions are ever run on. This makes sure expressions like Function.Call()
and Function.Param()
are compiled to a single function call that directly runs Function plugin code on the right instance, entirely bypassing any event code to identify the picked instance.
Interestingly, expression parameters can simply be independently compiled. For example the expression distance(0, 0, myX, myY)
simply compiles each parameter to JavaScript and inserts it to the function parameters, resulting in JavaScript code similar to getDistance(0, 0, getMyX(), getMyY())
. Previously there was code for this to gather parameters in to an array and pass them to the expression; now this too is entirely bypassed and parameters are added directly to the call.
Finally there are some system expressions which are pretty much identical to JavaScript functions. For example the sqrt
system expression simply calls JavaScript's Math.sqrt
. So rather than call a system expression function that calls Math.sqrt
, we added an extra step to simply insert a call to the equivalent JavaScript function. So an expression like sqrt(myVar)
is compiled to Math.sqrt(getMyVar())
- directly calling the well-optimised JavaScript library function. Many of the basic system expressions compile directly to JavaScript equivalents like this!
Compiling object expressions
Object expressions like Sprite.X
are probably the trickiest part. The same approach is used for behavior and instance variable expressions, but for simplicity we'll only cover object expressions.
Evaluating an object expression involves knowing which instances are picked by the event, and which is the current instance the expression should be calculating for. This involves too many runtime details to split off in to a simple function like add()
. Instead we use a closure: the JavaScript function that calculates the expression is itself put inside another function, which declares a variable that points at essentially the same tree node that would have been used before. In code it looks roughly like this for an expression like Sprite.X + 1
:
function getExpression(runtime)
{
// Get the tree node for Sprite.X
const spriteXNode = runtime.GetSpriteXNode();
// Return another function that calls that
return function evaluateExpression()
{
// This is what calculates Sprite.X + 1
return spriteXNode.Get() + 1;
};
}
On startup the runtime calls getExpression(runtime)
, and it returns a new function. Call that, and it evaluates the actual expression. It's like creating a specialised expression function on startup. The runtime can also repeatedly call it for different instances so it works correctly when actions run on multiple instances. It's still faster: again the call is static rather than dynamic, any parameters can be directly inserted, and the rest of the expression evaluates without any tree overhead.
Another interesting trick is the actual call to spriteXNode.Get()
can be tweaked depending on the circumstances. For example retrieving a family instance variable involves a small amount of extra code to look up the variable. Construct can compile different calls for each case, like spriteVariableNode.GetFamily()
and sprtieVariableNode.GetNonFamily()
- the GetNonFamily variant omitting the extra lookup code. This means Construct can compile JavaScript code that automatically uses the most efficient function, without needing any dynamic checks at runtime!
Deduplicating functions
One more bonus is you might use an expression like myVar + 1
in 10 times across your events. Each of these will compile to a JavaScript function like getMyVar() + 1
. There's no point exporting 10 identical functions. Construct notices these functions are all identical so just exports a single JavaScript function, and re-uses it across the events where the same expression appears. This saves memory and ensures the JavaScript engine only needs to optimise the function once.
A real example
Let's take the "fake 3D" expression that calculates an X co-ordinate from earlier. This is what it looks like as a Construct expression:
pOBJ_Car.X + (Self.SliceFrame + (pOBJ_Car.ZScale * ZScale)) * cos(LayoutAngle - 90)
Here is the actual JavaScript code Construct exports to evaluate this expression - having tidied up only the names and parentheses for clarity, neither of which affects execution:
pOBJ_CarX.Get() + (selfSliceFrame.Get() + (pOBJ_CarZScale.Get() * ZScale.Get())) * Math.cos(C3.toRadians(getLayoutAngle() - 90))
Notice how close the JavaScript is to the expression:
- There is a single function call per object/variable expression
- It uses JavaScript math operators directly, like
+
and *
- The
cos
system expression was compiled directly to a Math.cos
call
- The parameter to
cos
was inserted directly, complete with degrees-to-radians conversion (since JavaScript's Math works in radians)
The tree approach is entirely gone! It simply runs analogous JavaScript code, which compiles to very efficient machine code.
Performance results
So what kind of performance improvement does this bring? Quite a lot!
Let's take "bunnymark", which spawns thousands of bouncing bunnies. In the C3 runtime, this was originally bottlenecked on calculating the positions of each instance. How have things improved? I've labelled the original C3 runtime as "C3" and the updated C3 runtime with the expression-to-JavaScript compiler as "C3+" in the graphs, and added the C2 runtime results for comparison.
Compiling expressions to JavaScript improves performance by 34% in this test. That's quite a boost! The C3 runtime was originally 26% faster than the C2 runtime, but with compiled expressions, it's now 70% faster than the C2 runtime!
Next let's re-evaluate the primefind benchmark that we used in our original C3 runtime announcement blog. This is quite intensive with calculations to identify prime numbers. How many iterations can it run in 10s?
Compiling expressions improves performance by 34% again for this test. The improvement over the C2 runtime increases from 2.3x faster to over 3x faster than the C2 runtime.
Next let's re-evaluate another test from the same blog post - using the Function object to evaluate the 30th fibonacci number with a deliberately intensive algorithm. This test is timing-based, so quicker results with lower numbers are better.
Compiling expressions improves performance by 18% for this test. The improvement over the C2 runtime increases from 2.5x faster to 3x faster than the C2 runtime.
Let's go back to that fake 3D project that was slow while rotating the camera. It's good to check projects that users have made since it's a more realistic example and isn't something we came up with ourselves. We measured the framerate while rotating the camera with 3000 instances on-screen.
Compiling expressions increases the framerate by 55% - the best result yet! The C3 runtime was already 29% faster, but this boosts it to 2x faster than the C2 runtime. Since the project was inefficiently designed, it's likely it can be brought up to 60 FPS by optimising the events themselves, especially with compiled expressions.
Conclusion
Our new architecture for compiling expressions to JavaScript in the C3 runtime extends already huge performance gains even further - up to 50% faster in one case. Compared to the C2 runtime, already large improvements are now even bigger. Meanwhile thanks to careful engineering, we've preserved exact backwards-compatibility with the existing runtime, which itself was designed to be exactly backwards-compatible with the C2 runtime. We've checked full games like Kiwi Story, Pop Lab and Demonoire and verified they work identically with the new expression compiler. So these performance gains should come without affecting how games run at all.
The obvious next question is: why not compile all the events to JavaScript? The answer is this becomes phenomenally complicated, and may not help as much as it did with expressions. The entire event engine is far more complex than simply calculating expressions. As mentioned, expressions are pretty close to JavaScript to begin with. Events on the other hand are not: they are very different indeed. This blog covered how we still had to deal with some tricky hurdles just to compile expressions to JavaScript, and the challenges for compiling events themselves are far more difficult and numerous. We've already seen that third-party compilers don't attempt to support all features because it is too much work; we definitely appreciate that, given the scope of the job. Additionally the performance gains are less clear: the overhead of a function call is big compared to doing a multiplication, so compiling expressions can save a lot there. However the overhead of a function call compared to running an entire condition or action is much smaller, since conditions and actions do far more work, so compiling them to JavaScript may bring smaller gains. Still, it's technically possible - so who knows, maybe one day in future we'll take a look at it.
Still, we're very pleased with the great results from compiling expressions to JavaScript! And you can try it out yourself today in the latest beta. We are still working hard to improve and finish off the C3 runtime ready for its full release. We'll keep you posted with more news as that develops. And once it's ready, your games will run even faster!