We've been busy building a whole new runtime for Construct 3. We aim to release an initial testing version in the near future. Before that, we have a short blog series covering all the new optimisations and improvements in the new runtime.
An all-new runtime
Rebuilding the Construct 3 editor was a huge project. To ensure it was manageable, we kept the same runtime (game engine) that Construct 2 uses, and transferred it across to Construct 3. This is one of the reasons we were able to successfully launch Construct 3 last year, first in beta, and then the full release in December. However, we've always planned to replace the runtime later down the line.
Over the past year or so a whole new runtime for Construct 3 has been built from scratch. It is completely rearchitected and rewritten to be faster and more powerful. It unlocks a whole new tier of performance and features. Its design is forward-looking: using cutting-edge web platform features, future-proofed for years more of the breakneck pace of progress on the web, and dropping a great deal of legacy compatibility code that was weighing the engine down. And we've even managed to keep it highly compatible, allowing many existing projects to opt in to the new runtime with few or no changes.
The new runtime is the future of Construct. First though — we're sure you want to know it's fast. So here are some of the key engine performance improvements.
All tests were run on a laptop with an Intel Core i7-4712HQ @ 2.3 GHz with 16 GB RAM and NVIDIA GeForce GT 750M graphics, using Chrome Canary 66 (which will reach the stable channel in due course). The specific test system should not be particularly important: we expect to see the same relative differences across different devices and OSs. Also note some tests measure a result at 30 FPS to avoid interference from v-sync timers at 60 FPS.
Event optimisations
Running events is often the most CPU-heavy part of a game. Fortunately, some of our biggest performance gains have been in the event engine. To test the overall overhead of running events, we made an empty Repeat loop and tested how many iterations we could run every tick and still hit 30 FPS. The Construct 3 runtime is a whole 3.3x faster than the Construct 2 runtime (it runs 330% as many iterations per tick), indicating a much reduced overhead to running events.
This is impressive, but an empty loop isn't very realistic. To make a more realistic event performance test we made a small project to find the highest prime number it can in 10 seconds. This uses a naive test which loops testing all the factors of the number, involving a set of sub-events, conditions and actions. Every factor tested adds 1 to an iteration count. In 10 seconds, the Construct 3 runtime can run 2.3x as many iterations, indicating impressively faster overall event performance.
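To make the shape of that test concrete, here is a minimal Python sketch of a naive prime search that counts every factor it tests. It is only an illustration of the algorithm described above, not the Construct event sheet itself; the function names and the list-based counter are assumptions made for this example.

```python
import time


def is_prime_naive(n, counter):
    """Naive primality test: try every candidate factor from 2 to n - 1.

    counter is a one-element list; counter[0] is incremented once per
    factor tested, mirroring the iteration count described in the text.
    """
    if n < 2:
        return False
    for factor in range(2, n):
        counter[0] += 1
        if n % factor == 0:
            return False
    return True


def highest_prime_within(seconds):
    """Search upwards from 2 until the time budget runs out.

    Returns (highest prime found, total factors tested).
    """
    counter = [0]
    highest = None
    candidate = 2
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        if is_prime_naive(candidate, counter):
            highest = candidate
        candidate += 1
    return highest, counter[0]


if __name__ == "__main__":
    prime, iterations = highest_prime_within(10.0)
    print(f"highest prime: {prime}, iterations: {iterations}")
```

The iteration count, rather than the prime itself, is the figure being compared between runtimes: it measures how much event work gets done in a fixed time.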
We've worked to optimise functions, too. Our first test simply times how long 5 million function calls take, each call just adding 1 to a variable. The C3 runtime is 2.6x faster. The second test is slightly more realistic, with a naive Fibonacci number calculator using recursive functions to calculate the 30th Fibonacci number. The new runtime is again about 2.6x faster.
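For reference, the two function tests can be sketched in Python as follows. This is an illustration of the benchmark's structure rather than the actual Construct project; the names `time_calls` and `fib`, and the choice of indexing with fib(0) = 0 and fib(1) = 1, are assumptions for this example.

```python
import time


def time_calls(count):
    """Call a trivial function `count` times; each call adds 1 to a variable.

    Returns (final value of the variable, elapsed seconds).
    """
    total = 0

    def add_one():
        nonlocal total
        total += 1

    start = time.perf_counter()
    for _ in range(count):
        add_one()
    elapsed = time.perf_counter() - start
    return total, elapsed


def fib(n):
    """Naive recursive Fibonacci, assuming fib(0) = 0 and fib(1) = 1."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    total, elapsed = time_calls(5_000_000)
    print(f"{total} calls took {elapsed:.3f} s")
    print(fib(30))  # prints 832040
```

The naive recursion is deliberately wasteful: computing fib(30) this way makes over a million function calls, which is what makes it a useful stress test of call overhead.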
More consistent performance
One important aspect of the new runtime is its consistency. We worked hard to remove many cases of polymorphic (dynamic) code in the engine in favor of monomorphic (static) code, which has less overhead. The quad issue performance test is a good way to highlight the difference. This tests how many sprites it can constantly re-draw at 30 FPS. This was already very well optimised in the C2 runtime, and the C3 runtime initially performs exactly the same. (It's good to know despite completely rewriting all the code, we haven't regressed performance on this tough test!) However something interesting happens in the C2 runtime when you add a family or behavior to the sprite being tested: performance drops. This is due to polymorphism in the C2 runtime: its internal code becomes dynamic when these features are added. The new C3 runtime remains monomorphic and so consistently remains fast, making it 70% faster than the C2 runtime with a family and 88% faster with a behavior.
It's interesting to note that the Construct 3 runtime performance does still dip slightly when adding these features. We don't believe this is due to polymorphism, having largely eliminated it from the engine. Instead it's likely due to the increased memory usage when using these features. The test is incredibly demanding: on the tested system it takes just 300 CPU cycles to render each sprite, which is in the same ballpark as the time a single fetch from main memory takes. This is so quick that some very subtle effects can come into play, and the cache size and memory layout become significant. Adding a behavior involves allocating a small amount of memory for each instance's behavior, which likely also spreads out the memory for the instances being rendered. This can use up the CPU cache more quickly and end up making more main memory requests. We think this explains the small dip in performance. The C3 runtime still far outperforms the C2 runtime in these cases, and in a way it's fascinating to have such a well-optimised engine that such low-level details can be observed.
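To put the 300-cycle figure in perspective, the short Python calculation below converts it into time per sprite and a rough per-frame budget. It assumes the CPU runs at its 2.3 GHz base clock and that a single core spends the whole frame on rendering, so treat the result as a back-of-envelope bound rather than a measured sprite count.

```python
CLOCK_HZ = 2.3e9          # base clock of the test CPU (assumes no turbo boost)
CYCLES_PER_SPRITE = 300   # figure quoted in the text
TARGET_FPS = 30           # frame rate the test measures at


def ns_per_sprite():
    """Time spent on one sprite, in nanoseconds."""
    return CYCLES_PER_SPRITE / CLOCK_HZ * 1e9


def sprites_per_frame():
    """Sprites one core could render per frame at this cost per sprite."""
    cycles_per_frame = CLOCK_HZ / TARGET_FPS
    return int(cycles_per_frame / CYCLES_PER_SPRITE)


if __name__ == "__main__":
    print(f"{ns_per_sprite():.1f} ns per sprite")    # prints 130.4 ns per sprite
    print(f"{sprites_per_frame()} sprites per frame")  # prints 255555 sprites per frame
```

At roughly 130 ns per sprite, even one extra trip to main memory per instance would take a noticeable share of the budget, which is why memory layout starts to matter at this level.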
Faster effects
Construct's WebGL-powered effects can look incredible, but have often had a high performance impact. We've overhauled the effects rendering engine and made major improvements to how the batching engine works, allowing the C3 runtime to efficiently batch large numbers of instances with effects where the C2 runtime was not able to. This shows up well when running the same quad issue test used before, but just adding an effect to the tested sprite. Adding a tint effect is 5.8x faster in the C3 runtime as it's able to completely batch the effects.
In the particular case of the tint effect, the new C3 runtime also supports a built-in color property for many objects, including Sprite. This acts like a built-in tint effect, in a similar way to how opacity is a built-in transparency effect. (For those of you who remember, Construct Classic had this feature way back!) For the specific use of changing object colors, the built-in color property is faster still than the tint effect, having little performance impact over simply rendering without any effect at all. Overall, in the C3 runtime the built-in color property is over 2x faster than using the tint effect, which makes it an amazing 12.5x faster than the tint effect in the C2 runtime.
Rendering background-blending effects is much more tricky: due to the way graphics APIs work, it must first render to an intermediate surface and copy the result at the end. This has a high performance overhead. In this example we used the Overlay background blending effect. We've still managed to improve background blending effects in the C3 runtime, making it 80% faster.
General engine optimisations
We worked to optimise creating and destroying instances in the new runtime. We made a test that creates 10,000 sprites, then every tick will destroy N random sprites and re-create new ones. The test measures how many sprites can be destroyed and recreated every tick and still hit 30 FPS. The new runtime is over 5x faster at this particular test.
The built-in bounding box performance test measures how fast the engine is at updating moving or rotating objects. It measures how many sprites it can keep updating at 30 FPS. Construct 2 was already very good at this, but in the Construct 3 runtime this core function is 20% faster.
We've also got a test that creates N sprites and tests for collisions between all of them using either Is overlapping or On collision. Similarly it tests how many it can create and still hit 30 FPS. To give you an idea how intense this test is, if it creates 1000 sprites, it must check every one of the nearly 1,000,000 combinations between them every tick! Fortunately, collision cells help eliminate most of these checks. The Construct 3 runtime is a bit faster at Is overlapping tests (around 7%), but jumps ahead to 45% faster at On collision tests. Testing On collision is more work than testing overlap since it has to track the state over time; the C3 runtime is much faster at this tracking.
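The pair count and the benefit of collision cells can be illustrated with a short Python sketch. Counting each pair in both directions, 1000 sprites give 999,000 combinations; counting each pair once gives 499,500. The grid below is a generic spatial-hash approach written for this example, with hypothetical names and axis-aligned boxes as `(x, y, width, height)` tuples. Construct's actual collision cells implementation may differ.

```python
from collections import defaultdict
from itertools import combinations


def ordered_pairs(n):
    """Pairs counted in both directions (A vs B and B vs A)."""
    return n * (n - 1)


def unordered_pairs(n):
    """Pairs counted once each."""
    return n * (n - 1) // 2


def overlaps(a, b):
    """Axis-aligned bounding box overlap test."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def brute_force_overlaps(boxes):
    """Check every unordered pair of boxes."""
    return {(i, j) for i, j in combinations(range(len(boxes)), 2)
            if overlaps(boxes[i], boxes[j])}


def cell_candidates(boxes, cell_size):
    """Bucket boxes into grid cells; only boxes sharing a cell become candidates."""
    cells = defaultdict(list)
    for index, (x, y, w, h) in enumerate(boxes):
        for cx in range(int(x // cell_size), int((x + w) // cell_size) + 1):
            for cy in range(int(y // cell_size), int((y + h) // cell_size) + 1):
                cells[(cx, cy)].append(index)
    candidates = set()
    for members in cells.values():
        candidates.update(combinations(members, 2))
    return candidates


def cell_overlaps(boxes, cell_size):
    """Same result as brute force, but only testing candidate pairs."""
    return {(i, j) for i, j in cell_candidates(boxes, cell_size)
            if overlaps(boxes[i], boxes[j])}


if __name__ == "__main__":
    print(ordered_pairs(1000))    # prints 999000
    print(unordered_pairs(1000))  # prints 499500
```

When sprites are spread across the layout, most pairs never share a cell, so the number of candidate pairs that need a real overlap test is far smaller than the full pair count.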
Incredible tilemap performance
The C2 runtime has to switch texture every time it draws a different tile. This has quite a high performance overhead, especially if every tile is different. We'd already improved tilemap rendering for the editor with a much better rendering algorithm that can render entire tilemaps without needing to switch texture at all, while maintaining high-quality rendering (no seams or color bleed issues). We carried this new algorithm over to the new C3 runtime, and the results are stunning. Measuring how many all-different on-screen tiles can be rendered at 30 FPS, the new runtime is over 40x faster. That's more than 4000% of the C2 runtime's tile count!
And more
The new runtime has been thoroughly reviewed as we rebuilt it, with this approach of carefully tuning performance applied to every aspect of the engine. There are further improvements in other areas like parallelising texture loading when switching layouts, reducing the overhead of ticking behaviors, optimising triggers, improving startup time, and more. We'll be covering some more cases in later blog posts.
This demonstrates how we've made major, wide-ranging optimisations to the new runtime. We expect these to combine to make a transformative difference to complex, CPU-heavy games, as well as helping keep mobile games streamlined and efficient. We even added in a few GPU-side performance improvements along the way, too. We believe our new runtime is so good it will compete strongly with some native engines on the market.
We can't wait for you to try our new engine. Stay tuned for more updates about what else is coming, and how you'll be able to try it out soon.