Behavior improvements in the Construct 3 runtime

Official Construct Post
Ashley
  • 19 Mar, 2018

Our first blog post on performance focused on the runtime engine itself. Whilst upgrading our code for the new runtime, we also made some key improvements to Construct's behaviors. Two in particular have been significantly rearchitected: Physics and Pathfinding. Here's what we've done.

WebAssembly-powered Physics

In the C2 runtime, the Physics behavior is based on the Box2D library and is powered by asm.js, a high-performance subset of JavaScript. However, asm.js has some pitfalls, particularly around memory management and overheads. WebAssembly is a modern native-grade binary executable format for the web that can get virtually the same performance as native C++ code and solves many of the problems of asm.js. So we're pleased to announce that in the C3 runtime we've moved to a WebAssembly build of Box2D to power Physics. This brings native-equivalent performance and memory characteristics to Construct 3's Physics behavior!

First, we're pleased to note that the WebAssembly physics engine is around 10% faster than asm.js, measured by how many Physics objects can be simulated at 30 FPS. This is because WebAssembly is a genuine binary executable format rather than a clever JavaScript hack, which allows it to eliminate some of the performance overhead of asm.js.

One of the major improvements of WebAssembly over asm.js is that it supports memory growth with no performance overhead. With asm.js you had to choose between deoptimising performance to allow memory use to grow, or maximum performance with a fixed memory limit. We chose maximum performance with a fixed memory limit of 50 MB. A downside of this is that if Physics needs more than 50 MB of memory for a game, it crashes with an out-of-memory error because it cannot increase the memory size. Another issue is that if your game only needs a small amount of memory for Physics, it still has to allocate the full 50 MB. The new WebAssembly Physics engine starts with just a 16 MB memory allocation, which is 68% less than the asm.js version.

If the WebAssembly physics engine needs more memory, it simply allocates more and keeps going, with no performance overhead for doing so. So there's no fixed memory limit for Physics any more! Your games can grow as big as you like without hitting a preset cap.
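To make the growth behaviour concrete, here is a minimal sketch using the standard WebAssembly.Memory API. It is not Construct's actual code: the helper name mibToPages is hypothetical, the post's "mb" figures are assumed to mean MiB, and in practice the compiled Box2D module would manage its own memory rather than have it grown by hand like this.

```typescript
// WebAssembly linear memory is sized in pages of 64 KiB each.
const WASM_PAGE_BYTES = 64 * 1024;
const MIB = 1024 * 1024;

// Hypothetical helper: convert a size in MiB to a whole number of pages.
function mibToPages(mib: number): number {
  return Math.ceil((mib * MIB) / WASM_PAGE_BYTES);
}

// Start at 16 MiB, the starting size the post quotes for the new engine.
const physicsMemory = new WebAssembly.Memory({ initial: mibToPages(16) });
const initialBytes = physicsMemory.buffer.byteLength;

// Grow by another 16 MiB on demand. grow() returns the previous size in pages.
const previousPages = physicsMemory.grow(mibToPages(16));
const grownBytes = physicsMemory.buffer.byteLength;

// Startup saving of a 16 MiB start compared with a fixed 50 MiB allocation.
const startupSavingPercent = Math.round((1 - 16 / 50) * 100);
```

No maximum is passed to the constructor, so growth is bounded only by what the browser and device allow, which matches the "no fixed limit" behaviour described above.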

Another benefit of WebAssembly is that the binary format is more compact, making the download size smaller. Most servers will send compressed resources, and after compressing both the asm.js and WebAssembly versions of the Box2D library, the WebAssembly version is still nearly 20% smaller.

This smaller binary format is also much quicker to parse, allowing a faster startup too. In fact some browsers can compile WebAssembly faster than it downloads, meaning there is essentially zero loading time.

WebAssembly is already supported in all major browsers, so these improvements will come to all platforms! We have entirely removed the old asm.js Physics engine from the C3 runtime, so you are guaranteed high-performance, low-overhead Physics.

Faster, multi-threaded pathfinding

The other behavior we've particularly improved is the Pathfinding behavior. This uses the A* pathfinding algorithm to find a path across a layout while avoiding obstacles. It's great for making seemingly intelligent enemies that can find their way around the level without bumping into things. However, actually calculating the path is very CPU-intensive, especially when used with a small grid size.

First up we did some fundamental refactoring of the A* pathfinding algorithm we use to improve its efficiency, making use of the latest JavaScript features. To test the improvement, we made a test pathfinding layout with a bunch of obstacles forcing a winding path to be found. The layout is shown below. The blue dot must calculate a path to the red X in the top-left. The layout is 3000 x 3000 with a small cell size of 7, meaning the pathfinding behavior must calculate a path through a grid with over 180,000 cells.
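For readers who have not met A* before, here is a minimal sketch of the algorithm on a grid. It is an illustration only, not the behavior's actual implementation: the names GridCell, findPath and gridCellCount are hypothetical, movement is limited to four directions with a Manhattan-distance heuristic, and the cell count assumes a partial cell at the layout edge rounds up.

```typescript
// Hypothetical grid A* sketch: 4-directional moves, unit step cost, Manhattan heuristic.
type GridCell = { x: number; y: number };

// Cell count for a square layout, assuming a partial cell at the edge rounds up.
function gridCellCount(layoutSize: number, cellSize: number): number {
  const perSide = Math.ceil(layoutSize / cellSize);
  return perSide * perSide;
}

// blocked[y][x] === true marks an obstacle. Returns the path including both
// endpoints, or null when the goal cannot be reached.
function findPath(blocked: boolean[][], start: GridCell, goal: GridCell): GridCell[] | null {
  const height = blocked.length;
  const width = blocked[0].length;
  const indexOf = (c: GridCell): number => c.y * width + c.x;
  const heuristic = (c: GridCell): number =>
    Math.abs(c.x - goal.x) + Math.abs(c.y - goal.y);

  // gScore[i] is the cheapest known cost from the start to cell i.
  const gScore: number[] = [];
  const cameFrom: number[] = [];
  for (let i = 0; i < width * height; i++) {
    gScore.push(Infinity);
    cameFrom.push(-1);
  }
  gScore[indexOf(start)] = 0;
  const open: GridCell[] = [start];

  while (open.length > 0) {
    // Pick the open cell with the lowest f = g + h. A linear scan keeps the
    // sketch short; a binary heap is the usual choice for large grids.
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      const fCandidate = gScore[indexOf(open[i])] + heuristic(open[i]);
      const fBest = gScore[indexOf(open[best])] + heuristic(open[best]);
      if (fCandidate < fBest) best = i;
    }
    const current = open.splice(best, 1)[0];

    if (current.x === goal.x && current.y === goal.y) {
      // Walk the cameFrom links back to the start, then reverse.
      const path: GridCell[] = [];
      for (let i = indexOf(current); i !== -1; i = cameFrom[i]) {
        path.push({ x: i % width, y: Math.floor(i / width) });
      }
      return path.reverse();
    }

    const neighbours: GridCell[] = [
      { x: current.x + 1, y: current.y },
      { x: current.x - 1, y: current.y },
      { x: current.x, y: current.y + 1 },
      { x: current.x, y: current.y - 1 },
    ];
    for (const next of neighbours) {
      const outside = next.x < 0 || next.y < 0 || next.x >= width || next.y >= height;
      if (outside || blocked[next.y][next.x]) continue;
      const tentative = gScore[indexOf(current)] + 1;
      if (tentative < gScore[indexOf(next)]) {
        gScore[indexOf(next)] = tentative;
        cameFrom[indexOf(next)] = indexOf(current);
        open.push(next);
      }
    }
  }
  return null;
}

// Tiny demo: a wall across the middle row with a gap at the right-hand edge.
const demoGrid: boolean[][] = [
  [false, false, false, false],
  [true, true, true, false],
  [false, false, false, false],
];
const demoPath = findPath(demoGrid, { x: 0, y: 0 }, { x: 0, y: 2 });
const testLayoutCells = gridCellCount(3000, 7);
```

In the demo the only route runs along the top row, through the gap and back along the bottom row. Under the rounding assumption, gridCellCount(3000, 7) gives 429 x 429 = 184,041 cells, consistent with the "over 180,000" figure.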

How quickly can it do that? In the new C3 runtime, individual paths can be calculated over 4x faster than in C2. This is so much faster that the delay is almost unnoticeable, whereas it took over half a second before.

We didn't stop there.

The pathfinding behavior in the C2 runtime uses a Web Worker to run pathfinding calculations in parallel to the game. This means that a 500ms delay to calculate a path doesn't jank the game: it continues running smoothly and triggers On path found when the result is ready. However, the C2 runtime only ever creates one Web Worker. This misses the opportunity to use multiple CPU cores. Even some mobile devices have 8 CPU cores. On such systems, creating 8 Web Workers for pathfinding calculations would allow 8 paths to be calculated in parallel. With one Web Worker, all the paths must be queued up and calculated one after the other, delaying the completion of later paths.

For the Construct 3 runtime, we created a whole new multi-core dispatching framework to allow plugins and behaviors to easily post tasks to multiple Web Workers. The runtime can create one Web Worker per CPU core, allowing maximum throughput. As a result the Pathfinding behavior can now run multiple pathfinding calculations in parallel, essentially multiplying the throughput by the number of CPU cores.
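The effect of spreading a queue over several workers can be sketched with a toy scheduling model. This is not the runtime's dispatching framework: simulateCompletionMs is a hypothetical function, the 100 ms job duration is made up, and real workers would be sized from something like navigator.hardwareConcurrency (whether the runtime uses exactly that value is an assumption).

```typescript
// Toy model: each queued job goes to whichever worker becomes free first.
// Returns the time at which the last job finishes.
function simulateCompletionMs(jobDurationsMs: number[], workerCount: number): number {
  // freeAt[w] is the time at which worker w finishes its current backlog.
  const freeAt: number[] = [];
  for (let w = 0; w < workerCount; w++) freeAt.push(0);

  for (const duration of jobDurationsMs) {
    let earliest = 0;
    for (let w = 1; w < workerCount; w++) {
      if (freeAt[w] < freeAt[earliest]) earliest = w;
    }
    freeAt[earliest] += duration;
  }

  let finish = 0;
  for (const t of freeAt) {
    if (t > finish) finish = t;
  }
  return finish;
}

// Hypothetical workload: 51 equal pathfinding jobs of 100 ms each.
const queuedJobs: number[] = [];
for (let j = 0; j < 51; j++) queuedJobs.push(100);

const oneWorkerMs = simulateCompletionMs(queuedJobs, 1);
const fourWorkersMs = simulateCompletionMs(queuedJobs, 4);
const eightWorkersMs = simulateCompletionMs(queuedJobs, 8);
```

With equal-length jobs the model gives 5100 ms on one worker, 1300 ms on four and 700 ms on eight, so completion time falls roughly in proportion to the worker count, with the remainder set by how evenly the jobs divide.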

To test the improvement, we took the previous test, added 50 extra instances spread slightly apart, and asked them all to calculate a path at the same time. The result is stunning. Testing on a quad-core system, the C3 runtime can finish the work 32x faster than the C2 runtime!

Just look how quickly results come in with the C3 runtime:

https://www.scirra.com/images/blogstuff/c3runtime/pathfinding-c3.mp4

...compared to the C2 runtime:

https://www.scirra.com/images/blogstuff/c3runtime/pathfinding-c2.mp4

It's interesting to observe that we got a bigger speed-up than expected: on a quad-core system we might expect a 4.4 x 4 speedup (the single-path gain multiplied by four cores), which is an overall 17.6x speedup. Instead we got 32x! There are two reasons for this. Firstly, the quad-core system has hyperthreading, so it actually has 8 hardware threads and therefore creates 8 Web Workers. Normally hyperthreading does not double performance, but in good conditions it can give maybe up to another 30% performance.

The other factor is that in long-running pathfinding jobs the JavaScript engine must do garbage collection to clean up the memory after past jobs. With a one-off job it can post back the result and then do garbage collection afterwards, meaning it's not included in the timing. However, with long-running jobs the time spent in garbage collection becomes important to the throughput measurement, since the worker cannot start the next task in the queue until it has finished cleaning up. Since the new algorithm in the C3 runtime is more efficient with memory management, it also saves time on this garbage collection work, further increasing the throughput.
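The arithmetic behind that comparison can be spelled out as a short calculation. The figures are the ones quoted in this post; treating the hyperthreading gain as the full 30% and attributing the whole remainder to garbage collection are simplifying assumptions, and the variable names are illustrative.

```typescript
// Figures quoted in the post.
const singlePathSpeedup = 4.4; // C3 versus C2 for one path
const physicalCores = 4;
const measuredSpeedup = 32;

// Assumption: hyperthreading contributes the full "up to another 30%".
const hyperthreadingBonus = 1.3;

// Cores-only estimate: 4.4 x 4 = 17.6
const expectedFromCores = singlePathSpeedup * physicalCores;

// Estimate including hyperthreading.
const expectedWithHyperthreading = expectedFromCores * hyperthreadingBonus;

// Whatever remains is the share left for reduced garbage collection work.
const residualFactor = measuredSpeedup / expectedWithHyperthreading;
```

With these inputs the cores-only estimate is 17.6x, hyperthreading lifts it to roughly 22.9x, and a further factor of about 1.4x is left for the garbage collection savings to account for.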

Conclusion

Along with a fundamentally rearchitected runtime to boost overall performance, we've also done considerable work to redesign the Physics and Pathfinding behaviors for maximum performance. Physics is smaller, faster, uses less memory, and has no fixed memory limits. Individual pathfinding jobs are considerably faster, and combining that with multi-threaded processing yields incredible throughput. Games involving large groups of objects all pathfinding around a complex level can now handle the pathfinding calculations far more efficiently, which is enough of a difference to raise the bar on the games you can design.

Physics in the Construct 3 runtime should now be as good as native — period. Multi-threaded code is notoriously difficult to get right in native programming languages, but we have built an easy-to-use Web Worker dispatching engine that any plugin or behavior can take advantage of, and the gains for pathfinding over the C2 runtime are enormous. This is just a small slice of the work we've done to exploit the latest web technologies to make the Construct 3 runtime even better.

Catch-up

Missed a previous post? Here's the blog series so far:

  1. Announcing the Construct 3 runtime
  2. New text features in the Construct 3 runtime
