Why do events only run on one core?

Official Construct Team Post
Ashley
  • 6 Mar, 2015

Recently on the forum it's been asked several times why Construct 2 only runs events on one CPU core. I'm not sure why it's suddenly become a hot topic, but I went into some detail in one reply, then had to keep finding the thread whenever the question came up again. It's also an interesting aspect of game engine design, so here's a bloggified version of my forum reply.

It's a difficult problem

Making complex game logic run in parallel is a very difficult problem. This is in no way limited to Construct 2, because the problem is the same no matter what framework, engine or programming language you use. That also means switching to another engine or framework is probably not going to make it any better. The problem is fundamentally difficult given the architecture of modern computer systems.

Sequential logic

It might be easy to assume that game logic is just a big pile of generic work that can be split N ways over N cores. However this is not the case: game logic (and a lot of application logic in general) is highly sequential. This means that most tasks depend on the results of previous tasks and cannot be executed until those results are available. Therefore they must be run after the previous tasks. With complex logic there can easily be a long chain of tasks dependent on the previous results all the way through the workload, and then there is no easy way to make it parallel. Running part of that workload on a different core doesn't help at all, since the tasks must still be done in order, one after the other - the same as on a single core.
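
To make the dependency chain concrete, here is a minimal sketch of one tick of game logic. The functions `applyInput`, `move` and `clampToWall`, and the shape of the state object, are made up for illustration and are not part of Construct 2. Each step reads a value written by the step before it, so none of them can start early on another core, and changing the order changes the result.

```javascript
// Hypothetical game-logic steps; each returns a new state object.
const applyInput = (s) => ({ ...s, vx: s.input * s.speed });      // writes vx
const move = (s, dt) => ({ ...s, x: s.x + s.vx * dt });           // reads vx, writes x
const clampToWall = (s) => ({ ...s, x: Math.min(s.x, s.wallX) }); // reads x, writes x

// One tick: every step needs the output of the previous step.
function tick(state, dt) {
  const withVelocity = applyInput(state);
  const moved = move(withVelocity, dt);
  return clampToWall(moved);
}

const start = { x: 0, vx: 0, input: 1, speed: 100, wallX: 50 };

// Intended order: velocity is set, the object moves to x = 100,
// then the wall clamps it back to x = 50.
const correct = tick(start, 1);

// Same steps in a different order: the move happens before the velocity
// is set, so the object stays at x = 0.
const reordered = applyInput(clampToWall(move(start, 1)));
```

Here `correct.x` is 50 while `reordered.x` is 0. Both runs do exactly the same amount of work, but only one ordering gives the right answer, which is why the steps cannot simply be handed out to different cores.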

Construct 2's event sheets run in top-to-bottom order, conditions are evaluated top-to-bottom, and actions are run top-to-bottom. Anything an event refers to could have been changed by previous events. This makes the event sheet highly sequential, and it's prohibitively difficult to try to analyse those dependencies to figure out which parts do or don't depend on the results of other parts - especially considering third-party plugins which have unknown effects. This is basically the same problem faced by all engines, frameworks and programming languages, except they are using programming language statements instead of events.

The cost of parallelism

Further, parallelism does not come for free. The necessary synchronisation between cores comes with a performance overhead. Every time work is sent off to another core, the work must be scheduled on that core, and the core must context switch to the thread that processes it. That thread will initially be slow while it runs into cache misses and "warms up" the caches with content. The results must then be scheduled for processing back on the original core, which must context switch back to the thread that handles them. Consequently, sending small amounts of work to a different core is usually slower overall, even if the work can run in parallel, since the overhead of arranging the off-core work eclipses any benefit.

An example of this in Construct 2 is running a "Set X to 0" action on 100 instances (a quick and simple action to run). On a quad-core machine it might seem like a good idea to schedule that work over four cores processing 25 instances each. However the scheduling overhead is large enough that a single core could finish running the 100 actions by itself much faster than it could schedule the work over four cores.
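
The trade-off can be put into rough numbers with a toy cost model. The figures below (10 ns per action, 20,000 ns to dispatch work to other cores and collect the results) are invented purely for illustration and are not measurements of Construct 2 or any browser; the function names are hypothetical too.

```javascript
// Toy cost model, all times in nanoseconds. The numbers are assumptions.
const COST_PER_ACTION_NS = 10;       // assumed cost of one "Set X to 0"
const DISPATCH_OVERHEAD_NS = 20000;  // assumed cost of scheduling + collecting results

// Everything runs back to back on one core.
function singleCoreCost(instances) {
  return instances * COST_PER_ACTION_NS;
}

// Work is split evenly over the cores, but the dispatch overhead is paid first.
function multiCoreCost(instances, cores) {
  const perCore = Math.ceil(instances / cores);
  return DISPATCH_OVERHEAD_NS + perCore * COST_PER_ACTION_NS;
}

const small = {
  single: singleCoreCost(100),     // 100 * 10 = 1000
  multi: multiCoreCost(100, 4),    // 20000 + 25 * 10 = 20250
};

const large = {
  single: singleCoreCost(1000000),   // 10,000,000
  multi: multiCoreCost(1000000, 4),  // 20000 + 250000 * 10 = 2,520,000
};
```

Under these assumed numbers, 100 instances take about 20 times longer when split over four cores, and splitting only pays off once there are a few thousand instances' worth of work per dispatch. The real break-even point depends entirely on the actual overhead and the cost of each action.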

In addition to that, multithreaded programming is often very difficult. It's a great challenge to synchronise every aspect of the engine such that there are no race conditions, no unexpected behaviour if tasks complete in a different order, and no unsafe results due to incorrect synchronisation. This puts a great burden on development, and is a high cost to pay if there aren't even significant performance gains to be made.

So it's challenging to parallelise large chunks of logic due to dependencies on previous results, and it's challenging to parallelise small chunks of logic due to the overhead of synchronising work on other cores. Running on a single core is easy to program for and has little overhead. The downside is, of course, that logic performance can be bottlenecked on single-core performance. In this case the best solution is to optimise the efficiency of your logic so it runs well even on one core. Modern JavaScript engines and CPU cores are very efficient and you can still do a great deal with a single core, provided your logic is efficiently designed.

The engine still uses multiple cores

However the game logic is just one of many tasks in a game engine. There is other significant processing work which can already run on other cores in modern browser engines:

  • Draw calls are executed in parallel in some browsers. This means all WebGL calls made from JavaScript are basically just logged without doing anything, and then passed to another thread or process for execution. This allows the actual calls to the graphics driver to run in parallel, removing their performance impact from the main thread.
  • Rendering the display doesn't just happen on another thread - it happens on a whole other hardware processor, the GPU! This is specially designed to be highly parallel and achieve best performance when rendering sprites and effects, and works in parallel with the CPU.
  • Audio processing such as routing, mixing, processing effects and playing through hardware is all processed by a separate thread.
  • Input events are watched by a separate thread in some browsers to achieve best responsiveness to user input, sometimes with extra smoothing to improve the perceived quality of motion.
  • Image, video and audio decoding is all processed in parallel when loading resources. This helps reduce loading times or in-game performance if this happens dynamically.
  • Network requests are issued and processed on a separate thread, such as if making AJAX requests, using WebSockets, or WebRTC multiplayer. This is also why AJAX requests run a trigger a while after making a request: the game continues to run in parallel as the request is made, and the trigger notifies you of the result.
  • The Pathfinding behavior is unique in that it deals with large CPU-intensive workloads (calculating paths with A*) that can run in parallel to the game. It uses a Web Worker to run the JavaScript logic for pathfinding in parallel to the main game. As with AJAX requests, this is why 'On path found' can trigger a while after the 'Find path' action is issued: the game continues to run while the path is calculated in parallel. This has the really nice benefit of preventing pathfinding calculations - however intensive they may be - from impacting the game's framerate.
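
The "issue a request now, get a trigger later" pattern from the last bullet can be sketched with the standard browser Worker API. This is not the Pathfinding behavior's actual code: `findPath` and `startPathWorker` are hypothetical names, the grid format (0 = free, 1 = wall) is an assumption, and the search is a plain breadth-first search rather than A* to keep the sketch short.

```javascript
// Breadth-first search on a grid of 0 (free) and 1 (wall).
// Returns the path as an array of [x, y] cells, or null if the goal is unreachable.
// Written without outside references so its source can be copied into a worker.
function findPath(grid, start, goal) {
  const h = grid.length, w = grid[0].length;
  const key = (x, y) => y * w + x;
  const cameFrom = new Map([[key(start[0], start[1]), null]]);
  const queue = [start];
  for (let i = 0; i < queue.length; i++) {
    const [x, y] = queue[i];
    if (x === goal[0] && y === goal[1]) {
      // Walk back through the parents to rebuild the path.
      const path = [];
      for (let cur = [x, y]; cur !== null; cur = cameFrom.get(key(cur[0], cur[1]))) {
        path.unshift(cur);
      }
      return path;
    }
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nx = x + dx, ny = y + dy;
      if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
      if (grid[ny][nx] === 1 || cameFrom.has(key(nx, ny))) continue;
      cameFrom.set(key(nx, ny), [x, y]);
      queue.push([nx, ny]);
    }
  }
  return null;
}

// Browser-only wiring: run findPath on a Web Worker and call onPathFound
// with the result whenever the worker has finished.
function startPathWorker(onPathFound) {
  const source = `${findPath.toString()}
    self.onmessage = (e) => {
      const { grid, start, goal } = e.data;
      self.postMessage(findPath(grid, start, goal));
    };`;
  const url = URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
  const worker = new Worker(url);
  worker.onmessage = (e) => onPathFound(e.data);
  return worker;
}

// Checking the search itself on the main thread with a small example grid.
const exampleGrid = [
  [0, 0, 0],
  [1, 1, 0],
  [0, 0, 0],
];
const examplePath = findPath(exampleGrid, [0, 0], [0, 2]);
const blockedPath = findPath([[0, 1], [1, 0]], [0, 0], [1, 1]);
```

In a browser, something like `startPathWorker(path => { /* react to the path */ }).postMessage({ grid: exampleGrid, start: [0, 0], goal: [0, 2] })` returns immediately, and the callback runs some time later once the worker has posted its result. The main thread is free to keep running ticks in between, which is the same shape as 'Find path' followed later by 'On path found'.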

Hopefully this demonstrates that while game logic is often confined to a single core, there are still several major features that can make use of multiple cores and run in parallel.

Increasing parallelism

Looking to the future, browser vendors are keenly aware of the need to parallelise their engines as much as possible to achieve maximum performance in today's multi-core world. There will probably be further improvements on this in future. Mozilla are even experimenting with a new browser engine called Servo designed from the ground up with parallelism in mind. There is also the possibility of writing new features that can use Web Workers like Pathfinding already does. Other more challenging options include moving other large processing tasks to a Web Worker (perhaps Physics?), or even a special feature that allows an event sheet to run on a Web Worker. However, this would probably require rearchitecting the runtime, and would not necessarily be straightforward to use, since the event sheet could not directly access any of the other objects or variables in the project without the synchronisation overhead.
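
The reason a worker cannot directly touch the project's objects is that data sent with `postMessage` is copied using the structured clone algorithm rather than shared. A small sketch of the consequence, using the standard `structuredClone` function to stand in for what a worker would receive (the `sprite` object is a made-up example, not a real Construct 2 instance):

```javascript
// A made-up object on the main thread.
const sprite = { x: 10, y: 20 };

// postMessage copies data with the structured clone algorithm, so the worker
// would see an independent copy like this one.
const copySeenByWorker = structuredClone(sprite);

// Changing the copy "on the worker" does not affect the original.
copySeenByWorker.x = 99;

// To apply the change, the worker has to send the new value back and the main
// thread has to write it in; that round trip is the synchronisation overhead.
const stillOriginalX = sprite.x; // 10
```

Every read or write of project state from such an event sheet would need a message in each direction, so it would only be worthwhile for large, mostly self-contained chunks of work.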

However game or application logic has traditionally been highly sequential, and it's also much easier to think about sequential code than parallel calls that could finish in any order or interact in other complicated ways. Regardless of the tool or language you are using, single-core logic performance is likely to be a limiting factor for some time, and it will continue to be important to carefully design efficient logic that does not waste this finite resource.
