I wrote this as an update to the document draft, and I think it could improve overall game performance going forward.
https://docs.google.com/document/d/1pNR ... ZGe8/edit#
It's better formatted in Google Docs.
Engine Runtime Code execution
After spending more time in the main game loop looking at ways to take advantage of memory and CPU cache, the work on Duff’s Device (below) has led me to a different suggestion. The current design of the C2 main game loop for objects is:
Current Loop
loop through all object types
    loop through instances of object type
        do object pre tick
        do update
        loop through behaviours
            execute behaviour update
        do object post tick
This is the actual code. The loop runs twice: once for tick() and once for posttick().
// Pass 1: tick every behavior instance, object type by object type
for (i = 0, leni = this.types_by_index.length; i < leni; i++)
{
    type = this.types_by_index[i];
    if (type.is_family || (!type.behaviors.length && !type.families.length))
        continue; // type doesn't have any behaviors
    for (j = 0, lenj = type.instances.length; j < lenj; j++)
    {
        inst = type.instances[j];
        for (k = 0, lenk = inst.behavior_insts.length; k < lenk; k++)
        {
            inst.behavior_insts[k].tick();
        }
    }
}
// Pass 2: posttick, only for behaviors that define it
for (i = 0, leni = this.types_by_index.length; i < leni; i++)
{
    type = this.types_by_index[i];
    if (type.is_family || (!type.behaviors.length && !type.families.length))
        continue; // type doesn't have any behaviors
    for (j = 0, lenj = type.instances.length; j < lenj; j++)
    {
        inst = type.instances[j];
        for (k = 0, lenk = inst.behavior_insts.length; k < lenk; k++)
        {
            binst = inst.behavior_insts[k];
            if (binst.posttick)
                binst.posttick();
        }
    }
}
In the loops above, each iteration descends through several nested levels before it reaches a behavior's function. Because the loop goes down to a behavior and then back up to the next instance, the CPU keeps evicting one behaviour's code and data from cache to make room for the next, only to return to the first behaviour on the following instance. That leaves the JIT little opportunity to keep either the function's variables or the function code itself in the CPU cache (this relates to Duff’s Device, below). As an analogy, consider the GPU and WebGL. Part of efficient rendering is packing sprites together on one texture. That means fewer texture swaps, less memory transferred to the GPU, fewer draw calls and much better performance overall, which C2 already works towards. The CPU benefits just as significantly from keeping code and data in cache (as with Duff’s Device below).
I propose a loop that lets the CPU take advantage of code/behaviour batching, which can improve JIT optimization at runtime. It is less deeply nested than the loop above, and it also offers the awesome feature of priority control over behaviour execution. This of course still works off the basic idea of a world object where everything is a behaviour.
Proposed loop
loop through different behaviours
    loop through all instances of the same behaviour on all objects
        execute behavior update
How does this work instead?
This loop does not loop through objects.
This loop does not go through the behaviors in an object.
This has the behavior at the top, and the behavior then manipulates its world object.
There is a BehaviourList. This list contains one BehaviourInstanceArray per behaviour, and each array contains a reference to every instance of that behavior.
Abstract sample below:
BList[
    Sprite[]
    Collision[]
    Platform[]
    Solid[]
    Pin[]
]
When an object is created, the code adds each of its behavior instances to the matching array. When the object is destroyed, those behavior instances are removed from the arrays. This also means that the CPU can cache the update function and run through all instances in one go, which offers the benefit of Duff’s Device below. However, the system also offers another benefit.
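The add/remove bookkeeping described here could be sketched roughly as follows. This is only an illustration and not C2 runtime code: BehaviorRegistry, add, remove and the listIndex field are hypothetical names. The swap-with-last removal is my own assumption to keep removal cheap, and it does not preserve the order of instances inside a list.

```javascript
// Hypothetical sketch: one flat array of instances per behavior name.
function BehaviorRegistry() {
    this.lists = Object.create(null);   // behavior name -> array of behavior instances
}

// Call when an object is created: append its behavior instance.
BehaviorRegistry.prototype.add = function (name, binst) {
    var list = this.lists[name] || (this.lists[name] = []);
    binst.listIndex = list.length;      // remember the slot for O(1) removal
    list.push(binst);
};

// Call when an object is destroyed: move the last entry into the hole.
BehaviorRegistry.prototype.remove = function (name, binst) {
    var list = this.lists[name];
    var last = list.pop();
    if (last !== binst) {
        list[binst.listIndex] = last;
        last.listIndex = binst.listIndex;
    }
};

// Usage with three made-up Pin instances.
var registry = new BehaviorRegistry();
var a = { tick: function () {} };
var b = { tick: function () {} };
var c = { tick: function () {} };
registry.add("Pin", a);
registry.add("Pin", b);
registry.add("Pin", c);
registry.remove("Pin", a);              // c moves into a's old slot
```

If the order inside a behavior's array matters, the removal would need a splice instead, at the cost of shifting the remaining entries.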
Sortable behavior execution. The sample above shows Sprite at the top and Pin at the bottom, but this list can be sorted, which gives control over what is executed first and what is executed last.
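As a rough sketch of how the ordering could be controlled, each entry in the list could carry a priority and the list could be sorted once whenever priorities change. The name, priority and insts fields and the numeric values are all assumptions made up for this example. Nothing like this exists in the runtime today.

```javascript
// Hypothetical sketch: each entry pairs a behavior name with a priority
// and the flat array of its instances (left empty here).
var blist = [
    { name: "Pin",       priority: 40, insts: [] },
    { name: "Sprite",    priority: 0,  insts: [] },
    { name: "Platform",  priority: 20, insts: [] },
    { name: "Solid",     priority: 30, insts: [] },
    { name: "Collision", priority: 10, insts: [] }
];

// Lower priority value runs earlier in the frame.
blist.sort(function (x, y) { return x.priority - y.priority; });

// Collect the resulting execution order for inspection.
var order = blist.map(function (entry) { return entry.name; });
console.log(order.join(" > "));
```

Sorting only happens when a priority changes, so the per-frame loop itself stays untouched.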
So the loop is
for (var i = 0, length = blist.length; i < length; i++)
{
    var bInstList = blist[i];
    var iterations = bInstList.length;
    var idx = 0;
    // Handle the leftover instances (count not divisible by 8) first
    var n = iterations % 8;
    while (n--) {
        bInstList[idx++].tick();
    }
    // Then process the rest in unrolled groups of 8
    n = (iterations * 0.125) ^ 0; // integer division by 8
    while (n--) {
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
        bInstList[idx++].tick();
    }
}
So that is the game loop. It takes advantage of CPU caching, is far simpler in design, and allows control over behavior execution order. Instead of object top-down, it’s behavior top-down. There are probably other optimization techniques that can be applied.
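Since the current runtime makes a second pass for posttick, the same two passes could be kept in the behavior-first layout. Here is a rough sketch, assuming blist is an array of instance arrays as described earlier. runFrame, makeInst and the sample instances are made up for illustration, and the unrolling is left out to keep it short.

```javascript
// Hypothetical sketch: one frame in behavior-major order.
function runFrame(blist) {
    var i, j, list, binst;
    // Pass 1: tick every instance, one behavior type at a time.
    for (i = 0; i < blist.length; i++) {
        list = blist[i];
        for (j = 0; j < list.length; j++)
            list[j].tick();
    }
    // Pass 2: posttick, only where the behavior defines it.
    for (i = 0; i < blist.length; i++) {
        list = blist[i];
        for (j = 0; j < list.length; j++) {
            binst = list[j];
            if (binst.posttick)
                binst.posttick();
        }
    }
}

// Made-up instances that record the order in which they are called.
var log = [];
function makeInst(label, hasPostTick) {
    var inst = { tick: function () { log.push(label + ".tick"); } };
    if (hasPostTick)
        inst.posttick = function () { log.push(label + ".posttick"); };
    return inst;
}

runFrame([
    [makeInst("platform1", true), makeInst("platform2", true)],
    [makeInst("pin1", false)]
]);
```

All ticks still finish before any posttick runs, which matches the ordering of the two existing passes.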