Technical Question on WebGL Rendering

  • Ashley or anyone who can effectively answer this question.

    There was a discussion not too long ago about cloned objects rendering the "same" image even though they are different memory references. So I have some more technical questions about C2's rendering logic for a single sprite object with multiple instances.

    I admit I could be fuzzy on this, so please bear with me.

    At render time, when C2 renders the object, the image is moved to the render pipeline. If there are numerous instances of the same object, then C2/WebGL makes 1 render call based on fill rate. So 1 call draws numerous images on the screen.

    Now what if it's 1 sprite object, but with numerous frames or numerous animations? When rendering 1 sprite object that has numerous instances set to different frames, are all the frames' images put into the pipeline? If so, does C2/WebGL still make only 1 render call because it's only 1 sprite object, even though there are multiple instances set to different frames?

  • Ashley can probably answer more precisely, and you can find some more info on C2's renderer on Ashley's blog, but I can give an overview. Basically, images are moved to video memory once, and when rendering, WebGL is told which texture to use, the opacity, and the xy and uv coordinates of the quad's four corners. C2 uses sprite batching to group sprites with the same texture together, which reduces the draw calls by switching textures less often and sending bigger chunks of quad vertices. Now as to your question, I imagine if you had only two instances of one object and each had a different frame, then that would be two draw calls. However on export C2 creates texture atlases of the frames if it can, so then it may take only one call. To be more exact about a draw call, it is only done once per frame in WebGL. Everything else is just sending info to WebGL to use when the frame is drawn.
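
    As a rough illustration of the uv coordinates mentioned above, here is a minimal Python sketch of how a frame's rectangle inside an atlas could be turned into normalised texture coordinates. The `Frame` class, the `frame_uvs` function and the 256x256 atlas layout are all made up for the example; this is not C2's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Pixel rectangle of one animation frame inside an atlas (hypothetical layout)."""
    x: int
    y: int
    width: int
    height: int


def frame_uvs(frame, atlas_width, atlas_height):
    """Return (u0, v0, u1, v1) in the 0..1 range for the frame's rectangle."""
    u0 = frame.x / atlas_width
    v0 = frame.y / atlas_height
    u1 = (frame.x + frame.width) / atlas_width
    v1 = (frame.y + frame.height) / atlas_height
    return (u0, v0, u1, v1)


# Two 64x64 frames side by side in an assumed 256x256 atlas.
frame_a = Frame(0, 0, 64, 64)
frame_b = Frame(64, 0, 64, 64)
uvs_a = frame_uvs(frame_a, 256, 256)
uvs_b = frame_uvs(frame_b, 256, 256)
print(uvs_a)  # (0.0, 0.0, 0.25, 0.25)
print(uvs_b)  # (0.25, 0.0, 0.5, 0.25)
```

    Both frames live in the same texture, so two instances showing different frames differ only in their uv values and could in principle be drawn without a texture switch.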

  • R0J0hound

    Thanks, that clears a lot up. So I suppose there is one part that could use some final clarification.

    "However on export C2 creates texture atlases of the frames if it can, so then it may take only one call. To be more exact about a draw call, it is only done once per frame in WebGL. Everything else is just sending info to WebGL to use when the frame is drawn."

    So because C2 creates a texture atlas, could or does this happen: can the export get better performance if the atlas is larger? I'm wondering whether it would be better to design a single sprite as an atlas sprite and then use invisible game objects to handle game logic.

    I do get your point. If the sprite breaks the image into more memory blocks, then that's more calls. Whereas if the sprite were drawn with more advanced UV coordinates from a larger block of memory, that could reduce draw calls and improve performance.

  • I'd imagine the performance would be better after exporting. It's hard to design for this since the texture atlas is done automatically and you're still bound by a maximum texture size.

    To see if your idea would be faster you'd need to make a test both ways and measure the performance. This is assuming that's where the bottleneck is in your game. My machine is weak with HTML5/JavaScript, so logic is often the bottleneck instead of rendering. Well, I have a slow rendering issue as well since my graphics card is in driver limbo, so I'm hit from both sides.

  • That seems to be the case. I'm going to have to do some kind of active performance test. I don't actually have a performance problem as of now. I'm just thinking that if the export atlas really can get better performance, then that would considerably change how the game should be structurally designed. Making a large change to fewer atlases later would be a big job, whereas designing around this from the start would save a lot of effort.

    Well, time to work on a test.

  • In my tests I found that the order in which different sprites are created appears to play a part in the draw calls. Basically, it appears objects are rendered in the order of their UIDs.

    So, if you have 200 each of sprite A, B, and C, your draw calls can be higher if these objects are created in such a way that their UIDs are scattered (A, B, C, A, C, B, A...).

    Whereas, if they are grouped together (A, A, A, etc... B, B, B, etc... C, C, C, etc...), draw calls are lower, since the engine only has to switch textures a few times.
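
    To put rough numbers on the difference, here is a small Python sketch that counts how many times the texture would have to be set for each ordering. It assumes each sprite type has exactly one texture and that a texture is set once per run of consecutive identical sprites; `count_texture_binds` is a made-up helper, not anything from C2.

```python
from itertools import groupby


def count_texture_binds(draw_order):
    """Count runs of consecutive identical textures (one 'set texture' per run)."""
    return sum(1 for _ in groupby(draw_order))


# 200 each of A, B and C, drawn in the two orders described above.
interleaved = ["A", "B", "C"] * 200
grouped = ["A"] * 200 + ["B"] * 200 + ["C"] * 200

interleaved_binds = count_texture_binds(interleaved)
grouped_binds = count_texture_binds(grouped)
print(interleaved_binds)  # 600
print(grouped_binds)      # 3
```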

    I did not test this extensively, and am a little distrustful of the draw call percentage in the debugger to begin with, so take all of that with a grain of salt...

  • TiAm

    That's really interesting, and that shouldn't be happening. Ashley mentioned before that a Sprite should be rendered as a whole. Though I have a question: did that include Z-sorting and layers? I would assume Z-sorting and layering would increase draw calls in any circumstance.

    All right, so I did a simple render test, with unique but stable results. These were done with 1 sprite object. 10 objects are created at a time, provided creation is allowed. Creation is allowed until the fps has dipped below 30 a total of 100 times. I chose this because of how the GPU handles rendering dips, and how often and when they become extensive. This doesn't mean a MAX limit.

    1. Performance is better if your UV size, or in our case texture size, sticks to a power of 2 (2, 4, 8, 16, 32, 64, 128...).

    2. On preview (rounded to the nearest 500):

    1 frame displayed on the object: reached 11,500 after 10 tests.

    Random frame on the object: 6,000 after 10 tests.

    3. On export:

    1 frame displayed on the object: didn't test, because this is about atlas testing on export.

    Random frame on the object: 9,500 after 10 tests.

    Interestingly enough, on export the performance increased by 45%, where the GPU had less trouble.

    However, I found out about other factors. Given the unreasonable usage of just constantly creating objects, the C2 CPU load becomes overburdened rather quickly. I suspect this has to do with the JS storage object. But it's not important, because you're not going to have 5000 objects on the screen. Still, it's interesting to know that C2's CPU usage will hit 100% before the GPU hits its limits.

    So with this information, it might be best to actually create games based on the idea of an atlas sprite object. Of course this will only count for sprites that live on the same layer.

    My question is now answered. It is best to use atlases as much as viably possible. So all widgets, small objects should share 1 object. And let robust stuff have their own sprites.

    I appreciate everyone's input. I wouldn't mind Ashley's thoughts on the entire subject too.

  • What do you mean by "So all widgets, small objects should share 1 object. And let robust stuff have their own sprites."?

    Sorry, but it's 3:28 AM and I'm not sure if I understand this correctly.

  • I find that lots of devs use 1 object per game button, 1 object per bullet type, 1 object per enemy... and so on. In the end they have around 200 objects, and at any one time they may have 50 different objects.

    Whereas if all the buttons, bullets... small stuff were in 1 Sprite used as an atlas using animations and frames, that would be better.

    As for more robust stuff: if you have a player with 5 different animation types and many frames per animation, it might be easier for development reasons to just have the player be its own entire object.

    I have been enlightened about mobile development. Older hardware, say an iPhone 4, should be limited to approximately 25 draw calls. So of course if everything on the screen is its own object, then you're going to hit and pass that fast.

  • "I find that lots of devs use 1 object per game button, 1 object per bullet type, 1 object per enemy... and so on. In the end they have around 200 objects, and at any one time they may have 50 different objects.

    Whereas if all the buttons, bullets... small stuff were in 1 Sprite used as an atlas using animations and frames, that would be better."

    I cannot be sure, but wouldn't that be worse when not all objects are used at the same time?

    Since the WebGL renderer, I think, only loads the textures that are needed, doing that would force everything to be loaded at startup, from what I understand.

  • A quick overview of the WebGL renderer is this: (maybe this deserves a separate blog post)

    1. As far as memory usage goes, the layout-by-layout loading means that on start of layout all textures that are used by the layout are loaded (and any others released). All frames of all animations of any objects initially placed on the layout are loaded. This means for the purposes of rendering a layout, loading/unloading textures can be ignored.

    2. The WebGL renderer is a batched back-to-front renderer. This means it draws the bottom thing on the bottom layer and then moves to the front, drawing things over what's already been rendered.

    3. The "batch primitives" include things like "set texture" and "draw N quads". C2 batches draw calls to eliminate redundant work. For example if the engine tries to set the same texture 3 times in a row, it only adds one "set texture" batch job.

    4. If there are 10 of the same sprite showing the same texture, and they are all drawn in a row (so they must be consecutive in Z order), the batch can be just two items: "set texture" then "draw 10 quads".

    5. If there are 10 of alternating sprite types with different textures, they cannot be batched with a single "draw 10 quads" call, because that can only use the same texture for all of them. So the batch ends up with "set texture A", "draw 1 quad", "set texture B", "draw 1 quad", "set texture A", "draw 1 quad"...

    This is not as bad as it sounds. These are very quick calls and are still much faster than the equivalent canvas2d rendering commands.

    6. After export there are lots of optimisations run like image deduplication and spritesheeting. This makes it more likely that objects share the same texture, so in some cases after exporting the batch can look like step 4 where it would have looked like step 5 in preview, which improves performance slightly.
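
    Steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration of the batching idea under simplified assumptions (one texture per sprite, job names invented for the example), not the engine's real code; `build_batch` is a hypothetical helper.

```python
def build_batch(draw_order):
    """Collapse a back-to-front list of texture names into batch jobs.

    Each job is a tuple: ("set_texture", name) or ("draw_quads", count).
    A redundant "set texture" is skipped and the previous "draw quads"
    job is extended instead.
    """
    batch = []
    current = None  # texture currently set, None before the first sprite
    for texture in draw_order:
        if texture != current:
            batch.append(("set_texture", texture))
            batch.append(("draw_quads", 1))
            current = texture
        else:
            kind, count = batch[-1]  # last job is always a "draw_quads" here
            batch[-1] = (kind, count + 1)
    return batch


# Step 4: ten consecutive sprites with the same texture.
same = build_batch(["A"] * 10)
print(same)  # [('set_texture', 'A'), ('draw_quads', 10)]

# Step 5: ten sprites alternating between two textures.
alternating = build_batch(["A", "B"] * 5)
print(len(alternating))  # 20
```

    The alternating case produces ten times as many jobs for the same number of quads, which is the difference between step 4 and step 5.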

    So overall you can optimise your game by doing things like making sure objects with the same images appear consecutive in Z order. But this can be difficult, and is probably a "micro-optimisation" - there are likely to be much more important things to worry about for performance.

    It's possible to make giant spritesheets that have all the game's images on them, but that doesn't totally eliminate texture switching if you have lots of images and exceed the maximum texture size with one spritesheet. Also, putting all images in one image can make the loading bar useless in web games, and can actually significantly increase the download size (particularly problematic where stores have a file size limit, like Google Play). More info on this is in the blog post Under the hood: spritesheets in Construct 2.
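
    To get a feel for when one spritesheet stops being enough, here is a back-of-the-envelope Python sketch. It assumes a maximum texture size of 2048x2048 (the real limit varies by device) and only compares total pixel area, so it gives a lower bound; real packing needs padding and wastes some space. `min_sheets_by_area` is a made-up helper for illustration.

```python
def min_sheets_by_area(frame_sizes, max_texture_size):
    """Lower bound on the number of square sheets needed, by pixel area only."""
    for width, height in frame_sizes:
        if width > max_texture_size or height > max_texture_size:
            raise ValueError("frame is larger than the maximum texture size")
    total_area = sum(width * height for width, height in frame_sizes)
    sheet_area = max_texture_size * max_texture_size
    # Integer ceiling division: any leftover area needs one more sheet.
    return (total_area + sheet_area - 1) // sheet_area


# 100 frames of 256x256 cannot fit on a single 2048x2048 sheet.
frames = [(256, 256)] * 100
sheets_needed = min_sheets_by_area(frames, 2048)
print(sheets_needed)  # 2
```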

  • First: definitely make this a blog post. This is very useful info, and I'd hate to have it get buried in the forum, never to be seen again.

    Anyway: glad it's done by Z order rather than UID; that makes more sense and is easier to design around if need be. Thanks, Ashley.

  • I appreciate everyone's input and it's great to get some insight into what is happening under the hood with C2. That was a fantastic bit of information.

    I guess, though, that even with that information the situation doesn't change. Developers should take care and understand that using 50 different objects that do the same thing and are single-image sprites, as in, say, a jigsaw puzzle, actually works against the renderer.

    Having done some research, older devices such as the iPhone 4 (not the 4S) should only have approximately 25 draw calls to maintain good performance. I think Ashley should certainly write the blog post about draw calls and the use of sprite sheets, as a guide.

    Ashley said recently that one of the biggest reasons people have a hard time with mobile game dev is GPU overload. I see this all the time. It is good to know that C2 treats every different object as a separate draw call, regardless of spritesheet optimization.

    Of course, mobile devices of 2013 easily handle around 200+ draw calls. Also, it seems like modern computers are relatively safe at around 1000 to 2000. But it seems people want to do 200 calls on older devices.

    I found the discussion fascinating. Thanks. I'll continue to read up if more posts come.

    Batching sprites is good. Be smart about batching, though, as an object spread over multiple 2048x2048 sheets isn't that good.

    Yes, a well-written technical blog post on the subject would be fantastic and may help relieve the "mobile performance sucks" syndrome.

  • Why not batch objects per family? Or have batch groups? This would provide more control over the batching. It would also batch single objects together. Take the Spriter plugin, for example: there's no reason its objects should not be batched together. In the end this would maximize performance for everyone, whether using frames or objects.

  • vtrix

    Oh my, yes. Once I learned more detailed information about this, I realized that Spriter is never going to be mobile capable with C2 for any kind of complex Spriter object. I looked at the default platformer character: 43 objects. Considering that a 1st-gen iPad should be around 100 for ideal performance, that character alone uses almost half. Then add monsters, FX and stuff. Spriter will blow over most mobile devices in a single breath.

    I posted this request in the thread, that Spriter should change to a single sprite object because of this, but I never got a reply. Spriter may not be mobile friendly until mid 2015, which is a pity given all the work that went into the JSON structure for better support across HTML devices.

    Of course you can get away with simpler component Spriter objects though.
