WebGPU renderer - allow other WebGPU renderer to other off screen canvas in same context?

  • I am not sure if this behavior is the same for WebGPU, but I imagine it may be similar, so I wanted to discuss it first before adding an aha suggestion.

    Ashley

    In the current Spine plugin, I share a WebGL context between the C3 renderer and the Spine WebGL renderer. To do this, I save certain WebGL parameters that C3 is using, render with the Spine WebGL renderer, and then restore those parameters. I use the same context so I can render to a texture for C3 to use, which performs better. Saving the parameters is a blocking WebGL operation, but I need to do it because I don't have access to any shadow copies of that state from the C3 engine. This all works reasonably well, but it has a performance impact.
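
A minimal sketch of the save/render/restore pattern described above. The set of parameters saved here is an assumption chosen for illustration; the thread does not list which state the Spine plugin actually saves, and the helper names (`saveGLState`, `restoreGLState`, `withSavedGLState`) are hypothetical. Only standard WebGL calls are used. Each `getParameter` call is a synchronous query, which is where the blocking cost comes from.

```javascript
// Hypothetical helpers: snapshot a few pieces of WebGL state, run a foreign
// renderer, then put the state back. The chosen parameters are assumptions.
function saveGLState(gl) {
  return {
    program: gl.getParameter(gl.CURRENT_PROGRAM),
    arrayBuffer: gl.getParameter(gl.ARRAY_BUFFER_BINDING),
    elementBuffer: gl.getParameter(gl.ELEMENT_ARRAY_BUFFER_BINDING),
    framebuffer: gl.getParameter(gl.FRAMEBUFFER_BINDING),
    texture: gl.getParameter(gl.TEXTURE_BINDING_2D),
    viewport: gl.getParameter(gl.VIEWPORT),
    blend: gl.isEnabled(gl.BLEND),
  };
}

function restoreGLState(gl, s) {
  gl.useProgram(s.program);
  gl.bindBuffer(gl.ARRAY_BUFFER, s.arrayBuffer);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, s.elementBuffer);
  gl.bindFramebuffer(gl.FRAMEBUFFER, s.framebuffer);
  gl.bindTexture(gl.TEXTURE_2D, s.texture);
  gl.viewport(s.viewport[0], s.viewport[1], s.viewport[2], s.viewport[3]);
  if (s.blend) gl.enable(gl.BLEND);
  else gl.disable(gl.BLEND);
}

// Runs renderFn (e.g. the other renderer's draw) with the host state restored
// afterwards, even if renderFn throws.
function withSavedGLState(gl, renderFn) {
  const saved = saveGLState(gl);
  try {
    renderFn();
  } finally {
    restoreGLState(gl, saved);
  }
}
```

A caller would wrap the foreign renderer's draw in `withSavedGLState(gl, () => { /* draw */ })`. A renderer that tracks its own shadow state could skip the queries entirely, which is the motivation for the request.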

    If there is a new WebGPU renderer coming for C3, would it be possible to include a set of save and restore methods, so the C3 WebGPU renderer can coexist with another WebGPU renderer in the same 'context'? For example, the Spine renderer or some other renderer. I have never implemented anything in WebGPU yet, so I won't be shocked if I am making bad assumptions, but I wanted to start the conversation.

  • I'd rather not use that approach at all. Can't you use the provided renderer interface IWebGLRenderer? (Despite the name, that should also work 100% compatibly with WebGPU. We'll probably rename it at some point.) If features you need are missing, which features are those? It would be much more efficient to use Construct's renderer methods: you avoid the need to save and restore state at all, and it can batch all your draw calls together with the rest of the engine's renderer calls.

  • Ok, understood. As WebGPU comes online with C3, I'll focus on seeing if we can change the Spine plugin to work directly with IWebGLRenderer. Good to know it will be forward-compatible with WebGPU.

    Using the IWebGLRenderer, one of the issues I will need to resolve is that in Spine, each Quad can be two color-tinted differently (normal color and dark color). This allows the different attachments in Spine to be colored according to gameplay or user choice dynamically.

    To keep WebGL draw batching efficient, in the Spine WebGL renderer these tint colors are passed as vertex colors to the fragment shader ('varying'), instead of doing different fragment uniforms which would break batching.

    I am not sure how to approach that with IWebGLRenderer as it is currently defined. Any ideas?

    Here's the shader used in the Spine WebGL renderer:

    			let vs = `
    				attribute vec4 ${Shader.POSITION};
    				attribute vec4 ${Shader.COLOR};
    				attribute vec4 ${Shader.COLOR2};
    				attribute vec2 ${Shader.TEXCOORDS};
    				uniform mat4 ${Shader.MVP_MATRIX};
    				varying vec4 v_light;
    				varying vec4 v_dark;
    				varying vec2 v_texCoords;
    				void main () {
    					v_light = ${Shader.COLOR};
    					v_dark = ${Shader.COLOR2};
    					v_texCoords = ${Shader.TEXCOORDS};
    					gl_Position = ${Shader.MVP_MATRIX} * ${Shader.POSITION};
    				}
    			`;
    
    			let fs = `
    				#ifdef GL_ES
    					#define LOWP lowp
    					precision mediump float;
    				#else
    					#define LOWP
    				#endif
    				varying LOWP vec4 v_light;
    				varying LOWP vec4 v_dark;
    				varying vec2 v_texCoords;
    				uniform sampler2D u_texture;
    				void main () {
    					vec4 texColor = texture2D(u_texture, v_texCoords);
    					gl_FragColor.a = texColor.a * v_light.a;
    					gl_FragColor.rgb = ((texColor.a - 1.0) * v_dark.a + 1.0 - texColor.rgb) * v_dark.rgb + texColor.rgb * v_light.rgb;
    				}
    			`;
    
  • in Spine, each Quad can be two color-tinted differently (normal color and dark color).

    I'm not clear what this means exactly. Do you mean adding a gradient over the image or something?

  • It applies the above fragment shader, and it's applied uniformly across the image, so it doesn't depend on UV position. It depends only on the texture color and on color and color2, which are passed into the VS and then on to the FS. Each Spine object can be made of 100 to 1000 quads (actually triangles) when mesh deform is enabled, so batching is important. The color data is passed in the vertex buffer, so the FS uniforms don't change.
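
To make the tint math concrete, here is a CPU-side sketch that mirrors the fragment shader posted earlier in the thread, one texel at a time. The function name `twoColorTint` and the `[r, g, b, a]` array convention are assumptions for illustration; the arithmetic is copied term by term from the shader.

```javascript
// Mirrors the shader's two lines:
//   gl_FragColor.a   = texColor.a * v_light.a;
//   gl_FragColor.rgb = ((texColor.a - 1.0) * v_dark.a + 1.0 - texColor.rgb)
//                      * v_dark.rgb + texColor.rgb * v_light.rgb;
// All colors are [r, g, b, a] arrays with components in 0..1.
function twoColorTint(tex, light, dark) {
  const [tr, tg, tb, ta] = tex;
  // Scalar part shared by all three channels.
  const k = (ta - 1.0) * dark[3] + 1.0;
  return [
    (k - tr) * dark[0] + tr * light[0],
    (k - tg) * dark[1] + tg * light[1],
    (k - tb) * dark[2] + tb * light[2],
    ta * light[3],
  ];
}
```

With a white light color and a black dark color the texel passes through unchanged. With an opaque black texel, the output takes the dark color, which is why the feature is also called "tint black".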

  • Well, it does look like it's tricky to integrate that with the engine, as it works differently to the existing engine.

    Could it be an effect in Construct instead? That will still be efficient and batch well providing the specified colors are the same.

    Is this feature widely used? Perhaps you could skip it for the first release?

  • The feature is widely used. In our game, one of the draws is highly customizable characters (our game is an MMO brawler, with a C3 client and a Colyseus server). It's also a key part of our monetization (purchasing dyes to color different parts of the player skins). We also use the Spine skins feature for further customization.

    Each character is made of many quads to allow for lots of customization, and players tend to color different parts of the characters differently, so if an effect was used, we would need to change the FS uniforms and that would break the batching.

    Perhaps our solution will be to stay with the WebGL version of C3 (our favorite game engine). Will that be an option for a while? Similar to how we can still choose to enable worker mode or not?

    I am definitely open to other ideas on this, so if there are alternatives for implementation, I am willing to put in the hard work: porting shaders as needed, learning WebGPU, using new features of the new C3 renderer, etc.

  • Hey, Ashley, Mario from Spine here. Sorry for busting into this thread, Mikal asked me to chime in.

    I think your suggestion of going through IWebGLRenderer (and IWebGLTexture) is spot on. That will ensure future compatibility, and allow the underlying C3 renderer to do its best batching renderables.

    That said, for the renderables that need to be rendered for Spine skeletons, the current IWebGLRenderer interface is not ideal. There are two issues:

    1. IWebGLRenderer only supports rendering individual (textured) quads. Spine skeletons are rendered as textured triangles. A single skeleton may consist of a few dozen to thousands of (indexed) triangles. Going through IWebGLRenderer would thus require us to use degenerate quads to render triangles, and also loses us the more performant indexed triangle rendering. I assume that methods like IWebGLRenderer.Quad() and consorts add vertices to an underlying vertex buffer with a fixed vertex format, i.e. x/y, u/v, rgba.

    Maybe it would be possible to add an "expert" method like IWebGLRenderer.triangles()? The parameter would be an array of vertices in C3's vertex format. Each consecutive 3 vertices would make up one triangle. Ideally the array contents could be directly copied to the renderer's vertex buffer (which may require the parameter to be an ArrayBuffer instead of a vanilla JS array). This would still not allow for indexed triangle rendering, but would remove all the overhead of issuing individual calls to IWebGLRenderer.Quad().

    2. Spine's two color tinting (aka tint black, see esotericsoftware.com/spine-slots) requires a custom vertex format (x, y, u, v, color, dark color) and shader. Why the custom vertex format with the additional attribute? Batching. A Spine skeleton consists of many attachments, e.g. one for the head, one for the torso, etc. Each attachment is essentially a textured triangle mesh. There may be dozens of attachments in a single skeleton (and hence dozens of meshes). Each attachment can apply two color tinting to its mesh. This could be implemented by setting the dark color as a shader uniform, but that would break batching of attachments. Instead, the dark color of an attachment is written as an additional vertex attribute to each of its mesh's vertices. While that may seem expensive, it's actually orders of magnitude faster, as it allows batching. The additional data that needs uploading to the GPU is negligible.
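
As a rough illustration of point 2, the sketch below packs several attachments into one interleaved buffer where every vertex carries its attachment's light and dark colors. The layout of 12 floats per vertex (x, y, u, v, light rgba, dark rgba), the `packAttachments` name, and the shape of the attachment objects are all assumptions made for this example, not the actual spine-ts or C3 data structures.

```javascript
// Assumed layout per vertex: x, y, u, v, light r g b a, dark r g b a.
const FLOATS_PER_VERTEX = 12;

// attachments: array of { positions: [x0, y0, ...], uvs: [u0, v0, ...],
//                         light: [r, g, b, a], dark: [r, g, b, a] }
// Returns one Float32Array that a single draw call could consume, because the
// per-attachment colors live in the vertex data rather than in uniforms.
function packAttachments(attachments) {
  const vertexCount = attachments.reduce(
    (n, a) => n + a.positions.length / 2,
    0
  );
  const out = new Float32Array(vertexCount * FLOATS_PER_VERTEX);
  let o = 0;
  for (const a of attachments) {
    const count = a.positions.length / 2;
    for (let i = 0; i < count; i++) {
      out[o++] = a.positions[2 * i];
      out[o++] = a.positions[2 * i + 1];
      out[o++] = a.uvs[2 * i];
      out[o++] = a.uvs[2 * i + 1];
      out.set(a.light, o); // same light color repeated for every vertex
      o += 4;
      out.set(a.dark, o); // same dark color repeated for every vertex
      o += 4;
    }
  }
  return out;
}
```

The colors are duplicated for each vertex of an attachment, which is the extra upload cost the post calls negligible compared with breaking the batch.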

    In order for this to work in C3, IWebGLRenderer would have to expose a way to specify the vertex format (and the two color tint shader, which seems possible). I see two ways to accomplish this.

    One way would be to allow extending the vertex format C3 is using. The vertex format would always have the default C3 attributes. A plugin could then tell C3 to add additional attributes to the defaults. The C3 renderer would just write a default value for those additional attributes, while the plugin can set whatever is necessary when rendering renderables it controls. The plugin would still go through the proposed IWebGLRenderer.triangles() above, so the C3 renderer could still batch everything as the vertex format is constant across all renderables.

    Now, changing the default C3 vertex format is a big ask imo, as it may ripple through the entire C3 rendering code base I'd assume. I could imagine another method on IWebGLRenderer, i.e. IWebGLRenderer.customTriangles(triangles, vertexFormat, shader). That would allow a plugin to implement its own batching with a custom vertexFormat and shader for a single renderable, i.e. one Spine skeleton consisting of dozens of meshes. That renderable could of course no longer be batched with other C3 quads. I think this could be a good compromise between performance, extensibility, and the amount of changes needed in C3 itself.

  • Ashley - any thoughts on the above?

  • On quads vs. triangles, I'm sceptical that it's worth making any changes. Construct's pipeline has been fine-tuned for extreme performance with quads. Just today I tested the M1 Pro and found it can render 750k quads on-screen at 30 FPS. A quad is just two connected triangles, so that means 1.5 million triangles. On top of that, as best as I can tell from the evidence, this is bottlenecked on the memory bandwidth of iterating the JavaScript objects Construct uses to represent objects in the layout. So a single object issuing lots of quads would probably score significantly higher still. I've previously tried to optimise the way quads are issued, and it's made zero difference to the benchmark - presumably because the bottleneck is memory bandwidth iterating JS objects. So I think issuing a degenerate quad is fine: the performance penalty of sending a single extra vertex appears to be dwarfed by the other overheads.

    Further if there is some other triangles mode, it will actually mean breaking the batch to change rendering parameters, since the default rendering mode can only render quads as it's been so heavily optimised for it. So even if we went and did it, I think there is a chance it would actually be slower than sticking with degenerate quads, as the overhead of changing modes could outweigh the overhead of sending a single extra vertex.

    It's also possible to render pairs of connected triangles as quads, avoiding wasting a vertex. Our own engine does that for rendering mesh distortion. Our own engine also issues degenerate quads in a couple of corner cases where it just wants a single triangle. So I don't think there's any case for changing this, especially given the high level of complexity it would probably involve - in Construct, just go with degenerate quads. If you find some benchmark that proves it's unreasonably slow, let me know, and we can take it from there, but I think it's a good bet to say that won't happen.
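
A small sketch of the degenerate-quad idea, assuming a flat array of 2D triangle coordinates as input. The function name `trianglesToDegenerateQuads` and the data shapes are hypothetical, and the thread does not say how Construct splits a quad into two triangles internally. Repeating the last corner gives a zero-area second triangle under either common split.

```javascript
// tris: flat array [ax, ay, bx, by, cx, cy, ...], 6 numbers per triangle.
// Returns one 8-number quad per triangle: (a, b, c, c). The repeated corner
// makes the quad's second triangle collapse to zero area, so only the
// original triangle is visible.
function trianglesToDegenerateQuads(tris) {
  const quads = [];
  for (let i = 0; i + 6 <= tris.length; i += 6) {
    const [ax, ay, bx, by, cx, cy] = tris.slice(i, i + 6);
    quads.push([ax, ay, bx, by, cx, cy, cx, cy]);
  }
  return quads;
}
```

The cost is one extra vertex per triangle, which is the overhead the post argues is dwarfed by other costs.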

    On two color tinting, this is a more complicated problem and could involve more performance overhead. However I still want to understand a bit more about exactly how it's used, as it significantly affects the potential solutions. Construct has a special fast-path for simple color-only effects like "Adjust HSL" - those can just be rendered normally with (more or less) another shader program selected. The shader parameters matter though and can affect the batch. However if you do something like set one set of parameters and then render 1000 triangles, it will be fine: it can still batch everything, as it can see the parameters aren't changing. However if you do something like change the parameters per-triangle, then there will be batch thrashing and things like adding an extra vertex attribute could come into play. So my question is, how do people really use this? Do you need per-triangle colors? Do lots of people really make use of per-triangle colors so this really is something that will affect a lot of cases? The answers to these questions could mean the difference between it basically working fine as-is, to a very complicated overhaul of the entire renderer - something I'm very reluctant to do. So those answers are important.

    FWIW I've been nearing completion of our WebGPU renderer, and it works significantly differently to the WebGL renderer internally, but it still efficiently implements the same interface you get with IWebGLRenderer. This means both that trying to customise the renderer for things like extra vertex attributes is much more complicated, as there are two renderers that work significantly differently to support, and also that there is opportunity to make things much faster in the WebGPU renderer specifically.

  • I can speak for our project which uses the plug-in. I hope that badlogic can give a general sense for the wider Spine community.

    In our project, every 'slot' of the Spine character can be customized in terms of color by the players. We have an MMO-style game in Alpha, and customizing your character is a big part of the experience and potential monetization. We can have 30-50 characters on screen at a time right now when we are busy (we are hoping for more in the future, but we will also start 'sharding' on our server-side when it starts to get too big. We already support multiple rooms/areas.)

    Each character can customize each slot (e.g. leftArm, head, shield, foot, etc.) with two-color tint, which the players do, to differentiate themselves from each other (we also have different images.) There are roughly 25 slots per character. Currently, we don't do mesh deform, so 25 slots = 25 quads. However, we are considering doing mesh deform and that would probably change it to something like 50-75+ quads (dividing up certain quads to have more points so mesh deform will look more appropriate to the design.)

    So, we would hope that at least it could be batched per character.

    (As an aside, in the current implementation using WebGL, we render all the instances of a particular Spine object together to C3 textures using the spine-ts WebGL renderer, though each instance renders to a different dynamic texture.)

    In general, customization is a key part of the game and the two-color tint is a big part of that customization.

    Here's a sample screen, it's just one case, we also have mass battles and team battles, CTF, etc. too, but this shows some of the variety of characters and colors.

    In our case, we would only need 'new' features for the WebGPU renderer as we already have something working for WebGL.

    As you work on the new renderer, I hope you consider these possibilities for flexibility of future development: allowing vertex attributes, different shaders, and also the possibility of access to multiple textures in a shader (for example, we are using that with the current Spine WebGL renderer to do palette colors, with a grayscale main texture and a separate palette texture; other uses could be for bump, normal, or specular maps.)

    Thanks for the discussion Ashley.

  • Ashley, thanks for the in-depth reply! Totally understand that you won't change a complex rendering pipeline focused on quads to allow triangle rendering.

    Mikal already outlined the two color tint use case pretty well. To add a bit more info on the problem: taking Mikal's example of a skeleton with 25 slots, where each can have its own colors, that'd translate to 25 draw calls when done with shader uniforms, i.e. one uniform for the color and another uniform for the second color. This cannot be batched, as we have to set the uniforms for each slot to be rendered. By using a vertex attribute for the colors, the shader and uniforms stay constant for all the slots, and thus slots can be batch rendered.
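
The draw call arithmetic above can be sketched with a toy model. It assumes all slots share one texture and one shader, and that a batch only breaks when the uniform colors differ from the previous slot. Both function names and the slot object shape are hypothetical; no real renderer is involved.

```javascript
// slots: array of { light: [r, g, b, a], dark: [r, g, b, a] } in draw order.

// Colors as uniforms: every change of colors between consecutive slots forces
// a new draw call (toy model, same texture and shader assumed throughout).
function drawCallsWithUniforms(slots) {
  let calls = 0;
  let prev = null;
  for (const s of slots) {
    const key = s.light.join() + "|" + s.dark.join();
    if (key !== prev) {
      calls++;
      prev = key;
    }
  }
  return calls;
}

// Colors as vertex attributes: uniforms never change, so all slots fit in a
// single batch under the same assumptions.
function drawCallsWithVertexColors(slots) {
  return slots.length > 0 ? 1 : 0;
}
```

For 25 slots that each have distinct colors, this model gives 25 draw calls with uniforms and 1 with vertex colors, matching the figures in the post.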

    As for use cases, it's a very common one. In fact, Big N (those with the Mario games) were the ones who approached us about it a few years ago. It has been used by a large percentage of our users since we released the feature, specifically in games where characters are customizable by the player. A simple single color tint does not provide the same functionality.

  • OK, thanks for the explanation. I think I better understand the usage, and I can see how there could be overhead from the batch updates.

    As ever though, we're extremely busy and we get literally hundreds of other feature requests, so we have to be ruthless about prioritisation. I'm very reluctant to do lots of complicated work for one specific case. So I would suggest starting by getting it working with the features available, even if it's slow. Then (if it really is too slow) you can build some benchmarks demonstrating this in a measurable way, and then any improvements can also be measured. (Having been bitten by this before, I'm also very reluctant to do anything pre-emptive for performance; there has to be a way to measure it or you're probably wasting time.) After all, it's not 2012 any more - maybe things are OK even with a lot of changing uniforms, or some of the more simple use cases can already be covered, or when looking at a real benchmark there's other low hanging fruit that can easily improve the situation.

  • Good plan! Thanks for taking the time to respond.

  • Thanks Ashley - any estimate on when we could start testing this with C3 and WebGPU? (I understand we can start dev now with WebGL, since we'd be using the C3 SDK interface and it won't change.)
