Additive blend simply comes down to gl.blendFunc(gl.ONE, gl.ONE). The blending itself is carried out by the native GPU driver: the call does not involve a shader and has much less overhead than one, so third-party shaders will almost certainly be less efficient. My best guess is that some GPUs are optimised for alpha blending, and additive blending takes a less-optimised path, perhaps because it has to clamp the output: with additive blending you can add pixel values that go beyond the representable range, which can't happen with alpha blending, so additive blending probably involves an extra clamp operation.
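To make the comparison concrete, here's a minimal sketch of the two blend states being discussed. It assumes an existing WebGL context `gl`; `drawSprite()` is a hypothetical stand-in for whatever draw call the engine actually uses.

```js
gl.enable(gl.BLEND);

// Normal blending with premultiplied alpha:
//   dst = src + dst * (1 - src.alpha)
gl.blendFunc(gl.ONE, gl.ONE_MINUS_SRC_ALPHA);
drawSprite(normalSprite);

// Additive blending:
//   dst = src + dst
// The sum can exceed the maximum pixel value, so the hardware clamps
// the result to the framebuffer's range before writing it out.
gl.blendFunc(gl.ONE, gl.ONE);
drawSprite(additiveSprite);
```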
I've reached out to a contact on Intel's embedded GPU team to get more info on this. What you've posted above definitely shouldn't cause what I'm seeing on every Intel chip I've tested (and it only happens in C2/3), but I've also seen the same performance and graphical issues on a Shield TV/Tegra, so it's probably not GPU-specific. Based on the way C2's renderer works, should this behave the same whether the blend is applied to a layer or to a sprite, assuming both are set to have no opacity and the backgrounds are solid 0,0,0 black?