HTML layers work by creating an additional <canvas> element per HTML layer, so that other content can be drawn to it and layered above other HTML content. However, this has quite a high performance overhead. Firstly, and probably most significantly, it uses a lot of GPU fill rate, as it has to draw at least one additional viewport-sized texture (and possibly 2-3 depending on the realities of compositing), a similar overhead to using one or two 'own texture' layers. Secondly, it also uses more GPU memory, as it allocates another viewport-sized surface (and possibly 2 depending on details like double buffering). An HD (1920x1080) surface is about 8 MB at 4 bytes per pixel.
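To put rough numbers on the memory side, here is a small sketch of the arithmetic. It assumes 4 bytes per pixel (8-bit RGBA) and no driver padding or compression, so treat the results as ballpark figures only. The function names `surfaceBytes` and `extraLayerBytes` are made up for illustration and are not part of any engine API.

```typescript
// Rough estimate of GPU memory used by viewport-sized surfaces.
// Assumption: 4 bytes per pixel (8-bit RGBA), no padding or compression.
const BYTES_PER_PIXEL = 4;

// Bytes needed for a single surface of the given pixel size.
function surfaceBytes(width: number, height: number): number {
  return width * height * BYTES_PER_PIXEL;
}

// Hypothetical helper: total bytes for `layers` extra HTML layers, where
// each layer allocates `surfacesPerLayer` viewport-sized surfaces
// (e.g. 2 if the surface ends up double buffered).
function extraLayerBytes(
  width: number,
  height: number,
  layers: number,
  surfacesPerLayer: number = 1
): number {
  return surfaceBytes(width, height) * layers * surfacesPerLayer;
}

// Convert bytes to megabytes (1 MB = 1,000,000 bytes).
const toMB = (bytes: number): number => bytes / 1e6;

// One 1920x1080 surface.
console.log(toMB(surfaceBytes(1920, 1080)).toFixed(1) + " MB");
// Three extra HTML layers, each double buffered.
console.log(toMB(extraLayerBytes(1920, 1080, 3, 2)).toFixed(1) + " MB");
```

The cost scales linearly with both the number of HTML layers and the viewport area, so a larger or high-DPI display multiplies it further.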
My advice is to use only as many HTML layers as you actually need, to avoid unnecessary performance overhead.