Hmm, I think you would need a copy of every sprite object in the game per "camera", plus a system that keeps track of each "camera" position relative to the others and uses those positions to offset the objects. Each set of sprites would sit on its own layer, and you would use a blend mode like destination in, with force own texture enabled on the layer, to create a clipping mask.
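To make the offset part a bit more concrete, here is a rough Python sketch of the math only, not engine code. The `Camera` class, its fields, and the `copy_position` / `is_visible` helpers are all made up for illustration. The rectangle check just stands in for what the clipping mask layer would do visually.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical camera: a world region shown in a screen rectangle."""
    name: str
    world_x: float   # top-left of the region this camera looks at (world coords)
    world_y: float
    screen_x: float  # top-left of where this camera's viewport is drawn (screen coords)
    screen_y: float
    width: float
    height: float


def copy_position(camera: Camera, obj_x: float, obj_y: float) -> tuple[float, float]:
    """Screen position for this camera's copy of an object at (obj_x, obj_y)."""
    return (obj_x - camera.world_x + camera.screen_x,
            obj_y - camera.world_y + camera.screen_y)


def is_visible(camera: Camera, obj_x: float, obj_y: float) -> bool:
    """True if the copy's origin lands inside the camera's screen rectangle.

    Assumption: in the real setup the clipping mask hides anything outside
    the rectangle, so this check is only a stand-in for that.
    """
    sx, sy = copy_position(camera, obj_x, obj_y)
    return (camera.screen_x <= sx < camera.screen_x + camera.width
            and camera.screen_y <= sy < camera.screen_y + camera.height)


# Two side-by-side viewports looking at different parts of the world.
cam_a = Camera("a", world_x=0, world_y=0, screen_x=0, screen_y=0, width=400, height=300)
cam_b = Camera("b", world_x=1000, world_y=0, screen_x=400, screen_y=0, width=400, height=300)
```

With those two cameras, an object at world position (1100, 50) would get its `cam_b` copy placed at screen position (500, 50), and its `cam_a` copy would end up outside that camera's rectangle, so the mask would hide it.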
Just theorycrafting here. I have that rough idea in my head but no idea how feasible it is. Might give it a shot later.
It would need a lot of custom code to deal with interaction and to sync the viewports' objects... Not unlike net code, actually.
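Roughly what I mean by the net code comparison, as a Python sketch: one "real" object holds the state, every viewport has a copy that mirrors it, and any interaction with a copy gets routed back to the real object before the copies are refreshed. `WorldObject`, `ViewportCopy`, `drag_to` and `sync_all` are hypothetical names for illustration, and the fixed per-viewport offset is an assumption.

```python
class WorldObject:
    """The single source of truth for an object's state (world coordinates)."""

    def __init__(self, uid: int, x: float, y: float):
        self.uid = uid
        self.x = x
        self.y = y


class ViewportCopy:
    """One viewport's visual copy of a WorldObject (screen coordinates)."""

    def __init__(self, source: WorldObject, offset_x: float, offset_y: float):
        self.source = source
        self.offset_x = offset_x  # assumed fixed world-to-screen offset for this viewport
        self.offset_y = offset_y
        self.x = 0.0
        self.y = 0.0
        self.sync()

    def sync(self) -> None:
        """Pull the latest state from the source object."""
        self.x = self.source.x + self.offset_x
        self.y = self.source.y + self.offset_y

    def drag_to(self, screen_x: float, screen_y: float) -> None:
        """Interaction on a copy is converted back and applied to the source."""
        self.source.x = screen_x - self.offset_x
        self.source.y = screen_y - self.offset_y


def sync_all(copies: list[ViewportCopy]) -> None:
    """Run once per tick so every viewport shows the same state."""
    for copy in copies:
        copy.sync()


crate = WorldObject(uid=1, x=10, y=20)
copies = [ViewportCopy(crate, 0, 0), ViewportCopy(crate, 300, 0)]

# The player drags the copy in the second viewport; the first one follows after the sync.
copies[1].drag_to(350, 20)
sync_all(copies)
```

The copies never talk to each other directly, which is the same idea as clients only talking to a server. It keeps the interaction code in one place, at the cost of one extra copy per object per viewport.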