It doesn't look too bad. Generating the graphics would be the hardest part. Drawing layer by layer or using a voxel editor are a couple of options. Personally I think it would be cool to just slice up textured 3D objects, but I haven't seen a workflow for that yet.
The graphics would probably need to be lower resolution to keep video RAM usage low, since every layer of every object is its own image.
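To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes uncompressed 32-bit RGBA images (4 bytes per pixel) and ignores any spritesheet padding; the sizes and slice counts are made-up examples, not measurements from a real project.

```python
def texture_bytes(width, height, bytes_per_pixel=4):
    """Approximate size of one uncompressed RGBA image in video memory."""
    return width * height * bytes_per_pixel


def stack_bytes(width, height, slices, anim_frames=1):
    """One image per slice, multiplied again for every animation pose."""
    return texture_bytes(width, height) * slices * anim_frames


# Hypothetical object made of 32 slices, at two resolutions.
full = stack_bytes(128, 128, slices=32)
half = stack_bytes(64, 64, slices=32)

print(full / 2**20, "MiB at 128x128")
print(half / 2**20, "MiB at 64x64")
```

Halving the width and height cuts the memory to a quarter, and animation multiplies the total by the number of poses, so resolution is the cheapest thing to trade away.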
Rendering using multiple instances for every frame isn't so bad. It's probably just a matter of z-ordering everything by frame. The rotation may take a bit of math, but it's not bad either. Animation is just more images and isn't bad either, maybe a couple more events.
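As a minimal sketch of that idea, written in Python rather than C2 events: each object gets one instance per frame (slice), the top-view position is rotated around the camera, and each higher frame is shifted up the screen. The names `Stacked`, `draw_list` and `spacing` are hypothetical, and the vertical squish is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Stacked:
    x: float      # top-view world position
    y: float
    angle: float  # facing angle in degrees
    slices: int   # number of frames (layers) in the stack


def draw_list(objects, cam_x, cam_y, cam_angle, spacing=1.0):
    """Return (frame, screen_x, screen_y, angle) tuples in draw order."""
    a = math.radians(-cam_angle)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for obj in objects:
        dx, dy = obj.x - cam_x, obj.y - cam_y
        # Rotate the top-view position around the camera.
        sx = dx * cos_a - dy * sin_a
        sy = dx * sin_a + dy * cos_a
        for frame in range(obj.slices):
            # Each higher slice is drawn a little further up the screen.
            out.append((frame, sx, sy - frame * spacing, obj.angle - cam_angle))
    # Z-order by frame: all bottom slices first, top slices last.
    # The sort is stable, so instances on the same frame keep their order.
    out.sort(key=lambda item: item[0])
    return out


items = draw_list([Stacked(10, 0, 0, 3), Stacked(0, 5, 45, 2)], 0, 0, 0, spacing=2)
for item in items:
    print(item)
```

Sorting only by frame is the simplest possible ordering. Overlapping objects may need a secondary sort key, such as the rotated y position, to look right.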
Now, the Paster plugin could be used to cut down on the number of instances needed and the amount of things to be drawn. The static scenery could all be drawn to one Paster per layer. That way you only have to draw those Pasters and the moving objects each frame.
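A quick count shows why that helps. This is only a sketch with made-up numbers: it assumes every object has the same number of layers and ignores the one-time cost of pasting the scenery.

```python
def draws_per_tick(static_objects, moving_objects, layers, use_paster):
    """Rough number of things drawn each tick (hypothetical model)."""
    if use_paster:
        # One Paster per layer holds all static scenery for that layer;
        # moving objects still need one instance per layer.
        return layers + moving_objects * layers
    # Without it, every object needs one instance per layer.
    return (static_objects + moving_objects) * layers


without_paster = draws_per_tick(200, 5, 16, use_paster=False)
with_paster = draws_per_tick(200, 5, 16, use_paster=True)
print(without_paster, with_paster)
```

With 200 static objects, 5 moving objects and 16 layers, the count drops from 3280 to 96. The static scenery stops mattering once it has been pasted.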
Another plus of using the Paster object is that all the rotation math isn't needed.
Besides that, the vertical squish can't be done with vanilla C2.
Collisions would be done the same way as in isometric games: everything is handled from a top view.
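In other words, the collision test only looks at the ground footprint, never at the stacked slices drawn above it. A minimal sketch, assuming circular footprints (the function name and the radii are hypothetical; in C2 this would more likely be an invisible base sprite using the built-in overlap checks):

```python
def footprints_overlap(ax, ay, ar, bx, by, br):
    """Circle-vs-circle test on top-view positions; slice height is ignored."""
    dx, dy = ax - bx, ay - by
    return dx * dx + dy * dy <= (ar + br) ** 2


# Two objects 10 units apart on the ground plane.
print(footprints_overlap(0, 0, 6, 10, 0, 6))  # radii reach each other
print(footprints_overlap(0, 0, 4, 10, 0, 4))  # a gap remains
```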
I'm not very proficient with shaders, but I don't think they would be helpful here because, for one, you can't access more than one source image in C2.
A plugin may work, but internally it would be much more complicated to make than doing it with events, imo. Not to mention it would be less flexible.
Re-enabling the "front to back" option may give some rendering improvement, but the Paster plugin wouldn't work with it. That may be a small loss, though.