My 3D program uses 'isometric' to mean a perspectiveless camera; I should've looked it up. I didn't convey what I meant properly. I know boxes are rendered with perspective; I meant the camera as a whole, because it seems like the camera is fixed at orthographic and so can't move around freely in a 3D world with perspective (I shouldn't have mentioned that, since I don't entirely understand how the camera(s?) work). Is there another camera for the 3D box? I guess I haven't wrapped my head around the concept of moving the camera around in 3D when it's rendering both orthographic and regular perspective at the same time. Forget that point.
Again, I still have to disagree though, going by the point you made - Construct already has a lot of power to make 3D games as it is now (if you don't mind using boxes). I think even a 3D RTS or a 3D Mario-type game could be done by adding only animated meshes and lighting (and being able to click on a 3D object, forgot that one).
Making a 3D box jump or run around is easy already. For the RTS, you can use a sprite with RTS movement elsewhere on the layout to control the x and z position of the box (mapped from the sprite's x and y positions) as well as its yaw, with solid sprites as obstacles. So while it seems like the box is navigating around obstacles in 3D, it's really a normal sprite doing it offscreen with 2D pathfinding. It's a little clumsy, but it works.
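To make the proxy idea concrete, here's a rough sketch in Python rather than Construct events. ProxySprite, Box3D, sync_box_to_proxy and the offset parameters are all names I made up for illustration; none of them are Construct objects or expressions. It only shows the mapping (sprite x -> box x, sprite y -> box z, sprite angle -> box yaw), not the pathfinding itself.

```python
from dataclasses import dataclass


@dataclass
class ProxySprite:
    """Hypothetical 2D sprite that does the real pathfinding offscreen."""
    x: float
    y: float
    angle: float  # facing direction in degrees


@dataclass
class Box3D:
    """Hypothetical 3D box that is only drawn; it never pathfinds itself."""
    x: float = 0.0
    y: float = 0.0  # height; the proxy never touches this
    z: float = 0.0
    yaw: float = 0.0


def sync_box_to_proxy(box, proxy, offset_x=0.0, offset_y=0.0):
    """Copy the proxy's 2D state onto the box, once per tick.

    offset_x / offset_y are assumed values describing how far the hidden
    pathfinding area sits from the visible play area on the layout.
    """
    box.x = proxy.x - offset_x   # sprite x -> box x
    box.z = proxy.y - offset_y   # sprite y -> box z (depth)
    box.yaw = proxy.angle        # sprite angle -> box yaw
    return box
```

In events this would be something like an "Always" condition that sets the box's position and yaw from the sprite every tick; jumping would set the box's height separately, since the proxy knows nothing about it.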
I don't know anything about animated meshes, so I won't comment on them.
I should have said basic 3D box collisions. I know exact 3D collisions are hard, but simple ones can be done now via events (I know how to check if a point in 3D space is in a box, but not whether two boxes are overlapping). They can be faked to some extent by checking in 2D for two of the dimensions (like I mentioned above for the RTS) and using events for the third. It wouldn't be the most exact method in the world, but it would work for basic games.
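Here's roughly what I mean, sketched in Python instead of events. The Box class and the function names are made up for illustration, and it assumes the boxes are not rotated (each one is described by its minimum corner and its size along x, y and z). With rotated boxes this would only be an approximation.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical unrotated box: minimum corner plus size on each axis."""
    x: float
    y: float
    z: float
    w: float  # size along x
    h: float  # size along y (height)
    d: float  # size along z (depth)


def point_in_box(px, py, pz, box):
    """True if the 3D point lies inside (or on the surface of) the box."""
    return (box.x <= px <= box.x + box.w
            and box.y <= py <= box.y + box.h
            and box.z <= pz <= box.z + box.d)


def footprints_overlap(a, b):
    """The 2D part: do the boxes overlap when viewed from above (x and z)?"""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.z < b.z + b.d and b.z < a.z + a.d)


def heights_overlap(a, b):
    """The third dimension: do the height ranges overlap?"""
    return a.y < b.y + b.h and b.y < a.y + a.h


def boxes_collide(a, b):
    """Faked 3D collision: 2D overlap plus a separate height check."""
    return footprints_overlap(a, b) and heights_overlap(a, b)
```

In Construct the 2D part would be the ordinary sprite overlap test on the offscreen proxies, and the height check would be a pair of compare conditions in events.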
Sprites already set their X and Y - I remember someone saying it would be possible to make a separate 3D sprite object if it would require too much modification of the normal sprite.
Honestly, I think you overestimate the difficulty. I don't mean to suggest that I know better than the developers, or how long it should take, but from what I know about 3D, these don't seem like either insurmountable tasks or unreasonable requests.
Clarification: I think they're right that they should concentrate on the 2D part first. This is post-1.0 stuff.
Sorry about the wall of text.