There are certain advantages to running a project like this in a 2D environment, with arrays governing the action and the engine simulating 3D. I believe this is how Fez works: 2.5D, or 2D simulating 3D.
You can fairly easily simulate a 3D environment with 2D objects that are drawn based on their position in an array. Crudely, off the top of my head: represent each object by a number in a 3D array, say 1 = player and 2 = floor, and fill the x and z of one y-layer with 2s. Keep a global variable that stores which perspective you're in. If perspective = 1, move all floor objects to their (Z, Y) positions on screen; if perspective = 2, move them to (X, Y). You could even have the floor objects slide across at some speed to show a transition.
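A rough Python sketch of that idea, just to make it concrete. The function names (make_world, project, transition_column) are made up for illustration, and the rule that the cell with the smallest depth index "wins" when several cells line up on the same screen spot is my own assumption, not something Fez is known to do:

```python
# Hypothetical tile codes, following the numbering above: 0 = empty, 1 = player, 2 = floor.
EMPTY, PLAYER, FLOOR = 0, 1, 2

def make_world(size_x, size_y, size_z, floor_y=0):
    """Build world[x][y][z] and fill one y-layer with floor tiles across x and z."""
    world = [[[EMPTY] * size_z for _ in range(size_y)] for _ in range(size_x)]
    for x in range(size_x):
        for z in range(size_z):
            world[x][floor_y][z] = FLOOR
    return world

def project(world, perspective):
    """Flatten the 3D grid into a dict mapping screen cell -> tile code.

    perspective 1: screen axes are (z, y), i.e. looking along x.
    perspective 2: screen axes are (x, y), i.e. looking along z.
    Assumption: when several cells land on one screen cell, the one with
    the smallest depth index is treated as nearest and is the one shown.
    """
    if perspective not in (1, 2):
        raise ValueError("perspective must be 1 or 2")
    size_x, size_y, size_z = len(world), len(world[0]), len(world[0][0])
    nearest = {}  # screen cell -> (depth, tile)
    for x in range(size_x):
        for y in range(size_y):
            for z in range(size_z):
                tile = world[x][y][z]
                if tile == EMPTY:
                    continue
                if perspective == 1:
                    key, depth = (z, y), x
                else:
                    key, depth = (x, y), z
                if key not in nearest or depth < nearest[key][0]:
                    nearest[key] = (depth, tile)
    return {key: tile for key, (_, tile) in nearest.items()}

def transition_column(x, z, t):
    """Screen column of a cell while switching from perspective 1 (t=0)
    to perspective 2 (t=1). A crude linear slide, not a true rotation."""
    return (1 - t) * z + t * x
```

For example, with world = make_world(4, 3, 2) and a player placed at world[1][1][0], project(world, 1) shows the floor at (0, 0) and (1, 0) with the player at (0, 1), while project(world, 2) shows four floor cells along x and the player at (1, 1). Stepping t from 0 to 1 over a few frames gives the "floor objects moving to show a transition" effect.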
As for the algorithm handling player acceleration, you could have a global variable, moveVar, that counts up to 3. For every 0.2 seconds that [movement key] is held, add 1 to moveVar; for every 0.1 seconds that no movement key is pressed, subtract 1 from moveVar. The next part I'm having a little trouble thinking through, but you'll want to shift the 1's position in the array by a number of spaces equal to moveVar. If the objects are oversized and the array is very big, each cell in the array can be just a few pixels, so moving 1 to 3 cells every 0.05 seconds could give pretty smooth movement.
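Here's one way the acceleration part could look, as a sketch rather than a finished implementation. It counts in fixed 0.05-second ticks instead of real timers (so 0.2 s = 4 ticks and 0.1 s = 2 ticks), and the Mover class, its fields, and the "keep sliding in the last facing direction while slowing down" behavior are all my own assumptions:

```python
TICK_SECONDS = 0.05  # one position update every 0.05 s
ACCEL_TICKS = 4      # 0.2 s of holding a movement key -> moveVar + 1
DECEL_TICKS = 2      # 0.1 s with no movement key -> moveVar - 1
MAX_MOVE = 3         # moveVar counts up to 3

class Mover:
    """Hypothetical helper tracking the player's column in the array."""

    def __init__(self, x=0):
        self.x = x            # player's position (array index)
        self.move_var = 0     # current speed in cells per tick, 0..MAX_MOVE
        self.held_ticks = 0   # ticks the key has been held since the last speed-up
        self.idle_ticks = 0   # ticks with no key since the last slow-down
        self.facing = 1       # last direction pressed: -1 left, +1 right

    def tick(self, direction):
        """Advance one tick. direction is -1 (left), +1 (right) or 0 (no key)."""
        if direction != 0:
            self.facing = direction
            self.idle_ticks = 0
            self.held_ticks += 1
            if self.held_ticks == ACCEL_TICKS:
                self.held_ticks = 0
                self.move_var = min(self.move_var + 1, MAX_MOVE)
        else:
            self.held_ticks = 0
            self.idle_ticks += 1
            if self.idle_ticks == DECEL_TICKS:
                self.idle_ticks = 0
                self.move_var = max(self.move_var - 1, 0)
        # Shift the player by moveVar cells in the direction it is facing,
        # so it coasts to a stop after the key is released.
        self.x += self.facing * self.move_var
        return self.x
```

Holding right for 12 ticks (0.6 s) with this sketch ramps moveVar up to 3 and leaves the player at x = 15; releasing for another 6 ticks brings moveVar back to 0 with the player coasting to x = 24. Note that the player doesn't move at all during the first 0.2 s of holding the key, since moveVar starts at 0; starting it at 1 on the first press would make the controls feel snappier.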
Anyway, I'm not sure how much of this is coherent or helpful, but your project is ambitious and I'd love to see it come to fruition.