Generally speaking, you can get the effect you want by placing the origin of every sprite at its base (its lowest point) and then sorting sprites by the y position of that origin.
This will not work for complex shapes though, which will require stitching together multiple simple shapes.
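As a rough illustration of the base-origin idea, here is a minimal, engine-agnostic sketch. The `Sprite` class, the `draw_order` helper, and the coordinates are all made up for the example, and it assumes screen coordinates where y grows downward. The L-shaped wall is split into two simple pieces, each with its own base.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    x: float
    base_y: float  # y of the sprite's origin at its lowest point (y grows downward)


def draw_order(sprites):
    # Painter's algorithm: a smaller base_y means further back, so draw it first.
    return sorted(sprites, key=lambda s: s.base_y)


scene = [
    Sprite("player", 40, 120),
    Sprite("tree", 50, 100),
    # A complex (L-shaped) wall stitched from two simple pieces.
    Sprite("wall_back", 80, 90),
    Sprite("wall_front", 120, 140),
]

order = [s.name for s in draw_order(scene)]
print(order)
```

With these numbers the player is drawn in front of the tree and the back piece of the wall, but behind the front piece, which a single origin for the whole wall could not express.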
Elevation adds another headache: you won't be able to rely on automatic z sorting by y position at all. The simplest case is that the player can be either on top of a cliff or behind it at the exact same x/y coordinate, so the position alone can't tell you the draw order.
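To make the ambiguity concrete, here is one possible workaround, sketched under assumptions: every entity carries an explicit `elevation` value (hypothetical, not something a 2D engine gives you for free), and `TILE_HEIGHT` is an assumed on-screen height of one elevation step. The sort key projects the base back onto the ground plane and breaks ties by elevation. It is a sketch, not a complete solution.

```python
from dataclasses import dataclass

TILE_HEIGHT = 32  # assumed on-screen height of one elevation step, in pixels


@dataclass
class Entity:
    name: str
    screen_y: float  # where the base is drawn on screen (y grows downward)
    elevation: int   # hypothetical: steps above the ground the entity stands on


def sort_key(e):
    # Project the base back onto the ground plane, then break ties by elevation.
    ground_y = e.screen_y + e.elevation * TILE_HEIGHT
    return (ground_y, e.elevation)


cliff = Entity("cliff", 132, 0)           # cliff face whose base sits at screen y 132
on_top = Entity("player_on_top", 100, 1)  # standing on top of the cliff
behind = Entity("player_behind", 100, 0)  # same screen y, but on the ground behind it

order_top = [e.name for e in sorted([on_top, cliff], key=sort_key)]
order_behind = [e.name for e in sorted([behind, cliff], key=sort_key)]
print(order_top, order_behind)
```

Both players share the same screen position, yet one is drawn after the cliff and the other before it. The extra `elevation` field is exactly the information that plain y sorting is missing, and keeping it correct as things move between levels is where the complexity creeps in.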
That said, there is a reason 2D tilemap-based games are designed the way they are. Having the player character or other objects occluded by terrain might seem cool to the developer, but it's probably not worth the extra complexity and bugginess for something the player might not actually want in terms of gameplay.
The best way to implement this, imho, is to learn to work with 2D artwork inside a 3D framework/engine instead of trying to simulate too much 3D in a 2D engine. Otherwise you're just going to run into more and more technical issues, get frustrated and stuck, and waste time hunting for solutions that may or may not exist instead of actually working on the rest of your game.