I didn't test it so there may be errors. Also I can't look at the capx till tomorrow.
Z is the distance from the eye, which is at z=0. The camera is at z=1. Anything farther away has a greater z and so looks smaller, since the perspective is calculated as x/z, y/z. Basically the bottom width divided by 1.7 should equal the top width.
1. The idea I had was to find how far away the back edge would have to be to appear at that width. Without perspective the top edge should be just as wide as the bottom. With perspective the same projection formula can be used: proj_width=width/z. Solving for z, it comes out to z=width/proj_width, and then it's just a matter of interpolating z by y or v.
2. The 0.5 is so we're scaling from the center instead of the left edge. U and v are in the range of 0 to 1, so 0.5 is halfway. The /z is what does the actual scaling.
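Here's a rough Python sketch of both points, just to make the arithmetic concrete (untested against the capx). The function names, the 340 and 200 widths, and the choice of v=0 for the top edge and v=1 for the bottom edge are my own assumptions for illustration.

```python
def far_edge_depth(near_width, far_width, near_z=1.0):
    """Depth at which an edge as wide as the near edge projects to far_width."""
    # proj_width = width / z  =>  z = width / proj_width
    return near_z * near_width / far_width


def depth_at(v, near_z, far_z):
    """Interpolate depth by v (assumed: v=0 is the far/top edge, v=1 the near/bottom edge)."""
    return far_z + (near_z - far_z) * v


def squeeze_u(u, z):
    """Scale u about the center (0.5) by 1/z, with u in the range 0 to 1."""
    return (u - 0.5) / z + 0.5


# Hypothetical widths chosen so that bottom / top = 1.7
far_z = far_edge_depth(near_width=340.0, far_width=200.0)

for v in (0.0, 0.5, 1.0):
    z = depth_at(v, near_z=1.0, far_z=far_z)
    print(v, z, squeeze_u(0.0, z), squeeze_u(1.0, z))
```

At the top edge the left and right u values move in toward 0.5, and at the bottom edge (z=1) they stay at 0 and 1.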
As a simple example, say you wanted to double the x distance from 320. You'd do:
X=(x-320)*2+320
If instead we wanted to just scale from 0 it would be:
X=x*2
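The same pattern as a small Python function, in case it helps to see it written generally. The name scale_about and the sample value 400 are made up for illustration.

```python
def scale_about(x, pivot, factor):
    """Scale the distance between x and pivot by factor, leaving pivot fixed."""
    return (x - pivot) * factor + pivot


print(scale_about(400, 320, 2))  # 480: the 80 px offset from 320 is doubled
print(scale_about(400, 0, 2))    # 800: same as x*2
print(scale_about(320, 320, 2))  # 320: the pivot itself doesn't move
```

With a pivot of 0.5 and a factor of 1/z this is the same thing as the (u-0.5)/z+0.5 form.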