There's more involved, but to keep it simple and easy to picture:
Think of the screen as a canvas. This canvas is stored in VRAM and doesn't change (unless you switch resolution, in which case everything is rearranged, but let's ignore that for now). Everything is drawn to that one canvas.
Let's say that canvas is 1280x720. Whatever you draw is clipped to that extent. A 20000x20000 image will only cover 1280x720 pixels of output; the rest is simply not drawn. It's the same as using a paint program: you set up a canvas and draw to it, and nothing is drawn outside that canvas.
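To make the clipping concrete, here is a minimal sketch in Python. The function name `visible_region` and the idea of placing the image at an (x, y) offset are my own illustration, not how any particular engine or driver does it. It only computes how much of an image lands on the canvas.

```python
def visible_region(canvas_w, canvas_h, x, y, img_w, img_h):
    """Return (width, height) of the part of an image that lands on the canvas.

    (x, y) is the hypothetical top-left position of the image on the canvas.
    """
    left = max(x, 0)
    top = max(y, 0)
    right = min(x + img_w, canvas_w)
    bottom = min(y + img_h, canvas_h)
    # Clamp to zero when the image lies completely outside the canvas.
    return max(right - left, 0), max(bottom - top, 0)


# A huge image drawn at the origin only covers the canvas itself.
print(visible_region(1280, 720, 0, 0, 20000, 20000))

# A small image hanging over the top-left corner is partly cut off.
print(visible_region(1280, 720, -64, -64, 128, 128))
```

The output canvas never grows; only the overlapping rectangle gets written.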
BUT the sources need to be placed in VRAM, too. So if you load a 128x128 image, it consumes exactly the space needed for 128x128 pixels, no matter how large you draw it to the output canvas. The same applies if you draw it smaller: it still consumes the space for 128x128, because the source itself doesn't change at all.
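As a rough sketch of that point, the Python snippet below estimates the source's footprint. It assumes an uncompressed image with 4 bytes per pixel (RGBA, 8 bits per channel) and no mipmaps or padding; real drivers may store things differently. The names `texture_bytes` and `BYTES_PER_PIXEL` are made up for the example.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed RGBA, 8 bits per channel


def texture_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Approximate memory used by a source image of the given size."""
    return width * height * bytes_per_pixel


source_size = texture_bytes(128, 128)

# The size it is drawn at never enters the calculation.
for drawn_w, drawn_h in [(32, 32), (128, 128), (1280, 720)]:
    print(f"drawn at {drawn_w}x{drawn_h}: source uses {source_size} bytes")
```

Under these assumptions the 128x128 source takes 65,536 bytes (64 KiB) whether it is drawn as a thumbnail or stretched across the whole canvas.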
It's a bit different with vertex data. It is also stored in VRAM, but it doesn't use much space. Again, to keep it simple, let's forget about the finer details. There are three values per vertex (x, y, z). Even in double precision, that is only 24 bytes per vertex. A sprite with a 10x10 grid of vertices (100 points × 24 bytes = 2,400 bytes) would consume ~0.002 MB; with single precision it would be ~0.001 MB. Of course there is overhead, but basically that's it. Calculating the points (distorting the source image while drawing it to the output canvas) is the bigger concern, as it takes more GPU time than just drawing the source image.
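The arithmetic for the vertex data can be checked with a few lines of Python. This is only a sketch under the same simplification as above: three position values per vertex and nothing else (no texture coordinates, colors, indices, or alignment padding). `vertex_bytes` is a hypothetical helper name.

```python
def vertex_bytes(cols, rows, components=3, bytes_per_component=4):
    """Approximate memory for a cols x rows grid of (x, y, z) vertices."""
    return cols * rows * components * bytes_per_component


double_precision = vertex_bytes(10, 10, bytes_per_component=8)
single_precision = vertex_bytes(10, 10, bytes_per_component=4)

print(f"double: {double_precision} bytes = {double_precision / 1_000_000:.4f} MB")
print(f"single: {single_precision} bytes = {single_precision / 1_000_000:.4f} MB")
```

That gives 2,400 bytes for double precision and 1,200 bytes for single precision, which is tiny next to even a small source image.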
Of course, this is a simplified picture of what is really going on, but I think it makes things easier to understand.