Also people are very sensitive to jank/juddering/uneven movement. If the logic is running at a different rate to the rendering, the motion becomes uneven. For example, logic at 45 Hz with rendering at 60 Hz means every rendered frame shows alternating step distances, e.g. 10px, 15px, 10px, 15px... Given how sensitive people are to this (particularly when scrolling), I think the only solution would be to go for an integer multiple/divisor of the framerate to guarantee every rendered step moves the same distance, e.g. logic at 2x, 3x, or 1/2, 1/3.
...and then if you only run logic at an integer multiple/divisor, there is no solution - you can't choose a logic rate that covers all these display rates with smooth motion.
I think you may be missing or misinterpreting the part about interpolation/extrapolation. The idea is that the logic runs at a constant rate, while rendering is a separate process that interpolates or extrapolates between logic steps to match whatever framerate the display uses. The renderer and the logic don't even need to be in sync; they could be different threads running in parallel on different CPU cores.
To use your example, if interpolation was used the result would be a smooth movement with no jank whatsoever:
*The gray balls represent the logic steps which are the real positions of the simulated object, while the red balls are the interpolated positions used just for rendering (they have no influence on the simulation).
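To put numbers on the 45 Hz / 60 Hz example, here is a rough Python sketch, not tied to any engine. The names and the speed of 10 px per logic step are my own assumptions for illustration. It compares a renderer that just draws the latest logic step with one that draws one logic step in the past and interpolates between the two surrounding steps. Exact fractions are used so rounding doesn't muddy the comparison.

```python
from fractions import Fraction
import math

LOGIC_HZ = 45      # logic rate from the example
RENDER_HZ = 60     # display rate from the example
PX_PER_STEP = 10   # assumed speed: the object moves 10 px per logic step


def logic_position(step):
    """True simulated position after `step` logic steps (constant speed)."""
    return PX_PER_STEP * step


def shown_latest_step(t):
    """No interpolation: draw whatever the most recent logic step produced."""
    return Fraction(logic_position(math.floor(t * LOGIC_HZ)))


def shown_interpolated(t):
    """Draw one logic step in the past, blending the two steps around it."""
    delayed = t * LOGIC_HZ - 1      # render time, measured in logic steps
    prev_step = math.floor(delayed)
    alpha = delayed - prev_step     # 0..1 progress between the two steps
    a = logic_position(prev_step)
    b = logic_position(prev_step + 1)
    return a + (b - a) * alpha


def per_frame_movement(show, first_frame, frames):
    """Distance moved on screen between consecutive rendered frames."""
    ks = range(first_frame, first_frame + frames + 1)
    shown = [show(Fraction(k, RENDER_HZ)) for k in ks]
    return [b - a for a, b in zip(shown, shown[1:])]


uneven = per_frame_movement(shown_latest_step, 0, 8)
# Start at frame 2 so the delayed render time is never before step 0.
smooth = per_frame_movement(shown_interpolated, 2, 8)
print([float(d) for d in uneven])
print([float(d) for d in smooth])
```

Under these assumptions the uninterpolated renderer mixes 10 px frames with frames that don't move at all, while the interpolated one moves 7.5 px on every frame.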
With this method it's possible to render at any display rate without affecting the simulation:
As you can see, at 120fps you just sample the logic steps more often while the simulation remains unaffected. Everything would behave the same regardless of framerate, just like when testing. There would be no unexpected behavior like tunneling on untested framerates since the simulation is always the same. And like variable timestep, CPU usage scales with the framerate, and it can even be lower since the logic doesn't need to step on every rendered frame. This means it can use less battery, keep phones cooler, and significantly increase the headroom to do interesting stuff with the CPU.
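A minimal sketch of the kind of loop I mean, in Python and with made-up names (`logic_step`, `run`) and a made-up falling-ball simulation. It uses the common accumulator pattern: each rendered frame adds its duration to an accumulator, and the logic only advances in whole fixed steps. I keep time in exact fractions here so the step count doesn't depend on floating point rounding; a real engine would likely use a high resolution timer instead.

```python
from fractions import Fraction

LOGIC_HZ = 30                    # assumed fixed logic rate
STEP = Fraction(1, LOGIC_HZ)     # duration of one logic step in seconds


def logic_step(state):
    """One fixed step of a made-up simulation: a ball falling under gravity."""
    y, vy = state
    vy += 980.0 / LOGIC_HZ       # gravity in px/s^2, applied per step
    y += vy / LOGIC_HZ
    return (y, vy)


def run(render_hz, seconds):
    """Render `seconds` of game time at `render_hz`.

    Returns (final state, logic steps executed, frames rendered).
    """
    state = (0.0, 0.0)
    accumulator = Fraction(0)    # display time not yet consumed by logic
    steps = 0
    frames = render_hz * seconds
    for _ in range(frames):
        accumulator += Fraction(1, render_hz)   # time covered by this frame
        while accumulator >= STEP:              # advance in whole steps only
            state = logic_step(state)
            accumulator -= STEP
            steps += 1
        # A renderer would draw here, blending the last two logic states
        # by accumulator / STEP. Drawing never writes back into `state`.
    return state, steps, frames


for hz in (60, 120, 144):
    print(hz, run(hz, 2))
```

In this sketch the final state and the number of logic steps are the same for every display rate; only the number of rendered frames changes.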
What I don't really get about this is if you only step the game at 30 Hz, there is nothing new to draw in between logic frames.
...I think some devs propose a "lite" step in between which just advances motion without doing anything else, but this worsens tunnelling problems (new scenario: object actually seen colliding, but no collision registered!), and could still use a bunch of CPU e.g. if it's a large physics game.
This is where the debate about interpolation vs extrapolation comes in. With interpolation you get a very smooth result, but it comes at the cost of rendering with up to 1 logic step of delay. Since you need to know the next logic step to interpolate properly from the previous one, you delay the render until the next step is available. This adds a bit of latency between input and feedback. For a logic rate of 30 Hz the added latency is up to about 33ms, for 45 Hz about 22ms, and for 60 Hz about 17ms (it actually looks like this on a timeline). But it should be fine for most games, since according to Wikipedia the average input latency for games is 133ms, and 67ms for quick action games. If a game needs a faster input response, one can just increase the logic rate.
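The latency figures are just the length of one logic step, i.e. 1000 ms divided by the logic rate. As a tiny Python check (the function name is mine, for illustration only):

```python
def max_interpolation_delay_ms(logic_hz):
    """Worst-case extra latency from interpolation: one logic step, in ms."""
    return 1000.0 / logic_hz


for hz in (30, 45, 60):
    print(hz, "Hz ->", round(max_interpolation_delay_ms(hz), 1), "ms")
```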
The other option is extrapolation, which basically uses the past steps to try to guess the future. It doesn't have the downside of latency, but it adds a bit of imprecision to the perceived movement. For instance, a fast-moving object may seem to overlap another solid for a fraction of a second before moving to its true just-touching position. Or an object that changes direction constantly may drift a bit from its real position in the simulation. However, this has no influence on tunneling, since it only affects the displayed position and not the game simulation. What collides in preview should also collide at any framerate. It may cause the perception of some objects colliding without the collision registering, though (when passing close on a curved motion).
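To illustrate the overlap case, here is a small Python sketch of linear extrapolation from the last two logic steps. The function name and the positions are hypothetical, picked so that the simulation stops an object exactly at a wall at x = 100.

```python
def extrapolate(prev_pos, cur_pos, alpha):
    """Guess the position `alpha` logic steps after cur_pos, assuming the
    object keeps the velocity it had between the last two steps."""
    return cur_pos + (cur_pos - prev_pos) * alpha


WALL_X = 100.0
# Hypothetical logic positions: the object moves 10 px per step, then the
# simulation stops it exactly at the wall.
logic_x = [80.0, 90.0, 100.0, 100.0]

# Halfway between steps 2 and 3 the renderer only knows steps 1 and 2,
# so it assumes the object keeps moving and draws it inside the wall.
drawn_mid = extrapolate(logic_x[1], logic_x[2], 0.5)

# Once step 3 is known, the guess uses zero velocity and sits at the wall.
drawn_after = extrapolate(logic_x[2], logic_x[3], 0.5)

print(drawn_mid, drawn_after)
```

The drawn position briefly goes 5 px past the wall and then returns to 100, but `logic_x` itself (the simulation) never goes past it.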
A third option is combining the two to get an average of the benefits and compromises of both. For example, you could delay by only half a logic step, using extrapolation for the first half step (while the future logic step is not yet available) and interpolation for the other half step (after the new logic step is available). This would result in half the latency with a bit of drifting. It's even possible to turn this into a slider that goes from full interpolation to full extrapolation with proportional steps in between.
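One way such a slider could work, sketched in Python under my own assumptions (the `sample` function and its parameters are made up for illustration): measure time in logic steps, draw the moment `now - delay`, and use the same linear formula for both cases. If the drawn moment lies between two known steps it is an interpolation; if it lies past the newest known step, the blend factor exceeds 1 and it becomes an extrapolation.

```python
import math


def sample(positions, now, delay):
    """Position to draw at time `now` (measured in logic steps).

    positions[i] is the simulated position at logic step i.
    delay = 1.0 is full interpolation, 0.0 is full extrapolation,
    and values in between trade latency against drift.
    """
    latest = math.floor(now)      # newest logic step computed so far
    drawn = now - delay           # the moment that actually gets drawn
    if latest < 1 or drawn < 0:
        raise ValueError("need at least two logic steps before drawing")
    # Use the segment containing `drawn`, or the last known segment
    # if `drawn` is already past the newest step.
    base = min(math.floor(drawn), latest - 1)
    alpha = drawn - base          # 0..1 interpolates, above 1 extrapolates
    return positions[base] + (positions[base + 1] - positions[base]) * alpha


steps = [0.0, 10.0, 20.0, 30.0]   # hypothetical constant-speed positions
print(sample(steps, 2.25, 1.0))   # full interpolation
print(sample(steps, 2.25, 0.5))   # half delay, next step known: interpolates
print(sample(steps, 2.75, 0.5))   # half delay, next step unknown: extrapolates
print(sample(steps, 2.25, 0.0))   # full extrapolation
```

With constant speed both modes land on the true path; the drift only shows up when the velocity changes between steps.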
The thing is, depending on the game one solution may be preferred to the other. In my view interpolation is superior, but having an option to switch between the two is feasible.
And don't forget that on weak systems, fixed-step games go into slo-mo instead of stepping further to compensate.
That's true, but speaking as a developer, I would rather have the game go into slo-mo on systems that are too weak than have objects going through walls or jumping to different heights. Although if my game was failing to run at a decent speed on my target devices, I would probably adjust the logic step rate to accommodate those weaker systems and avoid any slo-mo.
If desired it's also possible to still use an integer dt in those cases to step the logic only once. Like if the logic needs to step 2 times in a render frame, you set dt to 2 and step once. The result would be much more stable than a dt that varies every frame with complex fractions.
However, I would still prefer the game to step the logic twice instead of using dt, simply because it makes things more deterministic and easier to deal with. After all, most games are limited by rendering and not by logic, so stepping the logic more than once in a slow frame should not add too much overhead.
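The difference between the two options can be seen in a small Python sketch. The integration scheme and the names are my own assumptions, and integer units keep the arithmetic exact. When two logic steps are due in one rendered frame, stepping once with dt = 2 is cheaper, but it does not land on the same state as stepping twice with dt = 1 as soon as there is acceleration involved.

```python
GRAVITY = 1  # assumed units: px per step^2; integers keep the math exact


def step(state, dt=1):
    """Advance (y, vy) by dt logic steps in a single update."""
    y, vy = state
    vy += GRAVITY * dt
    y += vy * dt
    return (y, vy)


def catch_up_with_integer_dt(state, steps_due):
    """Option from the previous paragraph: one update with dt = steps_due."""
    return step(state, dt=steps_due)


def catch_up_by_repeating(state, steps_due):
    """Option I prefer: run the normal dt = 1 step steps_due times."""
    for _ in range(steps_due):
        state = step(state)
    return state


start = (0, 0)
print(catch_up_with_integer_dt(start, 2))
print(catch_up_by_repeating(start, 2))
```

Repeating the step gives exactly what two normal frames would have produced, which is why I consider it the more deterministic choice.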
For example even with fixed steps, you can never assume your platform behavior will land at Y = 100, because floating point errors (literally occurring in the CPU circuitry) mean it will probably land at something more like Y = 99.9999999978.
The main advantage of a fixed step is that the behavior will always land at the same height regardless of framerate given the same input conditions, while with a variable time step you can't ensure that. It's less about the math being 100% correct and more about the movements being predictable. The same maximum jump height you get when previewing at 60fps, you will get at any framerate. And with new displays increasing their refresh rate there's a chance the drift (when using variable timestep) will tend to get worse, as exemplified in the previous link I posted.
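Here is a rough Python sketch of that jump-height point. The jump speed, the gravity, and the simple integration scheme are assumptions for illustration, not taken from any engine. `peak_variable` steps the logic once per rendered frame with dt = 1/fps, while `peak_fixed` always steps at a fixed logic rate, whatever the display rate is.

```python
from fractions import Fraction

JUMP_SPEED = 600.0   # px/s, assumed
GRAVITY = 1500.0     # px/s^2, assumed


def peak_variable(fps):
    """Jump peak when each rendered frame steps logic with dt = 1 / fps."""
    dt = 1.0 / fps
    y, vy, peak = 0.0, JUMP_SPEED, 0.0
    while True:
        vy -= GRAVITY * dt
        y += vy * dt
        if y <= 0.0:             # landed again
            return peak
        peak = max(peak, y)


def peak_fixed(fps, logic_hz=60):
    """Jump peak when logic always steps at logic_hz, at any display rate."""
    step = Fraction(1, logic_hz)
    dt = 1.0 / logic_hz
    accumulator = Fraction(0)    # display time not yet consumed by logic
    y, vy, peak = 0.0, JUMP_SPEED, 0.0
    while True:
        accumulator += Fraction(1, fps)   # one rendered frame passes
        while accumulator >= step:
            accumulator -= step
            vy -= GRAVITY * dt
            y += vy * dt
            if y <= 0.0:         # landed again
                return peak
            peak = max(peak, y)


for fps in (30, 60, 144):
    print(fps, peak_variable(fps), peak_fixed(fps))
```

With these numbers the variable-step peak comes out at roughly 110 px at 30fps and roughly 115 px at 60fps, and it keeps changing as the framerate goes up, while the fixed-step peak is identical at every display rate.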
So I still think this may be a better solution than the current variable timestep. It adds a bit of overhead due to the interpolation, but it more than compensates by not having to execute the logic on every rendered frame and by being able to run the renderer in parallel. And as a bonus it solves other problems like unpredictable tunneling, nondeterminism, and the difficulty beginners have in dealing with dt.