Replaying input would be hard to get exact. I mean, you could save the time along with the input used, but as you said, a variable framerate would throw it off.
You could change your logic to run a fixed number of times per second, so input would only be handled at those times. Then, to keep things smooth, you'd interpolate object positions between logic steps. The pro is that replays will be practically identical; the con is a huge, invasive rewrite of your game. It's probably too tedious to do, since you also wouldn't be able to use behaviors, because we can't control when they update.
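Here's a rough sketch of that fixed-step idea, written in Python just to show the shape of it rather than in events. The names (STEP, update, run_frame) and the single x position are made up for illustration and aren't part of any engine:

```python
# Sketch of a fixed-timestep loop. STEP, update() and the state layout are
# illustrative assumptions, not part of any engine API.
STEP = 1.0 / 30.0  # logic runs 30 times per second


def update(state, inputs):
    """Advance the game logic by exactly one fixed step."""
    state["prev_x"] = state["x"]
    state["x"] += inputs.get("move", 0) * 100.0 * STEP  # 100 pixels per second


def run_frame(state, accumulator, frame_dt, inputs):
    """Handle one rendered frame of length frame_dt.

    Returns the leftover accumulator and the interpolated x to draw at.
    """
    accumulator += frame_dt
    # Run as many whole logic steps as fit into the elapsed time.
    while accumulator >= STEP:
        update(state, inputs)
        accumulator -= STEP
    # alpha is how far we are into the next, not yet simulated, logic step.
    alpha = accumulator / STEP
    draw_x = state["prev_x"] + (state["x"] - state["prev_x"]) * alpha
    return accumulator, draw_x
```

Input only ever reaches update(), so a replay just has to feed the same inputs to the same step numbers. The drawn position lags up to one step behind the logic, which is the usual price for the smoothing.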
The first idea, the one you said won't work, actually sounds better. It can be made to work with different object types, as well as with objects getting created and destroyed. The simplest way to save a frame would be to take an empty array and then, for each instance of each object type, push its name, its position and any other info. When that's done you can store the frame in another array by using array.asJson of the first array.
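Roughly, saving a frame looks like this. It's a Python sketch standing in for the array actions, with json.dumps playing the role of array.asJson; the object fields and the save_frame name are assumptions for the example:

```python
import json


def save_frame(objects):
    """Flatten every live object into one array and return it as a JSON string."""
    frame = []
    for obj in objects:
        # Push the name, the position and any other info you need to restore.
        frame.append([obj["name"], obj["x"], obj["y"], obj["angle"]])
    return json.dumps(frame)


# The outer array holds one JSON string per recorded frame.
recording = []
objects = [
    {"name": "Player", "x": 10, "y": 20, "angle": 0},
    {"name": "Enemy", "x": 50, "y": 60, "angle": 90},
]
recording.append(save_frame(objects))
```

Since the frame is rebuilt from whatever objects exist at that moment, created and destroyed objects are handled without any extra bookkeeping.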
Loading a frame is then just a matter of taking one of those JSON strings, loading it into an array, looping over it and creating objects based on the saved names. Basically you're just destroying and recreating everything every frame.
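As a sketch of the loading side, again in Python rather than events: create_object here is a hypothetical stand-in for whatever "create object by name" action you'd use, and the [name, x, y, angle] layout is just an assumed frame format:

```python
import json


def load_frame(frame_json, create_object):
    """Rebuild every object described in one saved frame."""
    created = []
    for name, x, y, angle in json.loads(frame_json):
        created.append(create_object(name, x, y, angle))
    return created


def create_object(name, x, y, angle):
    # Hypothetical stand-in for the engine's create action.
    return {"name": name, "x": x, "y": y, "angle": angle}


# Replacing the whole scene list each time mirrors "destroy everything, then recreate".
frame_json = '[["Player", 10, 20, 0], ["Enemy", 50, 60, 90]]'
scene = load_frame(frame_json, create_object)
```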
To have it work with a varying framerate or a different playback speed, you'll need to save the time along with the JSON of each frame, and then on playback interpolate between the two frames closest to the current playback time.
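A sketch of that lookup and blend, in Python. It assumes something the description above doesn't spell out: each saved entry starts with a stable id so the same object can be matched across two frames. The [uid, name, x, y] layout and the sample name are made up for the example:

```python
import bisect
import json


def sample(recording, t):
    """recording is a list of (time, frame_json) pairs sorted by time.

    Returns {uid: (name, x, y)} for playback time t.
    """
    times = [time for time, _ in recording]
    i = bisect.bisect_right(times, t)
    lo = max(i - 1, 0)                    # closest frame at or before t
    hi = min(i, len(recording) - 1)       # closest frame after t
    t0, t1 = times[lo], times[hi]
    alpha = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
    before = {entry[0]: entry for entry in json.loads(recording[lo][1])}
    after = {entry[0]: entry for entry in json.loads(recording[hi][1])}
    result = {}
    for uid, (_, name, x, y) in before.items():
        if uid in after:
            # Object exists in both frames: blend its position.
            _, _, x1, y1 = after[uid]
            x += (x1 - x) * alpha
            y += (y1 - y) * alpha
        result[uid] = (name, x, y)
    return result


recording = [
    (0.0, json.dumps([[1, "Player", 0, 0]])),
    (1.0, json.dumps([[1, "Player", 10, 20], [2, "Bullet", 5, 5]])),
]
quarter = sample(recording, 0.25)
```

Objects that only exist in the later frame simply show up once playback reaches that frame, and ones that vanish keep their last saved position until then. Angles would need wrap-around handling that this leaves out.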
A full working example would illustrate it better, and there are probably a few nuances I'm not thinking of.