I can now see that using the "wait until the correct character is typed before moving on" model would probably be dramatically simpler. I think that's also a fine way to go. So if a wrong character is entered, it does appear, but in red. Then a buzzer sounds, and the character flashes and goes away, leaving the user in the previous state with the insertion point ready.
This should be a far easier approach to build, and it seems like it would be less confusing to the user too. The downside is that it won't provide as much opportunity to learn how to troubleshoot and fix text with the arrow keys, backspace, etc.
I think this route is the more sensible choice.
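
To make the wrong-key behavior concrete, here is a rough sketch of how the "wait until the correct character is typed" model might look. Everything in it is an assumption for illustration: the `TypingDrill` class, the `press` method, and the `"accept"` / `"reject"` event names are all made up, and the red text, buzzer, and flash are only recorded as events that a real front end would have to render.

```python
from dataclasses import dataclass, field


@dataclass
class TypingDrill:
    """Hypothetical model: the cursor only advances on a correct keypress."""

    target: str
    position: int = 0  # index of the next expected character
    events: list = field(default_factory=list)  # effects for a UI to render

    def press(self, char: str) -> bool:
        """Handle one keypress; return True if it was accepted."""
        if self.position >= len(self.target):
            return False  # drill already finished, ignore extra keys
        if char == self.target[self.position]:
            self.events.append(("accept", char))
            self.position += 1
            return True
        # Wrong key: the UI would show it in red, buzz, flash it, and
        # remove it. The position is deliberately left unchanged, so the
        # insertion point stays where it was.
        self.events.append(("reject", char))
        return False

    @property
    def done(self) -> bool:
        """True once every character of the target has been typed."""
        return self.position == len(self.target)
```

With a target of `"cat"`, pressing `c`, then `x`, then `a` and `t` would accept the `c`, reject the `x` without moving the insertion point, and then finish the word. Since the state is just one index into the target string, there is no need to track what the user has typed so far, which is where most of the simplicity comes from.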