I’m unfamiliar with any games like that, and I tried to watch that tutorial, but its pacing is pretty slow.
It’s done like making any other game. The only thing specific to that game style is that you make sprites visible or invisible to show motion, instead of moving them.
Key controls move the player.
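As a rough sketch of that idea (not taken from the tutorial), the player can be stored as an index into a fixed row of pre-drawn sprites, and a key press just changes which one is lit. The names Player, slot, slot_count and the "left"/"right" key strings are made up for illustration; your framework will have its own input and drawing calls.

```python
from dataclasses import dataclass


@dataclass
class Player:
    slot: int = 1        # index of the player sprite that is currently lit
    slot_count: int = 3  # hypothetical: number of fixed player positions

    def move(self, key: str) -> None:
        """Shift the lit slot left or right, clamped to the valid range."""
        if key == "left":
            self.slot = max(0, self.slot - 1)
        elif key == "right":
            self.slot = min(self.slot_count - 1, self.slot + 1)

    def visible_flags(self) -> list[bool]:
        """One flag per pre-drawn sprite; only the current slot is visible."""
        return [i == self.slot for i in range(self.slot_count)]
```

When drawing, you would loop over visible_flags() and draw only the sprites whose flag is True.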
Any text would just be a spritefont, or a sprite per character, or if you are feeling more ambitious, a sprite per part.
Then you have an update function that you call, which will change what’s visible, check whether conditions are met, and so on.
At its core you have a player with a position, plus a list of objects. Each of them has a location, which is just one of the sprites that can be made visible.
Every update you change the locations and check if, say, the player has an object right above him.
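Here is a minimal sketch of that loop, assuming a game where objects drop down columns and the player catches them. Everything in it (ROWS, FallingObject, Game, score, misses) is a hypothetical name chosen for the example, and the catch rule is only one possible condition to check.

```python
from dataclasses import dataclass, field

ROWS = 4  # hypothetical: number of fixed sprite positions in each column


@dataclass
class FallingObject:
    column: int
    row: int = 0  # 0 is the top; ROWS - 1 is right above the player


@dataclass
class Game:
    player_column: int = 1
    objects: list[FallingObject] = field(default_factory=list)
    score: int = 0
    misses: int = 0

    def update(self) -> None:
        """Advance every object one position and resolve the bottom row."""
        remaining = []
        for obj in self.objects:
            if obj.row == ROWS - 1:
                # The object is right above the player's row: caught or missed.
                if obj.column == self.player_column:
                    self.score += 1
                else:
                    self.misses += 1
            else:
                obj.row += 1
                remaining.append(obj)
        self.objects = remaining

    def lit_positions(self) -> set[tuple[int, int]]:
        """(column, row) pairs whose object sprite should be visible."""
        return {(obj.column, obj.row) for obj in self.objects}
```

You would call update() on a timer rather than every frame, since these games tick in visible steps, and then draw only the sprites listed in lit_positions().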