It tells you how to approach the problem.
First, make your base game with however many inputs you need (one per player). This will run in the background; it is up to you whether to make any of it visible.
Then make one copy of every object per input, with each copy offset by an amount based on that player's position, so it lands in the right place in that player's individual "viewport".
Put each viewport on its own layer, so you can use blending modes to slice the total view into halves, quarters, or whatever you want.
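
To make the offset idea concrete, here is a rough sketch in plain Python rather than any particular engine. The names `Viewport`, `split_screen`, `world_to_viewport` and `is_visible` are made up for illustration, and the rectangle check only stands in for the layer/blending-mode slicing described above. It assumes each player should sit at the centre of their own viewport.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    x: float       # left edge of this viewport on the screen
    y: float       # top edge of this viewport on the screen
    width: float
    height: float


def split_screen(screen_w, screen_h, players):
    """Slice the screen into one viewport per player (1 to 4 players).

    Two players get side-by-side halves; three or four get quarters.
    """
    cols = 1 if players == 1 else 2
    rows = 1 if players <= 2 else 2
    w, h = screen_w / cols, screen_h / rows
    return [Viewport((i % cols) * w, (i // cols) * h, w, h)
            for i in range(players)]


def world_to_viewport(obj_pos, player_pos, vp):
    """Offset an object's copy so the player sits at the viewport centre."""
    sx = obj_pos[0] - player_pos[0] + vp.x + vp.width / 2
    sy = obj_pos[1] - player_pos[1] + vp.y + vp.height / 2
    return sx, sy


def is_visible(screen_pos, vp):
    """True if the copy falls inside this viewport's slice of the screen."""
    return (vp.x <= screen_pos[0] < vp.x + vp.width
            and vp.y <= screen_pos[1] < vp.y + vp.height)


# Example: two players on an 800x600 screen.
viewports = split_screen(800, 600, 2)
player_positions = [(100, 50), (900, 400)]   # positions in the base game
coin = (150, 50)                             # some object in the base game

for vp, player in zip(viewports, player_positions):
    copy_pos = world_to_viewport(coin, player, vp)
    print(copy_pos, is_visible(copy_pos, vp))
```

Each object ends up with one copy per player, and a copy is only drawn when it lands inside that player's slice. In an engine with layers, the slicing would be done by the layer mask or blend mode instead of the `is_visible` test.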