Buckle up buttercup, this one is some real work.
In previous projects I tried to detect beats from the amplitude and frequency of the audio, and found the audio object lacking for this.
Instead, I will show you how to prerecord the beats as a string, using your own "beat studio environment":
Step zero: Create a variable to track how many ticks have passed since your song started.
Step one: Record your inputs to a variable that also updates an on-screen textbox. Do this in a way that lets us separate the values by time later.
So if you hit X, set textbox to textbox.text + "X," + timerSinceSongStarted.Value + ";"
And if you hit Y, set textbox to textbox.text + "Y," + timerSinceSongStarted.Value + ";"
So now you've created a string that closely represents when you expect spawns, relative to how long the song has been running, and it looks something like this:
X,122;Y,125;X,134;Y,134;
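Here is a rough sketch of steps zero and one in Python, just to show the idea outside the event sheet. The class name BeatRecorder, its methods, and the made-up key presses are all my own placeholders, not anything from the engine; a plain string stands in for the textbox.

```python
class BeatRecorder:
    """Builds a beat string such as "X,122;Y,125;" from key presses."""

    def __init__(self):
        self.ticks = 0   # step zero: ticks since the song started
        self.text = ""   # stands in for the on-screen textbox

    def tick(self):
        # Call once per game tick while the song plays.
        self.ticks += 1

    def press(self, key):
        # Step one: append "KEY,TIME;" so entries can be split apart later.
        self.text += f"{key},{self.ticks};"


# Made-up input: which keys were hit on which tick.
presses = {122: ["X"], 125: ["Y"], 134: ["X", "Y"]}

recorder = BeatRecorder()
for _ in range(140):
    recorder.tick()
    for key in presses.get(recorder.ticks, []):
        recorder.press(key)

print(recorder.text)  # X,122;Y,125;X,134;Y,134;
```

The ";" separates one input from the next and the "," separates the key from its time, which is what makes the string easy to take apart again in step 2.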
Step 2: Copy this string out of your textbox and back into a local variable. Use a repeat tokencount(string,';') times loop with tokenat(string,loopindex,';') to access each value as time progresses while the song plays again, and make it show a sprite or something on each beat. Each token is a pair like X,122, so use tokenat on it again with ',' to separate the key from the time. Use this visual feedback to manually modify the string's timer values, calibrating them to match the song better than your original reactions allowed when you recorded your inputs. Tweak a value here and there when you notice one of your inputs doesn't match the song as well as you'd like. It will be helpful to add an on-screen viewer of the timer's value, so you can track down the inputs you want to change.
So now we still have a string of "tokens" that closely represents when we expect spawns to hit. You will need one of these for each song.
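To make the token splitting concrete, here is a small Python sketch. token_count and token_at are my own stand-ins that mimic how I understand tokencount and tokenat to behave (split on the separator, index from zero); parse_beats, shift_beat and to_string are hypothetical helper names. I am assuming the trailing ";" leaves an empty last token, so the sketch skips it.

```python
def token_count(text, sep):
    # Stand-in for tokencount(text, sep).
    return len(text.split(sep))


def token_at(text, index, sep):
    # Stand-in for tokenat(text, index, sep); empty string if out of range.
    parts = text.split(sep)
    return parts[index] if 0 <= index < len(parts) else ""


def parse_beats(beat_string):
    """Turn "X,122;Y,125;" into [("X", 122), ("Y", 125)]."""
    beats = []
    for i in range(token_count(beat_string, ";")):
        token = token_at(beat_string, i, ";")
        if token == "":
            continue  # the trailing ";" leaves an empty last token
        key = token_at(token, 0, ",")
        tick = int(token_at(token, 1, ","))
        beats.append((key, tick))
    return beats


def shift_beat(beats, index, delta):
    # Calibrate one input by nudging its timer value.
    key, tick = beats[index]
    beats[index] = (key, tick + delta)


def to_string(beats):
    # Write the calibrated beats back out in the same "KEY,TIME;" format.
    return "".join(f"{key},{tick};" for key, tick in beats)


beats = parse_beats("X,122;Y,125;X,134;Y,134;")
shift_beat(beats, 1, -2)   # the Y at 125 felt late, so move it to 123
print(to_string(beats))    # X,122;Y,123;X,134;Y,134;
```

Editing the numbers by hand in the string works just as well; the point is only that each entry splits cleanly into a key and a time.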
Step 3: In your game, you will want to load a song string into an array before the song is played, so you can access that array at x, where x will be the song's timer.
I won't post the exact way to load it into the array, but I will give you direction: use array insert to add values to the array, and make the position in the array match the timer we stored with the tokens. You can loop through your token string as we did before, with repeat tokencount times, to insert all the values.
Then when a player is actually playing, continuously test the current song timer against the array, and spawn the correct object represented by the array value stored there.
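This is not the exact event-sheet setup, only a minimal Python sketch of the idea: a list indexed by tick stands in for the array. The names load_song and play are hypothetical, and keeping a small list of keys per tick is my own assumption so that two inputs on the same tick (like X,134 and Y,134) don't overwrite each other.

```python
def load_song(beat_string, song_length_ticks):
    """Build a timeline where timeline[tick] lists the keys to spawn then."""
    timeline = [[] for _ in range(song_length_ticks + 1)]
    for token in beat_string.split(";"):
        if not token:
            continue  # skip the empty token after the trailing ";"
        key, tick_text = token.split(",")
        tick = int(tick_text)
        if 0 <= tick < len(timeline):
            timeline[tick].append(key)
    return timeline


def play(timeline):
    """Walk the song timer and collect (tick, key) for every spawn."""
    spawned = []
    for tick, keys in enumerate(timeline):
        # In the game this check runs once per tick against the song timer.
        for key in keys:
            spawned.append((tick, key))  # spawn the object for this key
    return spawned


timeline = load_song("X,122;Y,125;X,134;Y,134;", 140)
print(timeline[134])        # ['X', 'Y']
print(play(timeline)[:2])   # [(122, 'X'), (125, 'Y')]
```

Looking up the current tick directly is cheap, which is why the position in the array is made to match the timer instead of searching the string every tick.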
Understanding arrays is the backbone of large data manipulation and an important hurdle for most coders. An array is just a spreadsheet of values accessed by position, but truly understanding it is fundamental. Keyed arrays are easier to understand, but really, x, y and z are all you actually need.