When I said up and down, I meant something like 8D audio, which I checked and Construct 2 does support.
My system for detection seems to be working without the need for JavaScript, thankfully. Each sound is played at an invisible Sprite's position, and its tags, loop setting, duration, etc. are stored in an array. When a sound ends, the corresponding invisible Sprite is destroyed and the sound's entry is removed from the array. This is mostly tag-independent, which weirdly enough might allow for multiple "tags" per sound. I'm surprised that isn't a feature yet either.
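For anyone who wants the idea outside of event sheets, the bookkeeping can be sketched roughly like this. It is plain Python rather than Construct 2 code, and the names (`PlayingSound`, `SoundRegistry`, `with_tag`) are made up for illustration. It also assumes every sound's duration is known up front and that time is advanced manually with a delta.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare entries by identity, not by field values
class PlayingSound:
    """One row of the tracking array: a sound plus its invisible emitter."""
    name: str
    tags: set            # several tags per sound are possible
    duration: float      # seconds
    loops: bool = False
    x: float = 0.0       # position of the (hypothetical) invisible emitter
    y: float = 0.0
    elapsed: float = 0.0


class SoundRegistry:
    def __init__(self):
        self.playing = []

    def play(self, name, tags, duration, loops=False, x=0.0, y=0.0):
        sound = PlayingSound(name, set(tags), duration, loops, x, y)
        self.playing.append(sound)
        return sound

    def update(self, dt):
        """Advance time; drop and return non-looping sounds that have ended."""
        still_playing, finished = [], []
        for sound in self.playing:
            sound.elapsed += dt
            done = not sound.loops and sound.elapsed >= sound.duration
            (finished if done else still_playing).append(sound)
        self.playing = still_playing
        return finished

    def with_tag(self, tag):
        """All playing sounds that carry the given tag."""
        return [s for s in self.playing if tag in s.tags]
```

Because entries are looked up by row rather than by a single tag string, giving one sound several tags costs nothing extra here. In Construct 2 the returned `finished` entries would be the point where the matching invisible Sprite gets destroyed.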
The system also has easing features for volume, so it's really easy to fade stuff in and out seamlessly, even if a new ease is called while a tag is already easing.
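The interruptible part is the bit worth spelling out. Below is a minimal Python sketch of one way to do it, not the actual event-sheet setup: a new ease always starts from whatever the volume is right now, so there is no jump. The class name `VolumeEase` is hypothetical, and the linear 0 to 1 volume scale with a linear curve is an assumption (Construct 2 itself works in decibels).

```python
class VolumeEase:
    """Linear volume fade that can be retargeted mid-fade without a jump."""

    def __init__(self, volume=1.0):
        self.volume = volume      # assumed linear scale, 0.0 to 1.0
        self._start = volume
        self._target = volume
        self._duration = 0.0
        self._elapsed = 0.0

    def ease_to(self, target, duration):
        # Start from the current volume, not from the old ease's start or end.
        self._start = self.volume
        self._target = target
        self._duration = duration
        self._elapsed = 0.0
        if duration <= 0:
            self.volume = target  # zero-length ease: snap immediately

    def update(self, dt):
        """Advance the ease by dt seconds and return the new volume."""
        if self._elapsed >= self._duration:
            return self.volume    # nothing in progress
        self._elapsed = min(self._elapsed + dt, self._duration)
        t = self._elapsed / self._duration
        self.volume = self._start + (self._target - self._start) * t
        return self.volume
```

For example, fading out over 2 seconds and then calling `ease_to(1.0, 1.0)` halfway through picks up from the half-faded volume and rises from there.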
In any case, the issue is solved. Thanks for your advice. I might look into making a custom audio plugin in the future.