So can't you do it by splitting it up into three tracks: intro (clean), intro (with tail), and the rest of the main track? I don't think that's a hack; it seems a pretty reasonable way to do it.
I'm not sure if you realise, but scheduled audio doesn't usually work with audio categorised as music. Music tracks are normally backed by <audio> elements, which don't support sample-accurate playback. Only Web Audio buffers do, and that is how audio categorised as sound is played.
On some platforms, for reasons such as compatibility, we ignore the audio categorisation and play everything as sound anyway (i.e. via Web Audio buffers). I've never seen or heard of any particular negative consequence of doing that, so I think it's fine to do the same in your case if you need sample-accurate playback of music tracks. Audio categorised as sound can also be played with unlimited overlap, so the intro's tail can keep ringing while the main track starts.
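
To illustrate the idea in plain Web Audio terms, here is a minimal TypeScript sketch of scheduling the intro (with tail) and the main track as buffers so that the main track starts exactly where the clean intro would end. This is only an assumption-laden illustration, not how the engine does it internally: the helper name `scheduleIntroThenMain`, the `*Like` interfaces and the `leadTime` parameter are all made up for the example. The only real API calls relied on are `createBufferSource()`, `connect()`, `start(when)` and `currentTime`.

```typescript
// Minimal structural types so the sketch type-checks without the DOM lib.
// They are intended to be satisfied by a real AudioContext, AudioBuffer
// and AudioBufferSourceNode.
interface BufferLike { duration: number; }
interface SourceLike {
  buffer: BufferLike | null;
  connect(destination: unknown): unknown;
  start(when?: number): void;
}
interface ContextLike {
  currentTime: number;
  destination: unknown;
  createBufferSource(): SourceLike;
}

// Hypothetical helper: plays the intro (with tail) shortly after "now" and
// starts the main track exactly where the clean intro would end, so the
// tail overlaps the beginning of the main track.
function scheduleIntroThenMain(
  ctx: ContextLike,
  introWithTail: BufferLike,
  main: BufferLike,
  introCleanDuration: number, // seconds; length of the intro without its tail
  leadTime = 0.1              // assumed safety margin before the first start
): { introAt: number; mainAt: number } {
  const introAt = ctx.currentTime + leadTime;
  const mainAt = introAt + introCleanDuration;

  // Both start times are on the audio clock, which is what makes the
  // hand-over sample-accurate rather than timer-accurate.
  const introSrc = ctx.createBufferSource();
  introSrc.buffer = introWithTail;
  introSrc.connect(ctx.destination);
  introSrc.start(introAt);

  const mainSrc = ctx.createBufferSource();
  mainSrc.buffer = main;
  mainSrc.connect(ctx.destination);
  mainSrc.start(mainAt);

  return { introAt, mainAt };
}
```

In this sketch the clean intro is only needed for its length (for example `introClean.duration` from a decoded buffer). Because both sources are ordinary buffer sources, nothing stops them playing at the same time, which is the overlap the tail needs.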