There is a fundamental tradeoff: either you preload the audio so it can play instantly, which makes loading take longer, or you don't preload it, which causes a delay the first time each sound is played. The audio has to be loaded and decoded at some point, so those are your two options.
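To make the tradeoff concrete, here is a minimal, language-neutral sketch in Python. It is not Construct 2's API or how the engine is implemented: `SoundCache`, `preload`, `play`, and `fake_decode` are hypothetical names, and `decode` is just a stand-in for whatever slow step loads and decodes a sound. The point is only that each sound is decoded once, either up front or on first play.

```python
from typing import Callable, Dict, Iterable


class SoundCache:
    """Decodes each sound at most once (hypothetical illustration only)."""

    def __init__(self, decode: Callable[[str], bytes]) -> None:
        # `decode` is an assumed stand-in for the slow load/decode step.
        self._decode = decode
        self._decoded: Dict[str, bytes] = {}

    def preload(self, names: Iterable[str]) -> None:
        # Option 1: pay the whole decoding cost up front, during loading.
        for name in names:
            self._get(name)

    def play(self, name: str) -> bytes:
        # Option 2: if the sound was not preloaded, the decoding cost is
        # paid here, which is the delay on the first play of that sound.
        return self._get(name)

    def _get(self, name: str) -> bytes:
        if name not in self._decoded:
            self._decoded[name] = self._decode(name)
        return self._decoded[name]


def fake_decode(name: str) -> bytes:
    # Placeholder for real audio decoding; returns dummy bytes.
    return name.encode("utf-8")
```

Calling `preload(...)` at startup moves all the decode calls into loading time; skipping it moves each one to the first `play(...)` of that sound. Either way the total work is the same.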
How long does it take, exactly? Which browsers and OSs have you tried? Desktop machines should be able to decode large amounts of audio in just seconds. Also, as ever, please share a .capx that demonstrates the problem.