- Pick an output based on conditions specific to each possible output node
Use an 'On node entered' trigger, with sub-events whose actions choose which output to go to.
- Having a specific dialogue node that waits X seconds before executing the regular Dialogue node condition logic
Use an 'On node entered' trigger, with a wait action before going to the output. (For example, you could have a "wait" node with the same tag in multiple places and re-use it.)
- Playing different sounds, or putting different actions in every single node (let's say you have 5000 nodes as you suggested)
Have an output with a string of the sound to play, e.g. output name "PlayAudio" with value "SFX5", and in 'On node entered', if an output "PlayAudio" exists, play the sound in its value.
- Having multiple values to handle for each node (avatar image, avatar animation, text animation, related QuestID, including variables within the actual Dialogue text)
Outputs are also designed to store data, such as in the prior example I gave, where the sound to play is stored in the node. You can use multiple key-values, essentially using each node as a small Dictionary storage.
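To make the key-value idea a bit more concrete, here is a rough TypeScript sketch. It is not the engine's actual API: `FlowNode`, `onNodeEntered`, `readOutput` and `playedSounds` are hypothetical names made up for illustration, and the "play sound" step is only recorded in an array rather than really played. It just shows one shared handler reading optional data from each node's outputs.

```typescript
// Hypothetical stand-in for a flowchart node; not the engine's real API.
interface FlowNode {
  tag: string;
  // Output name -> output value, used as a small dictionary.
  outputs: Record<string, string>;
}

// Records which sounds were requested; a real project would run its audio action here.
const playedSounds: string[] = [];

// Plays the role of a single 'On node entered' handler shared by every node.
function onNodeEntered(node: FlowNode): void {
  const sound = node.outputs["PlayAudio"];
  // Only act if this node actually has a "PlayAudio" output.
  if (sound !== undefined) {
    playedSounds.push(sound);
  }
}

// Reads an optional value from a node, falling back to a default when the output is absent.
function readOutput(node: FlowNode, name: string, fallback: string): string {
  const value = node.outputs[name];
  return value !== undefined ? value : fallback;
}

// Two example nodes: one carries sound, avatar and quest data, the other only an avatar.
const greeting: FlowNode = {
  tag: "dialogue",
  outputs: { PlayAudio: "SFX5", Avatar: "hero_smile", QuestID: "12" },
};
const silent: FlowNode = { tag: "dialogue", outputs: { Avatar: "hero_idle" } };

onNodeEntered(greeting); // requests "SFX5"
onNodeEntered(silent); // no "PlayAudio" output, so nothing is requested
```

The point is that the handler is written once, and each node only differs in the data it carries, so 5000 nodes would not need 5000 sets of actions.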
So I think all of that can be done with the current design. It will take some time for people to use it and get familiar with how it works and the patterns you can use with it. Maybe some things will turn out to be a headache to implement, but at the moment things are very speculative: everyone needs to try things out in practice and then give us feedback based on real-world use. We also have lots of improvements in the pipeline, so the way it is right now isn't necessarily its final form.