Construct has speech recognition and text-to-speech features. I have not used them, but I don't see why you couldn't combine the two to do what you need:
1. Use speech recognition and save the recognized "Input" in a string variable.
2. Use text-to-speech to speak that string variable back.
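In Construct you would normally set this up with events rather than code. To show the same two steps outside Construct, here is an untested sketch against the browser's Web Speech API, which I believe is what Construct's speech features rely on. `listenThenSpeak` and `RecognitionLike` are hypothetical names made up for this example. Recognition support depends on the browser (Chrome exposes it as `webkitSpeechRecognition`), so the wiring at the bottom only runs if the API is present.

```typescript
// Hypothetical minimal shape of the browser recogniser we rely on
// (the real SpeechRecognition object has many more members).
interface RecognitionLike {
  onresult:
    | ((event: { results: ArrayLike<ArrayLike<{ transcript: string }>> }) => void)
    | null;
  start(): void;
}

// Step 1: listen once and store what was heard in a string variable.
// Step 2: pass that string to a text-to-speech function.
function listenThenSpeak(
  recognition: RecognitionLike,
  speak: (text: string) => void,
): void {
  recognition.onresult = (event) => {
    // First result, best alternative: the recognised phrase ("Input").
    const input: string = event.results[0][0].transcript;
    speak(input);
  };
  recognition.start();
}

// Browser wiring: only runs where the Web Speech API is available.
const g = globalThis as any;
const Recognition = g.SpeechRecognition ?? g.webkitSpeechRecognition;
if (Recognition && g.speechSynthesis && g.SpeechSynthesisUtterance) {
  listenThenSpeak(new Recognition(), (text) =>
    g.speechSynthesis.speak(new g.SpeechSynthesisUtterance(text)),
  );
}
```

Passing the speak function in as a parameter keeps the "store the input, then say it" logic separate from the browser objects, so you could swap in a different text-to-speech call without touching the recognition part.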