Hmm... if I were to take a stab at it, I imagine I'd need a (project) file that lists, for each subtitle block: an identifier for the relevant video clip, the subtitle text, the time it should begin, and how long it should stay visible (and possibly positional information?). This file would need to be created manually, and it could be parsed at the start of the layout and loaded into an array.
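To make the file idea concrete, here's a rough sketch in Python rather than in any particular engine. The CSV layout (clip_id, start, duration, text, optional x and y) and the names `Subtitle` and `parse_subtitles` are my own assumptions, not an existing format. It reads the hand-written file into a list sorted by clip and start time, which is the "array" the rest of the approach steps through.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subtitle:
    clip_id: str               # which video clip this block belongs to
    start: float               # seconds from the start of the clip
    duration: float            # seconds the block stays visible
    text: str
    x: Optional[float] = None  # optional on-screen position (hypothetical)
    y: Optional[float] = None


def parse_subtitles(source: str) -> List[Subtitle]:
    """Parse a hypothetical hand-written CSV into a list sorted by clip, then start time."""
    subs = []
    # skipinitialspace lets quoted text that follows ", " keep its commas
    reader = csv.reader(io.StringIO(source), skipinitialspace=True)
    for row in reader:
        # skip blank lines and comment lines starting with '#'
        if not row or row[0].strip().startswith("#"):
            continue
        clip_id, start, duration, text, *pos = [c.strip() for c in row]
        x = float(pos[0]) if len(pos) > 0 and pos[0] else None
        y = float(pos[1]) if len(pos) > 1 and pos[1] else None
        subs.append(Subtitle(clip_id, float(start), float(duration), text, x, y))
    subs.sort(key=lambda s: (s.clip_id, s.start))
    return subs


SAMPLE = """# clip_id, start, duration, text, x, y
intro, 0.5, 2.0, "Hello, traveller.", 320, 440
intro, 3.0, 1.5, Welcome back.
outro, 1.0, 2.5, Goodbye!
"""
```

Calling `parse_subtitles(SAMPLE)` gives three entries; the position columns are optional, so lines without them just get `None`.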
A single text object would display all the subtitles. It would have one timer tag that tracks the video's total duration and triggers loading each subsequent subtitle into the text object, plus a separate duration timer tag that clears the current subtitle when it fires. A simple incrementing counter variable, used as the array index, would step through the subtitle blocks in sequence.
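Here's a rough, engine-agnostic sketch of that sequencing logic in Python. I'm assuming the caller passes in the current playback time of the clip each frame, and I'm modelling the two timer tags as deadlines rather than real timers; `Cue`, `SubtitleTrack` and `update` are made-up names for illustration. The `index` field plays the role of the incrementing counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    start: float     # seconds into the clip when the block appears
    duration: float  # seconds it stays on screen
    text: str


class SubtitleTrack:
    """Stand-in for the single text object plus its two timers (hypothetical names)."""

    def __init__(self, cues: List[Cue]):
        self.cues = sorted(cues, key=lambda c: c.start)
        self.index = 0                          # incrementing counter into the array
        self.text = ""                          # what the text object currently shows
        self.clear_at: Optional[float] = None   # deadline of the "duration" timer

    def update(self, now: float) -> str:
        # "main" timer: load every block whose start time has been reached
        while self.index < len(self.cues) and now >= self.cues[self.index].start:
            cue = self.cues[self.index]
            self.text = cue.text
            self.clear_at = cue.start + cue.duration
            self.index += 1
        # "duration" timer: clear the block once its visible time has elapsed
        if self.clear_at is not None and now >= self.clear_at:
            self.text = ""
            self.clear_at = None
        return self.text
```

For example, with cues at 0.5s (2s long) and 3.0s (1.5s long), calling `update` at 0.5 shows the first line, at 2.5 clears it, and at 3.0 shows the second. A very large time step could load and clear a short cue in the same call, so in practice you'd want the update to run every frame.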
I can put together an example at a later time if this doesn't make sense.