What Prominent is trying to explain is how to use a single image as a basis from which C3 is to derive multiple frames. To work with a single spritesheet (image) is the very point. Placing it in the animation (film strip) window nullifies the main idea because the film strip shows discrete frames, and whole point is to have a full spritesheet to work from.
In my view, in this system that prominent is suggesting, the film strip could be the means of viewing the animation frames as defined by the 'spritesheet editor'. But yes, the window to edit the spritesheet may look different (ie different spritesheet-centric functions) for the user to immediately see the context of the edit.
I think each Animation could have a property that controls which image and context its sourcing from: single frames, spritesheet. If single frames, then a list of image frames is populated, and we are working in the same way we do in C2.
If we use the spritesheet as a source, then that's a different data, where we are sourcing a 'spritesheet object', whereby a single image is used, and then bounding boxes defined which extract the pixel information, then generates a list of images based on the extraction.