Damn, you beat me to it. I was building something similar but got too caught up optimizing the context extraction. I actually ended up building a full spec for it—basically a PoC of "grep for videos."
My end goal was to let an agent make semantic changes (e.g., "remove the parts where the guy in the blue dress is seen") by simply grepping the context spec for the relevant timestamps and using ffmpeg to cut them out.
Vector embeddings are fuzzy at finding boundaries. With my spec approach, the goal is to get precise start/end times for ffmpeg to do the edits. The downside is that my approach requires a lot of pre-processing of the raw footage. Vectors win on zero-shot flexibility here.
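Purely as illustration, here's a minimal sketch of that grep-then-cut flow, assuming a hypothetical line-oriented JSON spec with start/end timestamps and a text label per segment (the spec format, file names, and duration value are all made up for the example):

```python
import json
import subprocess

def find_segments(spec_path, query):
    """Return (start, end) pairs whose label mentions the query.
    Assumes a hypothetical JSON-lines spec, one segment per line, e.g.
    {"start": 12.0, "end": 18.5, "label": "man in blue dress on stage"}"""
    matches = []
    with open(spec_path) as f:
        for line in f:
            seg = json.loads(line)
            if query.lower() in seg["label"].lower():
                matches.append((seg["start"], seg["end"]))
    return matches

def cut_out(input_path, output_path, segments, duration):
    """Re-encode the video, keeping everything outside the matched segments."""
    # Invert the matched segments into a list of spans to keep.
    keep, cursor = [], 0.0
    for start, end in sorted(segments):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))

    # Express the kept spans as an ffmpeg select/aselect expression.
    expr = "+".join(f"between(t,{s},{e})" for s, e in keep)
    subprocess.run([
        "ffmpeg", "-i", input_path,
        "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
        "-af", f"aselect='{expr}',asetpts=N/SR/TB",
        output_path,
    ], check=True)

# duration is hard-coded here; in practice you'd read it via ffprobe.
segments = find_segments("footage.spec.jsonl", "blue dress")
cut_out("footage.mp4", "edited.mp4", segments, duration=3600.0)
```

Stream-copying each kept span and concatenating would avoid the full re-encode, at the cost of cuts snapping to keyframes, which is exactly where the precise timestamps matter.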
As a creator who films long-form content, I find editing (specifically clipping for short form) such a nightmare - this solves a huge problem, and the UI is insanely clean.
great to hear — I'd recommend using the clips tile to create clips, but you can also use the rough cut tile to help edit down the raw footage for the long-form
I play back parts of the cinematic edit I made of the conversation between Dwarkesh Patel and Satya Nadella (e.g. added cinematic captions, motion graphics)
I can post the full edit as well if you're interested
Well, half the comments are a variation of "this is so cool... I'm building something similar" so you'd expect them to be incredibly supportive, and with the churn in the AI field a 6-month-old account with 100+ karma is relatively ancient!
This is one of those ideas that seems obvious after you hear about it, yet somehow didn't exist yet. So many potential applications. Met the founder back in SF and he's one of the coolest, down to earth dudes there is. Best of luck to the team!
[see https://news.ycombinator.com/item?id=45988611 for explanation]