Feeding subtitles to an LLM
Any talk, lecture or meeting recording with a subtitle track has a transcript hiding in it, and subtitle files are almost, but not quite, readable text. The "not quite" is what wastes tokens and muddles citations back into the recording.
What's in the raw file
42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>
Sequence numbers, millisecond-precise timing pairs, styling tags and mid-sentence line breaks: every caption block carries two lines of apparatus for one or two lines of speech. Pasted raw, a one-hour talk spends roughly a third of its tokens on numbers nobody will ask about.
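To check the overhead on your own files, here is a rough Python sketch that measures what share of a raw .srt file is apparatus. It counts characters as a stand-in for tokens, which is an assumption, because real tokenizers split digits and punctuation differently. `apparatus_share` is a hypothetical helper and is not part of the converter.

```python
import re

# Rough estimate only: character share, not token share (an assumption).
# Expects an .srt file already decoded into a str; the helper name is hypothetical.
TIMING = re.compile(r"^(?:\d+:)?\d{2}:\d{2}[,.]\d{3}\s+-->")
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")  # <i>-style tags and {\an8}-style codes


def apparatus_share(srt_text: str) -> float:
    """Fraction of non-blank characters that are sequence numbers, timings or tags."""
    apparatus = total = 0
    for raw in srt_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator lines are ignored entirely
        total += len(line)
        if line.isdigit() or TIMING.match(line):
            apparatus += len(line)  # a whole sequence-number or timing line
        else:
            # caption line: only its styling tags count as apparatus
            apparatus += sum(len(t) for t in TAG.findall(line))
    return apparatus / total if total else 0.0
```

On the single block shown above, this returns about 0.48 (38 of 79 characters). A full file's share depends on how long its captions run.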
The element mapping
| In the .srt / .vtt | In the Markdown |
|---|---|
| Sequence numbers | Dropped |
| Timing pairs | Compact [03:17] markers, one per block — enough to jump into the video |
| Styling tags (`<i>`, positioning codes) | Stripped |
| Caption text | Flowed into readable transcript lines |
| WebVTT headers, NOTE/STYLE blocks | Skipped |
| Malformed blocks | Skipped and counted in a fidelity warning |
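The table can be read as a small pipeline. Below is a minimal Python sketch of it, not the converter's actual code, and it rests on several assumptions. The file is already decoded into a string, and blocks are separated by blank lines. A block with no timing line counts as "malformed." WebVTT voice spans such as `<v Name>` are stripped along with other tags. The names `subtitles_to_markdown` and `fmt_marker` are hypothetical.

```python
import html
import re
from typing import List, Tuple

# Start time of a cue in .srt (HH:MM:SS,mmm) or .vtt ([HH:]MM:SS.mmm) form;
# anything after the end time (e.g. WebVTT cue settings) is ignored.
TIMING = re.compile(
    r"^(?:(\d+):)?(\d{2}):(\d{2})[,.]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[,.]\d{3}"
)
# <i>, </i>, <font ...>, WebVTT <v ...>/<c...> spans, and {\an8}-style positioning codes
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")
SKIP_HEADS = {"WEBVTT", "NOTE", "STYLE", "REGION"}


def fmt_marker(seconds: int) -> str:
    """[mm:ss], or [h:mm:ss] once the marker reaches one hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"[{hours}:{minutes:02d}:{secs:02d}]"
    return f"[{minutes:02d}:{secs:02d}]"


def subtitles_to_markdown(text: str) -> Tuple[str, int]:
    """Return (markdown, malformed_block_count)."""
    out: List[str] = []
    malformed = 0
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        # WebVTT header and NOTE/STYLE/REGION blocks carry no speech: skip them
        if lines[0].split()[0] in SKIP_HEADS:
            continue
        # Locate the timing line; a sequence number or cue id before it is dropped
        idx = next((i for i, ln in enumerate(lines) if TIMING.match(ln)), None)
        if idx is None:
            malformed += 1
            continue
        m = TIMING.match(lines[idx])
        start = int(m.group(1) or 0) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
        # Strip styling, decode entities such as &amp;, and flow the lines into one
        caption = html.unescape(TAG.sub("", " ".join(lines[idx + 1:])))
        caption = " ".join(caption.split())
        if caption:
            out.append(f"{fmt_marker(start)} {caption}")
    # Blank line between blocks so each renders as its own Markdown paragraph
    return "\n\n".join(out), malformed
```

The caller decides how to surface the count, for example `md, bad = subtitles_to_markdown(src)` followed by a warning when `bad` is non-zero. Returning the count rather than printing it keeps the function usable in batch jobs.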
What this unlocks
- "Summarize the section around [41:30]": time markers give the model stable addresses into the recording.
- Meeting minutes from the auto-captions your video tool already produced.
- Workspace-ready lecture transcripts that chunk sensibly (see Markdown for AI workspaces).
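Because the markers survive in the text, a long transcript can be trimmed to the relevant slice before it goes into the prompt. Here is a minimal Python sketch under the following assumptions. Each block is one line that starts with a `[mm:ss]` or `[h:mm:ss]` marker. `window` and `marker_seconds` are hypothetical helpers. The 90-second default radius is an arbitrary choice.

```python
import re
from typing import Optional

# Hypothetical helpers, not part of the converter: keep the transcript lines whose
# [mm:ss] / [h:mm:ss] marker falls within `radius` seconds of a center marker.
MARKER = re.compile(r"^\[(?:(\d+):)?(\d{2}):(\d{2})\]")


def marker_seconds(line: str) -> Optional[int]:
    """Seconds for a line that starts with a time marker, else None."""
    m = MARKER.match(line)
    if m is None:
        return None
    return int(m.group(1) or 0) * 3600 + int(m.group(2)) * 60 + int(m.group(3))


def window(markdown: str, center: str, radius: int = 90) -> str:
    """Lines within +/- radius seconds of `center`, e.g. center="[41:30]"."""
    c = marker_seconds(center)
    if c is None:
        raise ValueError(f"not a time marker: {center!r}")
    kept = []
    for line in markdown.splitlines():
        t = marker_seconds(line)  # blank separator lines give None and are dropped
        if t is not None and abs(t - c) <= radius:
            kept.append(line)
    return "\n".join(kept)
```

Sending `window(transcript, "[41:30]")` together with "Summarize this section" keeps the prompt small. The model can also quote the surviving markers back as citations.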
Before → after
In the file
42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>
In the Markdown
[03:17] so the second condition only holds when…
FAQ
Speaker names? Kept when they're in the caption text (as most auto-captioners emit them); subtitle formats have no reliable speaker field to extract.
.ass/.ssa? Not yet — request it with a sample; the parser family makes additions cheap.
Full timestamps? Hours appear when the recording passes one hour: [1:03:17].
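The marker rule can be sketched as a conversion from a raw start time to a marker. This minimal Python sketch makes two assumptions. First, milliseconds are truncated rather than rounded. Second, the hour field is decided per marker, so earlier markers in a long recording stay `[mm:ss]`. `marker_from_timestamp` is a hypothetical name.

```python
import re

# Hypothetical helper: one raw start time, .srt "HH:MM:SS,mmm" or .vtt "[HH:]MM:SS.mmm",
# becomes a compact marker. Milliseconds are truncated (an assumption).
RAW = re.compile(r"^(?:(\d+):)?(\d{2}):(\d{2})[,.]\d{3}$")


def marker_from_timestamp(raw: str) -> str:
    m = RAW.match(raw.strip())
    if m is None:
        raise ValueError(f"not a subtitle timestamp: {raw!r}")
    hours = int(m.group(1) or 0)
    minutes, seconds = int(m.group(2)), int(m.group(3))
    if hours:  # the hour field only appears once this marker passes 1:00:00
        return f"[{hours}:{minutes:02d}:{seconds:02d}]"
    return f"[{minutes:02d}:{seconds:02d}]"
```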
Drop the .srt from your last recorded meeting.