Format guide · .srt / .vtt

Feeding subtitles to an LLM

Every talk, lecture and meeting recording has a transcript hiding in its subtitle track — and subtitle files are almost, but not quite, readable text. The "not quite" is what wastes tokens and confuses citation.

What's in the raw file

42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>

Sequence numbers, millisecond-precise timing pairs, styling tags and mid-sentence line breaks — for every line of speech, roughly two lines of apparatus. Pasted raw, a one-hour talk spends a third of its tokens on numbers nobody will ask about.
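The overhead is easy to check on your own files. The sketch below is an illustration, not part of the converter: `apparatus_share` is a hypothetical helper that uses character counts as a rough stand-in for tokens, and it assumes any digits-only line is a sequence number (so a caption that is just a number would be miscounted).

```python
import re

# Start of a timing line; "," separates milliseconds in .srt, "." in .vtt.
TIMING = re.compile(r"(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->")
# Angle-bracket styling tags such as <i> or </i>.
TAG = re.compile(r"<[^>]+>")


def apparatus_share(raw):
    """Fraction of characters (ignoring line-edge whitespace) that are not caption text."""
    total = speech = 0
    for line in raw.splitlines():
        stripped = line.strip()
        total += len(stripped)
        # Assumption: digits-only lines are sequence numbers, not speech.
        if stripped.isdigit() or TIMING.match(stripped):
            continue
        # Count the caption text without its styling tags.
        speech += len(TAG.sub("", stripped))
    return 0.0 if total == 0 else 1 - speech / total
```

Real tokenizers split timestamps and tags differently from prose, so treat the result as an estimate of the waste rather than an exact token count.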

The element mapping

In the .srt / .vtt → In the Markdown
Sequence numbers → Dropped
Timing pairs → Compact [03:17] markers, one per block, enough to jump into the video
Styling tags (<i>, positioning codes) → Stripped
Caption text → Flowed into readable transcript lines
WebVTT headers, NOTE/STYLE blocks → Skipped
Malformed blocks → Skipped and counted in a fidelity warning
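The mapping above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter's actual code: the function name `captions_to_markdown` is hypothetical, a block counts as malformed when it has no timing line in its first two lines or no text left after stripping, and hours are shown only on markers whose own start time reaches one hour.

```python
import re

# Cue timing line; hours are optional in WebVTT, and the millisecond
# separator is "," in .srt and "." in .vtt.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)
# Angle-bracket styling tags (<i>, <font ...>) and {\an8}-style positioning codes.
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")


def captions_to_markdown(raw):
    """Return (markdown_text, skipped_block_count) for .srt or .vtt input."""
    out, skipped = [], 0
    normalized = raw.replace("\r\n", "\n").lstrip("\ufeff").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        if lines[0].startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue  # headers and comment/style blocks: skipped, not counted
        # The timing line comes first (bare .vtt cue) or second
        # (after a sequence number or cue identifier).
        idx = next((i for i, ln in enumerate(lines[:2]) if TIMING.match(ln)), None)
        text = ""
        if idx is not None:
            # Strip tags, then flow the wrapped caption lines into one line.
            text = " ".join(TAG.sub("", ln) for ln in lines[idx + 1:])
            text = " ".join(text.split())
        if not text:
            skipped += 1  # malformed: no timing line, or no caption text
            continue
        hours, minutes, seconds = TIMING.match(lines[idx]).groups()
        if hours and int(hours) > 0:
            stamp = f"{int(hours)}:{minutes}:{seconds}"
        else:
            stamp = f"{minutes}:{seconds}"
        out.append(f"[{stamp}] {text}")
    return "\n".join(out), skipped
```

Returning the skipped count alongside the text is one way to feed a fidelity warning: the caller can report how many blocks were dropped instead of failing on the first bad one.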

What this unlocks

Before → after

In the file

42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>

In the Markdown

[03:17] so the second condition only holds when…

FAQ

Speaker names? Kept when they're in the caption text (as most auto-captioners emit them); subtitle formats have no reliable speaker field to extract.

.ass/.ssa? Not yet — request it with a sample; the parser family makes additions cheap.

Full timestamps? Hours appear when the recording passes one hour: [1:03:17].

Drop the .srt from your last recorded meeting.