Feeding subtitles to an LLM

MakeItMarkdown · July 2026

Every talk, lecture and meeting recording has a transcript hiding in its subtitle track, and subtitle files are almost, but not quite, readable text. The "not quite" is what wastes tokens and makes it hard for a model to cite a specific moment in the recording.

What's in the raw file

42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>

Sequence numbers, millisecond-precise timing pairs, styling tags and mid-sentence line breaks — for every line of speech, roughly two lines of apparatus. Pasted raw, a one-hour talk spends a third of its tokens on numbers nobody will ask about.

The element mapping

| In the .srt / .vtt | In the Markdown |
| --- | --- |
| Sequence numbers | Dropped |
| Timing pairs | Compact `[03:17]` markers, one per block, enough to jump into the video |
| Styling tags (`<i>`, positioning codes) | Stripped |
| Caption text | Flowed into readable transcript lines |
| WebVTT headers, NOTE/STYLE blocks | Skipped |
| Malformed blocks | Skipped and counted in a fidelity warning |
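The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not MakeItMarkdown's actual parser: the function name `subtitles_to_markdown`, the two regular expressions, and the choice to count a cue with no text as malformed are all assumptions made for the example.

```python
import re

# Timing line, e.g. "00:03:17,240 --> 00:03:21,900" (SRT) or
# "03:17.240 --> 03:21.900" (WebVTT, hours optional). Only the start time is captured.
TIMING = re.compile(
    r"^\s*(?:(\d+):)?(\d{2}):(\d{2})[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)
# Styling tags such as <i> or <c.yellow>, and positioning codes such as {\an8}.
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")


def subtitles_to_markdown(raw: str) -> tuple[str, int]:
    """Return (markdown_text, malformed_block_count). Hypothetical helper."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    out = []
    malformed = 0
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        # WebVTT header and NOTE/STYLE/REGION blocks carry no speech: skip them.
        if re.match(r"^(WEBVTT|NOTE|STYLE|REGION)\b", lines[0]):
            continue
        # The timing line is either first, or second after a sequence number / cue id.
        idx = next((i for i, ln in enumerate(lines[:2]) if TIMING.match(ln)), None)
        if idx is None:
            malformed += 1  # no timing pair: skip and count
            continue
        hours, minutes, seconds = TIMING.match(lines[idx]).groups()
        # Strip tags, then flow the caption's lines into one transcript line.
        caption = " ".join(TAG.sub("", ln) for ln in lines[idx + 1:])
        caption = re.sub(r"\s+", " ", caption).strip()
        if not caption:
            malformed += 1  # assumption: a cue with no text counts as malformed
            continue
        h = int(hours or 0)
        marker = f"[{h}:{minutes}:{seconds}]" if h else f"[{minutes}:{seconds}]"
        out.append(f"{marker} {caption}")
    return "\n".join(out), malformed
```

The sequence number is never read, so it is dropped without any extra step. The tag pattern is deliberately blunt: it would also remove literal angle brackets in speech, which a production parser would need to handle more carefully.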

What this unlocks

"Summarize the section around [41:30]" — time markers give the model stable addresses into the recording.
Meeting minutes from the auto-captions your video tool already produced.
Workspace-ready lecture transcripts that chunk sensibly (see Markdown for AI workspaces).
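To illustrate why the markers work as addresses, here is a small sketch that pulls the transcript lines near a given marker, for example before handing them to a model. The names `marker_seconds` and `section_around` and the 60-second default window are hypothetical choices for this example, and the transcript is assumed to have one `[mm:ss]` or `[h:mm:ss]` marker at the start of each line.

```python
import re

# Matches "[41:30] text" and "[1:03:17] text"; the hours part is optional.
MARKER = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s*(.*)$")


def marker_seconds(line: str):
    """Return the marker at the start of `line` in seconds, or None if absent."""
    m = MARKER.match(line)
    if not m:
        return None
    hours, minutes, seconds, _ = m.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def section_around(transcript: str, marker: str, window: int = 60) -> list[str]:
    """Return the lines whose marker lies within `window` seconds of `marker`."""
    target = marker_seconds(marker)
    if target is None:
        raise ValueError(f"not a time marker: {marker!r}")
    selected = []
    for line in transcript.splitlines():
        t = marker_seconds(line)
        if t is not None and abs(t - target) <= window:
            selected.append(line)
    return selected
```

Because every line carries its own marker, the same function doubles as a simple chunker: call it with successive targets and a fixed window to cut a long transcript into time-bounded pieces.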

Before → after

In the file

42
00:03:17,240 --> 00:03:21,900
<i>so the second condition
only holds when…</i>

In the Markdown

[03:17] so the second condition only holds when…

FAQ

Speaker names? Kept when they're in the caption text, which is where most auto-captioners put them; subtitle formats have no reliable speaker field to extract.

.ass/.ssa? Not yet — request it with a sample; the parser family makes additions cheap.

Full timestamps? Hours appear when the recording passes one hour: [1:03:17].
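One way to read that rule, as a sketch: format each marker from its start time in seconds and add the hours part only once it is non-zero. The name `compact_marker` is hypothetical, and applying the rule per marker (rather than switching every marker in a long recording to the hour form) is an assumption.

```python
def compact_marker(total_seconds: float) -> str:
    """Format a start time as [mm:ss], or [h:mm:ss] from the one-hour mark on."""
    whole = int(total_seconds)  # milliseconds are dropped, not rounded
    hours, rest = divmod(whole, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"[{hours}:{minutes:02d}:{seconds:02d}]"
    return f"[{minutes:02d}:{seconds:02d}]"
```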

Drop the .srt from your last recorded meeting.