Case study: the lecture notes that made a smaller model smarter

A first-person case study · July 2026 · 7 min read

This is the story that convinced me the input format matters more than the model tier. Same workspace, same notes, same questions; the only change was converting the PDFs to Markdown. The fast, small model I'd written off went from fumbling to precise.

The point: same workspace, same model, same questions — only the input format changed.

1 · The setup

Like a lot of people, I keep course materials in an AI workspace — the kind where you upload reference files once and the assistant consults them in every chat (Claude calls these Projects; Perplexity calls them Spaces). Mine held a semester of lecture notes as PDFs, exported from slides: a dozen files, dense with definitions, derivations and tables.

For quick revision questions I use the fast, inexpensive model tier — for me that was Haiku-class. Speed matters when you're asking twenty small questions in a row.

[Screenshot: the workspace file list while it still contained the PDF lecture notes.]

2 · The failure mode

The small model kept hitting a wall. Sometimes it said outright that it couldn't read the PDF. Sometimes it did something subtler and worse: it needed extra rounds of prompting just to locate which file held the topic, then answered vaguely, apparently working from fragments. My best guess is that retrieval had surfaced glyph soup and the model was improvising around it.

[Screenshot: a "before" exchange, where the question about a PDF note is specific and the model's answer visibly isn't.]

The instinctive diagnosis is "the model is too small." That diagnosis is expensive — it pushes you to a slower, pricier tier for every trivial question. It also turned out to be wrong.

3 · The one-variable experiment

I converted the same lecture notes to Markdown (drag the PDFs into the converter, download the .zip, upload the .md files to the workspace), deleted the PDFs from the workspace, and asked the same questions again. Nothing else changed: same model tier, same phrasing.

The experiment design: everything identical except the upload format.

Measure	PDF notes	Markdown notes
Total upload size	—	—
Storage cap used	—	—
"Can't read the file" replies	—	—
Found the right file first try	—	—

[Screenshot: folder sizes of the PDF notes versus the converted .md files, or the workspace storage meter before and after.] The table above will be filled in with the real numbers from the files.
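For the "Total upload size" row, a minimal Python sketch along these lines can total each folder. The function names and the folder names in the usage comment are hypothetical, and it only measures bytes on disk, which may not match how a given workspace counts storage.

```python
from pathlib import Path


def folder_size(folder: str, suffix: str) -> int:
    """Total size in bytes of all files under `folder` with the given suffix."""
    return sum(
        p.stat().st_size
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix.lower()
    )


def size_report(pdf_folder: str, md_folder: str) -> str:
    """One-line comparison of the PDF folder and the converted Markdown folder."""
    pdf_bytes = folder_size(pdf_folder, ".pdf")
    md_bytes = folder_size(md_folder, ".md")
    # Guard against an empty PDF folder so the ratio never divides by zero.
    ratio = md_bytes / pdf_bytes if pdf_bytes else float("nan")
    return f"PDF: {pdf_bytes} B, Markdown: {md_bytes} B, ratio: {ratio:.2%}"


# Usage (hypothetical folder names):
# print(size_report("lecture-notes-pdf", "lecture-notes-md"))
```

The ratio is Markdown size divided by PDF size, so a smaller percentage means a bigger saving.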

4 · What changed

The difference wasn't subtle. The same small model now:

Read the files, every time. No more refusals or "I don't have access" hedging.
Found the right note on the first try. Asking about a topic pulled the correct week's file without me naming it — the Markdown headings gave retrieval something to match against.
Answered specifically. Definitions came back with their conditions attached; multi-part derivations came back in order; when a topic spanned two notes it said so and used both.

[Screenshot: the "after" exchange, with the same question as the "before" screenshot now answered precisely from the .md note, ideally with the citation/source popover showing the right file.]

5 · Why a format change makes a model "smarter"

Nothing about the model improved; its input did. Three mechanisms, in my rough order of impact:

The text became text. A slide-export PDF stores positioned glyphs; whether an assistant can reconstruct readable text from it varies file by file, page by page. Markdown removes the reconstruction step entirely — the words are just there. (The gory details: Why PDFs are hostile input for LLMs.)
Retrieval got handles. Workspaces search your files and pull matching chunks into context. Markdown files can be split along topic boundaries at their headings, and those headings double as searchable titles; text extracted from a PDF tends to be split at arbitrary page breaks. Better chunks in, better answers out.
Tokens stopped being wasted. The Markdown notes are a fraction of the size, so each retrieved chunk is dense with content, and more distinct chunks fit into the answer budget.

Small models feel these effects most because they have the least slack to spend on repairing bad input — which is exactly why the upgrade looked like the model getting smarter.
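To make the retrieval point concrete, here is a small illustrative sketch in Python. It is not how any particular workspace chunks files; that is an assumption on my part, and the function names and sample notes are invented. It only contrasts splitting at Markdown headings with splitting into fixed-size windows, the latter standing in for page-break chunking of extracted PDF text.

```python
import re

# ATX-style Markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

# Invented sample notes, for illustration only.
NOTES = """# Week 3: Limits
A limit describes the value a function approaches.

## Epsilon-delta definition
For every epsilon > 0 there is a delta > 0.

# Week 4: Derivatives
The derivative is the limit of the difference quotient.
"""


def chunk_by_headings(markdown: str) -> list[dict]:
    """One chunk per heading; text before the first heading gets an empty title.

    Simplification: '#' lines inside fenced code blocks would also be
    treated as headings by this sketch.
    """
    chunks = []
    title, lines = "", []

    def flush() -> None:
        # Skip the empty chunk that would otherwise precede the first heading.
        if title or any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


def chunk_by_size(text: str, size: int) -> list[str]:
    """Fixed-width windows that ignore topic boundaries entirely."""
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    for chunk in chunk_by_headings(NOTES):
        print(repr(chunk["title"]), "->", len(chunk["text"]), "chars")
    for window in chunk_by_size(NOTES, 80):
        print(repr(window))
```

Each heading-based chunk carries a title a search can match against, while the fixed-size windows cut wherever the character count runs out, often mid-sentence.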

6 · Honest caveats

Conversion is lossy for layout: multi-column slides and heavily visual pages flatten. The converter's fidelity report tells you, per file, what was detected and what to double-check.
Scanned pages (no text layer) can't be converted yet — they're flagged, not faked.
Equations survive as text approximations, not rendered math; for equation-dense notes, spot-check the critical ones.
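For the equation spot-check, a rough heuristic can list the lines worth a second look. This is a sketch under my own assumptions: the symbol set and the function name are invented, it is not part of the converter, and it will both over-flag (any line with "=") and miss equations written purely in words.

```python
import re

# Heuristic only: symbols that often survive in flattened equations,
# plus LaTeX-style commands such as \frac or \sum.
MATH_HINT = re.compile(r"[=^∑∫√≤≥≠±∞]|\\[a-zA-Z]+")


def lines_to_spot_check(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines that look like they contain math."""
    return [
        (number, line.strip())
        for number, line in enumerate(markdown.splitlines(), start=1)
        if MATH_HINT.search(line)
    ]
```

Compare each flagged line against the original slide rather than trusting the text approximation.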

7 · The recipe

Drag your PDFs (or .docx, or notebooks) into MakeItMarkdown — multiple files at once is fine.
Skim each fidelity report for warnings.
Download the .zip, upload the .md files to your workspace, and remove the originals from it.
Keep the originals wherever they live — you're changing what the assistant reads, not your archive.

Try it on the file your assistant struggles with most. The conversion runs in your browser — nothing is uploaded.