LLMs don't just read Markdown — they speak it

MakeItMarkdown · July 2026 · 7 min read

Try a small experiment. Ask any chat model to explain something — a tradeoff, a recipe, an error message. Look at the shape of the answer: a bold lead, a bulleted list, maybe a ## heading and a fenced code block. Nobody asked it to format anything. The model reached for Markdown the way you reach for your native language.

That's not a UI trick. It's a fact about how these models were made, and it has a practical consequence most people miss: the format you paste in is not neutral. Some formats land in the model's native register. Others make it do translation work before it can even start thinking.

The point: the format you paste in is not neutral — Markdown lands in the model’s native register; everything else costs a translation step first.

1 · Why Markdown is the native register

What your chat renders:

Trade-offs

Speed: the cached path wins
Cost: batch where possible

What the model actually streamed:

## Trade-offs

- **Speed**: the cached path wins
- **Cost**: batch where possible

Three forces, all pointing the same direction:

- The training corpus is soaked in it. README files, documentation sites, wikis, developer forums, and chat logs: a large share of the technical text a model learns from is Markdown or renders from it. The model has seen # heading mean "heading" across a vast number of documents.
- Chat tuning rewards it. Assistant models are fine-tuned on conversations where good answers are structured answers, and the structure is written in Markdown. Producing clean Markdown is part of what these models were graded on.
- Every chat interface renders it. The bold text and tidy lists you see in a chat window are Markdown being rendered live. Output format and display format agree, so the whole ecosystem keeps reinforcing it.

So when your document arrives as Markdown, its structure is expressed in the exact vocabulary the model uses to organize its own thoughts. A ## is not a hint to be decoded; it's a first-class token pattern the model has an extremely strong prior about.

2 · What other formats make the model do first

Consider the same document arriving two ways: as text extracted from a PDF, and as Markdown.

With a pasted PDF extraction, the model must first infer where lines break into paragraphs, which fragments are headings, and which runs of numbers were once a table, all from typography that no longer exists. Modern models are impressively good at this guessing. But every guess consumes capacity, and a wrong guess doesn't announce itself: the model just answers confidently from a slightly wrong document. We measured what this does to tables and layout in "Why PDFs are hostile input for LLMs".
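To make that guessing concrete, here is a minimal sketch of the kind of heuristic any reader of extracted text has to apply. The function name, the 60-character cutoff, and the sample lines are all assumptions made up for illustration; this is not how any particular model or extractor works.

```python
def looks_like_heading(line: str) -> bool:
    """Guess whether an extracted line was once a heading (heuristic, can be wrong)."""
    text = line.strip()
    # Assumption: headings are short and non-empty.
    if not text or len(text) > 60:
        return False
    # Assumption: headings rarely end in sentence punctuation.
    return text[-1] not in ".,;:!?"


# Hypothetical lines as they might come out of a PDF text extraction.
extracted = [
    "Quarterly results",
    "Revenue grew in every region and margins held",
    "steady through the third quarter.",
    "Outlook",
]

for line in extracted:
    label = "heading?" if looks_like_heading(line) else "body"
    print(f"{label:9} {line}")
```

The second line is the first half of a wrapped sentence, yet the heuristic labels it a heading because it is short and has no closing punctuation. Nothing flags the mistake, which is the failure mode described above.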

With Markdown, that entire first stage disappears. Heading levels, list nesting, table cells, code boundaries — all explicit, all in the model's home notation. Reading comprehension starts at sentence one.
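By contrast, structure in Markdown can be read off mechanically. The sketch below pulls a heading outline out of a Markdown string with one regular expression; the `outline` function and the sample document are hypothetical, and it handles only `#`-style headings and simple code fences, not the full Markdown grammar.

```python
import re

# One to six '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) for each '#' heading, ignoring fenced code."""
    result = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            result.append((len(match.group(1)), match.group(2)))
    return result


# Hypothetical sample document.
doc = "\n".join([
    "# Report",
    "## Trade-offs",
    "- **Speed**: the cached path wins",
    FENCE,
    "# a comment inside code, not a heading",
    FENCE,
    "## Next steps",
])
print(outline(doc))  # [(1, 'Report'), (2, 'Trade-offs'), (2, 'Next steps')]
```

No length cutoffs or punctuation guesses are needed: the heading level is the number of `#` characters, and the fence markers say exactly where code starts and stops.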

3 · The token bill

Structure has a price in tokens, and Markdown's price is close to the minimum. Here is the same three-row table (one header row, two data rows) written three ways, with character counts as a rough proxy for tokens:

<table><tr><th>region</th><th>units</th></tr>
<tr><td>North</td><td>1204</td></tr>
<tr><td>South</td><td>980</td></tr></table>   ← 124 characters, excluding line breaks

{"rows":[{"region":"North","units":1204},
{"region":"South","units":980}]}              ← 73 characters, excluding line breaks; keys repeat per row

| region | units |
| --- | --- |
| North | 1204 |
| South | 980 |                               ← 62 characters, excluding line breaks
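The comparison is easy to reproduce. The sketch below serializes the same two data rows three ways and counts characters, ignoring line breaks. The helper names are hypothetical, and exact counts shift with whitespace choices, so treat the ordering as the result rather than the specific numbers. Character count is also only a proxy: real token counts depend on the tokenizer.

```python
import json

# Hypothetical helper names; the data mirrors the table in this section.
COLUMNS = ["region", "units"]
ROWS = [{"region": "North", "units": 1204}, {"region": "South", "units": 980}]


def to_html(rows, columns):
    """Serialize rows as a bare HTML table with no attributes or whitespace."""
    head = "".join(f"<th>{c}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{row[c]}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def to_json(rows):
    """Serialize rows as compact JSON; the keys repeat in every row."""
    return json.dumps({"rows": rows}, separators=(",", ":"))


def to_markdown(rows, columns):
    """Serialize rows as a pipe table; the column names appear only once."""
    def line(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    lines = [line(columns), line(["---"] * len(columns))]
    lines += [line([row[c] for c in columns]) for row in rows]
    return "\n".join(lines)


def visible_chars(text):
    """Count characters, ignoring line breaks."""
    return len(text.replace("\n", ""))


sizes = {
    "html": visible_chars(to_html(ROWS, COLUMNS)),
    "json": visible_chars(to_json(ROWS)),
    "markdown": visible_chars(to_markdown(ROWS, COLUMNS)),
}
for name, count in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name:9} {count}")
```

Adding rows makes the gap grow: HTML pays for a pair of tags around every cell and JSON repeats every key, while the pipe table adds only the cell text and its separators.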

The gap widens with real documents, because office and notebook formats are containers: fonts, themes, XML plumbing, embedded previews. The text you care about is a minority of the bytes. Some honest numbers from our own built-in samples and test files:

| Source | Original | As Markdown |
| --- | --- | --- |
| 2-page business PDF | 26.6 KB | 0.9 KB |
| Word report (.docx) | 36.4 KB | 0.9 KB |
| Data-analysis notebook, 100+ cells | 3,465 KB | 261 KB (≈ 64K tokens) |

The notebook row is the dramatic one: as raw JSON it doesn't fit in most context windows at all; as Markdown it fits with room to spare. (Small office files exaggerate the ratio — their fixed container overhead dominates — but the direction never flips: Markdown is the text, minus the plumbing.)
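The arithmetic behind the notebook row can be sketched with a rule of thumb. The snippet assumes roughly 4 bytes per token, a common approximation for English prose that lands close to the 261 KB ≈ 64K tokens figure in the table. The 200,000-token window is a hypothetical limit picked for illustration, and dense JSON often tokenizes less efficiently than prose, so the raw-JSON estimate is likely on the low side.

```python
def estimate_tokens(size_kb: float, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate from file size; the ratio is an assumption."""
    return round(size_kb * 1024 / bytes_per_token)


# Hypothetical context limit, chosen only for illustration.
CONTEXT_WINDOW = 200_000

for label, size_kb in [("notebook as raw JSON", 3465), ("notebook as Markdown", 261)]:
    tokens = estimate_tokens(size_kb)
    verdict = "fits" if tokens <= CONTEXT_WINDOW else "does not fit"
    print(f"{label}: ~{tokens:,} tokens, {verdict} in {CONTEXT_WINDOW:,}")
```

Under these assumptions the raw file comes out at several times the window, while the Markdown version uses about a third of it.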

4 · Where Markdown is not the answer

Fairness requires one caveat. Deeply nested, machine-generated data — API payloads, config trees — is often better left as JSON, which models also read fluently; flattening it into prose can lose precision. Markdown wins for documents: things with headings, paragraphs, tables, figures and code, written for a reader. That is exactly the shape of lecture notes, reports, articles and notebooks — the things people actually paste into chat windows.
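One way to act on that caveat is to check how deeply nested a payload is before deciding how to present it. The sketch below is a hypothetical heuristic: the function names and the depth threshold of 2 are assumptions for illustration, not a rule from any tool.

```python
def depth(value) -> int:
    """Nesting depth of a JSON-like value: scalars are 0, containers add 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def preferred_format(value, max_flat_depth: int = 2) -> str:
    """Suggest a table for flat records, JSON for anything deeper (assumed cutoff)."""
    return "markdown-table" if depth(value) <= max_flat_depth else "json"


# A flat list of records: a natural table.
flat = [{"region": "North", "units": 1204}, {"region": "South", "units": 980}]
# A hypothetical config tree: nesting carries meaning that a table would lose.
nested = {"service": {"retries": {"max": 3, "backoff": [1, 2, 4]}}}

print(preferred_format(flat))    # markdown-table
print(preferred_format(nested))  # json
```

A list of flat records has depth 2 and maps cleanly onto rows and columns. The config tree has depth 4, and flattening it would force a choice about how to spell out each path, which is where precision gets lost.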

5 · The takeaway

Models answer from what they can parse, in a register they were trained to think in. Markdown is that register. Convert once, and every downstream use — pasting into a chat, uploading to an AI workspace, indexing for retrieval — starts from the model's native language instead of a guessing game.
