Start here · 01

LLMs don't just read Markdown — they speak it

Try a small experiment. Ask any chat model to explain something — a tradeoff, a recipe, an error message. Look at the shape of the answer: a bold lead, a bulleted list, maybe a ## heading and a fenced code block. Nobody asked it to format anything. The model reached for Markdown the way you reach for your native language.

That's not a UI trick. It's a fact about how these models were made, and it has a practical consequence most people miss: the format you paste in is not neutral. Some formats land in the model's native register. Others make it do translation work before it can even start thinking.

The point: the format you paste in is not neutral. Markdown lands in the model's native register; everything else costs a translation step first.

1 · Why Markdown is the native register

Three forces, all pointing the same direction:

What your chat renders

Trade-offs

  • Speed — the cached path wins
  • Cost — batch where possible

What the model actually streamed

## Trade-offs

- **Speed** — the cached path wins
- **Cost** — batch where possible

So when your document arrives as Markdown, its structure is expressed in the exact vocabulary the model uses to organize its own thoughts. A ## is not a hint to be decoded; it's a first-class token pattern the model has an extremely strong prior about.

2 · What other formats make the model do first

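Here is the same document arriving two ways, as a pipeline:

Pasted PDF extraction → reconstruct structure (paragraphs, headings, tables) → read and answer
Markdown → read and answer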

With a pasted PDF extraction, the model must first infer where lines break into paragraphs, which fragments are headings, and which runs of numbers were once a table, all from typography that no longer exists. Modern models are impressively good at this guessing. But every guess consumes capacity, and a wrong guess doesn't announce itself: the model just answers confidently from a slightly wrong document. We measured what this does to tables and layout in "Why PDFs are hostile input for LLMs".

With Markdown, that entire first stage disappears. Heading levels, list nesting, table cells, code boundaries — all explicit, all in the model's home notation. Reading comprehension starts at sentence one.
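To make "all explicit" concrete, here is a minimal Python sketch that recovers a document outline from Markdown with one regular expression and no guessing. The function name `outline` and the sample text are illustrative assumptions, and the pattern covers only `#`-style headings, not every case a full Markdown parser handles.

```python
import re

# Matches headings such as "## Trade-offs": 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

# Three backticks, built this way so the example can sit inside a fenced block.
FENCE = "`" * 3


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for '#' headings, skipping fenced code blocks."""
    result = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            result.append((len(match.group(1)), match.group(2)))
    return result


sample = """# Report

## Trade-offs

- **Speed**: the cached path wins
- **Cost**: batch where possible

## Results
"""

print(outline(sample))
# → [(1, 'Report'), (2, 'Trade-offs'), (2, 'Results')]
```

Nothing comparable exists for a flattened PDF extraction, where the same outline has to be inferred from line lengths and capitalization.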

3 · The token bill

Structure has a price in tokens, and Markdown's price is close to the minimum. Character counts are only a rough proxy for tokens, but they show the pattern. The same three-row table, three ways:

<table><tr><th>region</th><th>units</th></tr>
<tr><td>North</td><td>1204</td></tr>
<tr><td>South</td><td>980</td></tr></table>   ← 124 characters

{"rows":[{"region":"North","units":1204},
{"region":"South","units":980}]}              ← 73 characters, keys repeat per row

| region | units |
| --- | --- |
| North | 1204 |
| South | 980 |                               ← 62 characters
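The comparison can be reproduced with a few lines of Python. This sketch copies the three snippets above and counts characters while ignoring line breaks. The helper name `visible_chars` is an assumption for illustration, and totals shift by a character or two depending on how whitespace is counted.

```python
# The three encodings of the same table, copied from the examples above.
HTML = """<table><tr><th>region</th><th>units</th></tr>
<tr><td>North</td><td>1204</td></tr>
<tr><td>South</td><td>980</td></tr></table>"""

JSON = """{"rows":[{"region":"North","units":1204},
{"region":"South","units":980}]}"""

MARKDOWN = """| region | units |
| --- | --- |
| North | 1204 |
| South | 980 |"""


def visible_chars(text: str) -> int:
    """Count characters, ignoring line breaks so wrapping does not affect the total."""
    return sum(len(line) for line in text.splitlines())


counts = {
    name: visible_chars(text)
    for name, text in [("html", HTML), ("json", JSON), ("markdown", MARKDOWN)]
}

# Print smallest first.
for name, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{name:<9}{n:>4} characters")
```

The ordering is the point rather than the exact totals: HTML pays for paired tags on every cell, JSON repeats its keys on every row, and the Markdown table pays only for the pipes.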

The gap widens with real documents, because office and notebook formats are containers: fonts, themes, XML plumbing, embedded previews. The text you care about is a minority of the bytes. Some honest numbers from our own built-in samples and test files:

| Source | Original | As Markdown |
| --- | --- | --- |
| 2-page business PDF | 26.6 KB | 0.9 KB |
| Word report (.docx) | 36.4 KB | 0.9 KB |
| Data-analysis notebook, 100+ cells | 3,465 KB | 261 KB ≈ 64K tokens |

The notebook row is the dramatic one: as raw JSON it doesn't fit in most context windows at all; as Markdown it fits with room to spare. (Small office files exaggerate the ratio — their fixed container overhead dominates — but the direction never flips: Markdown is the text, minus the plumbing.)
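As a rough check on these figures, the sketch below computes the size ratios and a token estimate. It assumes about 4 bytes of text per token, which is a common rule of thumb rather than a real tokenizer, and treats 1 KB as 1,000 bytes. `estimate_tokens` is a hypothetical helper, not part of any library.

```python
# Sizes in KB, taken from the table above: (original, as Markdown).
SAMPLES = {
    "2-page business PDF": (26.6, 0.9),
    "Word report (.docx)": (36.4, 0.9),
    "Data-analysis notebook": (3465.0, 261.0),
}

# Assumption: roughly 4 bytes of mostly-English text per token.
# Real counts vary by model, language, and content.
BYTES_PER_TOKEN = 4.0


def estimate_tokens(size_kb: float, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Rough token estimate from a size in KB (1 KB taken as 1,000 bytes)."""
    return round(size_kb * 1000 / bytes_per_token)


for name, (original_kb, markdown_kb) in SAMPLES.items():
    ratio = original_kb / markdown_kb
    print(
        f"{name}: {ratio:.1f}x smaller, "
        f"about {estimate_tokens(markdown_kb):,} tokens as Markdown"
    )
```

Under that assumption, 261 KB comes to roughly 65K tokens, in line with the ≈ 64K quoted above, while the same heuristic puts the raw 3,465 KB notebook above 800K tokens.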

4 · Where Markdown is not the answer

Fairness requires one caveat. Deeply nested, machine-generated data — API payloads, config trees — is often better left as JSON, which models also read fluently; flattening it into prose can lose precision. Markdown wins for documents: things with headings, paragraphs, tables, figures and code, written for a reader. That is exactly the shape of lecture notes, reports, articles and notebooks — the things people actually paste into chat windows.
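One way to apply this rule in a preprocessing step, sketched in Python: render flat, uniform records as a Markdown table and leave anything nested as JSON. The function names `is_flat` and `render_for_prompt` and the sample data are hypothetical, and the sketch does not escape `|` characters inside values.

```python
import json


def is_flat(records: list[dict]) -> bool:
    """True when every record has the same keys, in the same order, and only scalar values."""
    if not records:
        return False
    keys = list(records[0])
    return all(
        list(r) == keys and not any(isinstance(v, (dict, list)) for v in r.values())
        for r in records
    )


def render_for_prompt(records: list[dict]) -> str:
    """Flat records become a Markdown table; anything nested stays as JSON."""
    if not is_flat(records):
        return json.dumps(records, indent=2)
    keys = list(records[0])
    lines = [
        "| " + " | ".join(keys) + " |",
        "| " + " | ".join("---" for _ in keys) + " |",
    ]
    for r in records:
        lines.append("| " + " | ".join(str(r[k]) for k in keys) + " |")
    return "\n".join(lines)


flat = [{"region": "North", "units": 1204}, {"region": "South", "units": 980}]
nested = [{"service": "api", "limits": {"rps": 50, "burst": [100, 200]}}]

print(render_for_prompt(flat))    # Markdown table
print(render_for_prompt(nested))  # indented JSON, structure preserved
```

The flat case reproduces the four-line table from the token comparison, and the nested case keeps its hierarchy intact instead of being flattened into prose.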

5 · The takeaway

Models answer from what they can parse, in a register they were trained to think in. Markdown is that register. Convert once, and every downstream use — pasting into a chat, uploading to an AI workspace, indexing for retrieval — starts from the model's native language instead of a guessing game.
