Markdown for LLMs — the field manual

MakeItMarkdown · updated July 2026 · the rules our converter follows, usable by hand too

Markdown headed for a model's context window has one job: make the document's structure explicit in as few tokens as possible, without lying about what's missing. Everything below serves that sentence.

Syntax at a glance

# Document title
Source: original-file.docx · converted 2026-07

## Section heading
<a id="section-heading"></a>          <!-- pipelines only -->

| name (str) | units (int) | revenue (float) |
| --- | --- | --- |
| North | 1204 | 96432.10 |
Showing the first 50 of 10,000 rows.

[Figure: chart_1.png]

```python
compute()
```
**Output:**
```
(120, 4)
```

> Note: pages 4–6 are scans with no text layer — not included.

The ten rules

One H1, then a real hierarchy. Never skip levels, never fake a heading with bold text — retrieval chunks on headings, and bold paragraphs are invisible to it.
Tables are GFM pipes with a header row. Annotate ambiguous types (revenue (float), signup_date (date)) so serials and codes can't be misread.
State truncation in the text. Showing the first 50 of 10,000 rows. — a model told it has a sample answers like it has a sample; one not told invents the rest.
Figures become named placeholders. [Figure: chart_1.png]. Never base64 — one embedded screenshot ≈ 130,000 junk tokens.
Fence code, label the language, label outputs. ```python + **Output:** keep cause and effect apart; truncate long logs and say so.
Stable addresses for machine readers. <a id="slug"></a> anchors and  boundaries — citations that survive re-indexing.
Front matter only if the destination reads it. Obsidian and archives: yes. Chat paste: 15 wasted tokens.
Provenance in one line. Source: report-q1.docx · converted 2026-07 — enough for "where is this from?", no metadata block.
Delete boilerplate, never compress prose. Navigation, cookie banners, "12 of 96": token tax. The actual text: byte-faithful — models quote what exists.
Say what's missing. Scanned pages, dropped sheets, lost equations — one explicit line each. Honest gaps get worked around; silent gaps get papered over with inventions.

Per-destination adjustments

Destination	What changes
Chat paste	Token-lean: outputs truncated hard, token estimate up top, no front matter
RAG / pipelines	Chunk comments + stable anchors; tables whole inside sections
Obsidian vault	YAML front matter, callouts, `![[wikilink]]` figures
Archive / citation	Full front matter, faithful body, nothing extra truncated

These four are exactly the converter's output presets — same detected structure, different discipline per destination.

Or skip the hand-work: drop a file and get all ten rules applied, with a fidelity report of what was detected.

Deeper dives: why Markdown is the native register · tables · RAG anchors · Context Lab