Markdown for LLMs — the field manual
Markdown headed for a model's context window has one job: make the document's structure explicit in as few tokens as possible, without lying about what's missing. Everything below serves that sentence.
Syntax at a glance
# Document title
Source: original-file.docx · converted 2026-07
## Section heading
<a id="section-heading"></a> <!-- pipelines only -->
| name (str) | units (int) | revenue (float) |
| --- | --- | --- |
| North | 1204 | 96432.10 |
Showing the first 50 of 10,000 rows.
[Figure: chart_1.png]
```python
compute()
```
**Output:**
```
(120, 4)
```
> Note: pages 4–6 are scans with no text layer — not included.
The ten rules
- One H1, then a real hierarchy. Never skip levels, never fake a heading with bold text — retrieval chunks on headings, and bold paragraphs are invisible to it.
- Tables are GFM pipes with a header row.
Annotate ambiguous types (
revenue (float),signup_date (date)) so serials and codes can't be misread. - State truncation in the text.
Showing the first 50 of 10,000 rows.— a model told it has a sample answers like it has a sample; one not told invents the rest. - Figures become named placeholders.
[Figure: chart_1.png]. Never base64 — one embedded screenshot ≈ 130,000 junk tokens. - Fence code, label the language, label outputs.
```python+**Output:**keep cause and effect apart; truncate long logs and say so. - Stable addresses for machine readers.
<a id="slug"></a>anchors and<!-- chunk: slug -->boundaries — citations that survive re-indexing. - Front matter only if the destination reads it. Obsidian and archives: yes. Chat paste: 15 wasted tokens.
- Provenance in one line.
Source: report-q1.docx · converted 2026-07— enough for "where is this from?", no metadata block. - Delete boilerplate, never compress prose. Navigation, cookie banners, "12 of 96": token tax. The actual text: byte-faithful — models quote what exists.
- Say what's missing. Scanned pages, dropped sheets, lost equations — one explicit line each. Honest gaps get worked around; silent gaps get papered over with inventions.
Per-destination adjustments
| Destination | What changes |
|---|---|
| Chat paste | Token-lean: outputs truncated hard, token estimate up top, no front matter |
| RAG / pipelines | Chunk comments + stable anchors; tables whole inside sections |
| Obsidian vault | YAML front matter, callouts, ![[wikilink]] figures |
| Archive / citation | Full front matter, faithful body, nothing extra truncated |
These four are exactly the converter's output presets — same detected structure, different discipline per destination.
Or skip the hand-work: drop a file and get all ten rules applied, with a fidelity report of what was detected.