Markdown for LLMs — the field manual

Markdown headed for a model's context window has one job: make the document's structure explicit in as few tokens as possible, without lying about what's missing. Everything below serves that sentence.

Syntax at a glance

# Document title
Source: original-file.docx · converted 2026-07

## Section heading
<a id="section-heading"></a>          <!-- pipelines only -->

| name (str) | units (int) | revenue (float) |
| --- | --- | --- |
| North | 1204 | 96432.10 |

Showing the first 50 of 10,000 rows.

[Figure: chart_1.png]

```python
compute()
```
**Output:**
```
(120, 4)
```

> Note: pages 4–6 are scans with no text layer — not included.
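The table portion of the skeleton above could be generated by a small helper along these lines. This is a minimal sketch, not the converter's actual code: `render_table` and `_cell` are hypothetical names, and the two-decimal float formatting is an assumption taken from the sample row.

```python
from typing import Sequence


def _cell(value: object) -> str:
    """Format one cell; escape pipes so they cannot split the column."""
    if isinstance(value, float):
        text = f"{value:.2f}"  # assumption: two decimals, as in the sample row
    else:
        text = str(value)
    return text.replace("|", "\\|").replace("\n", " ")


def render_table(
    columns: Sequence[tuple[str, str]],
    rows: Sequence[Sequence[object]],
    total_rows: int,
) -> str:
    """Render rows as a GFM pipe table with typed headers.

    `columns` holds (name, type) pairs; `total_rows` is the size of the
    full dataset, so a truncation line is added when `rows` is a sample.
    """
    header = "| " + " | ".join(f"{name} ({kind})" for name, kind in columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(_cell(v) for v in row) + " |" for row in rows]
    lines = [header, divider, *body]
    if len(rows) < total_rows:
        # Blank line first, so GFM does not read the note as another table row.
        lines += ["", f"Showing the first {len(rows):,} of {total_rows:,} rows."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(
        [("name", "str"), ("units", "int"), ("revenue", "float")],
        [["North", 1204, 96432.10]],
        total_rows=10_000,
    ))
```

Passing `total_rows` separately from `rows` keeps the truncation statement tied to the real dataset size rather than to whatever sample happened to be loaded.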

The ten rules

  1. One H1, then a real hierarchy. Never skip levels, never fake a heading with bold text — retrieval chunks on headings, and bold paragraphs are invisible to it.
  2. Tables are GFM pipes with a header row. Annotate ambiguous types (revenue (float), signup_date (date)) so serials and codes can't be misread.
  3. State truncation in the text: "Showing the first 50 of 10,000 rows." A model told it has a sample answers as if from a sample; one not told invents the rest.
  4. Figures become named placeholders. [Figure: chart_1.png]. Never base64 — one embedded screenshot ≈ 130,000 junk tokens.
  5. Fence code, label the language, label outputs. A ```python fence followed by an **Output:** label keeps cause and effect apart; truncate long logs and say so.
  6. Stable addresses for machine readers. <a id="slug"></a> anchors and <!-- chunk: slug --> boundaries — citations that survive re-indexing.
  7. Front matter only if the destination reads it. Obsidian and archives: yes. Chat paste: 15 wasted tokens.
  8. Provenance in one line. Source: report-q1.docx · converted 2026-07 — enough for "where is this from?", no metadata block.
  9. Delete boilerplate, never compress prose. Navigation, cookie banners, "12 of 96": token tax. The actual text: byte-faithful — models quote what exists.
  10. Say what's missing. Scanned pages, dropped sheets, lost equations — one explicit line each. Honest gaps get worked around; silent gaps get papered over with inventions.
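Rules 1 and 4 are mechanical enough to check automatically. The sketch below is one possible lint pass, not part of any existing tool: `lint_markdown` is a hypothetical name, the regexes cover only ATX headings and inline `data:image` links, and the fence tracking is deliberately naive (it toggles on any fence line and does not handle nested fences of different lengths).

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S")
FENCE = re.compile(r"^\s*(```|~~~)")
BASE64_IMAGE = re.compile(r"!\[[^\]]*\]\(data:image/[^)]*\)")


def lint_markdown(text: str) -> list[str]:
    """Report skipped heading levels, a wrong H1 count, and base64 images."""
    problems: list[str] = []
    h1_count = 0
    previous_level = 0
    in_fence = False

    for number, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence  # naive toggle, see the note above
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading

        if BASE64_IMAGE.search(line):
            problems.append(
                f"line {number}: base64 image, use a [Figure: name] placeholder"
            )

        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level == 1:
            h1_count += 1
        if level > previous_level + 1:
            problems.append(
                f"line {number}: heading jumps from H{previous_level} to H{level}"
            )
        previous_level = level

    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")
    return problems
```

Bold-text pseudo-headings are not detected here; telling a fake heading from ordinary emphasis needs a heuristic that this sketch does not attempt.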

Per-destination adjustments

| Destination | What changes |
| --- | --- |
| Chat paste | Token-lean: outputs truncated hard, token estimate up top, no front matter |
| RAG / pipelines | Chunk comments + stable anchors; tables whole inside sections |
| Obsidian vault | YAML front matter, callouts, ![[wikilink]] figures |
| Archive / citation | Full front matter, faithful body, nothing extra truncated |

These four are exactly the converter's output presets — same detected structure, different discipline per destination.
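One way to picture "same structure, different discipline" is a small settings record per destination. This is an illustrative sketch only: the `Preset` class, its field names, and the preset keys are hypothetical, not the converter's real configuration. Fields the adjustments above do not mention are left at a neutral default rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical per-destination switches; names are illustrative."""
    front_matter: bool = False
    chunk_comments: bool = False
    stable_anchors: bool = False
    callouts: bool = False
    token_estimate: bool = False
    truncation: str = "standard"        # "hard", "standard", or "minimal"
    figure_style: str = "placeholder"   # "placeholder" or "wikilink"


PRESETS = {
    "chat": Preset(token_estimate=True, truncation="hard"),
    "rag": Preset(chunk_comments=True, stable_anchors=True),
    "obsidian": Preset(front_matter=True, callouts=True, figure_style="wikilink"),
    "archive": Preset(front_matter=True, truncation="minimal"),
}


def figure_ref(filename: str, preset: Preset) -> str:
    """Render a figure reference in the style the destination expects."""
    if preset.figure_style == "wikilink":
        return f"![[{filename}]]"
    return f"[Figure: {filename}]"


if __name__ == "__main__":
    for name, preset in PRESETS.items():
        print(name, figure_ref("chart_1.png", preset))
```

Keeping the presets as frozen data, separate from the rendering functions, means the detected structure is computed once and only the output step varies by destination.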

Or skip the hand-work: drop a file and get all ten rules applied, with a fidelity report of what was detected.