What a converter built for LLMs does differently
Search "convert docx to markdown" and you'll find dozens of tools. Most share an unstated goal: make the output look like the original, for a human eyeball. That goal is why their output quietly fails when the reader is a language model. A model doesn't see how the page looks. It sees a token stream — and it needs different things from that stream than your eyes need from a page.
MakeItMarkdown is built around that difference. Here's what changes, element by element.
1 · Headings: semantics from styles, not from font size
In a .docx, "Heading 2" is a style object, and we map style names
directly to ## levels. Converters that render the page
visually and then transcribe it tend to emit big bold paragraphs —
which a model reads as ordinary text. The document's outline, the
single most useful retrieval structure it has, evaporates. Style-aware
mapping keeps it: your section tree arrives as a real
#/##/### hierarchy.
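The style-to-level mapping can be sketched in a few lines. This is a minimal illustration, not MakeItMarkdown's actual code: the function name `paragraph_to_markdown` is hypothetical, and it assumes the built-in English style names "Heading 1" through "Heading 6" (real files may use localized or custom style names).

```python
import re

# Hypothetical helper for illustration; assumes English built-in style names.
HEADING_STYLE = re.compile(r"^Heading ([1-6])$")


def paragraph_to_markdown(style_name: str, text: str) -> str:
    """Map a Word paragraph to Markdown using its style name only."""
    match = HEADING_STYLE.match(style_name)
    if match:
        level = int(match.group(1))
        return "#" * level + " " + text
    # Any other style stays a plain paragraph, however large or bold it
    # looks on the page: font size is deliberately never consulted.
    return text
```

Calling `paragraph_to_markdown("Heading 2", "Results")` returns `## Results`, while the same text under the "Normal" style comes back unchanged.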
What Word stores
"Results"
style: Heading 2
(bold · 14 pt · spacing before)
What the model needs
## Results
2 · Tables: kept rectangular, typed, and honest about truncation
Tables are where converted documents lie most. Merged cells, header rows demoted to data, thousands of rows silently clipped. Our table handling does three unusual things:
- Column types are annotated: a model told revenue (float) stops treating "1,204" as a string with a comma in it. We wrote up the details in Tables LLMs can actually read.
- Truncation is explicit. A 10,000-row CSV becomes the first 50 rows plus a line stating exactly that, so the model knows it is looking at a sample, instead of confidently "summarizing" 0.5% of your data as if it were all of it.
- Ragged rows are repaired and confessed — padded to rectangular, with a warning in the fidelity report.
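The three behaviours above can be sketched together. This is an illustration under stated assumptions, not the product's implementation: `infer_type` and `table_to_markdown` are hypothetical names, the type inference is deliberately naive, and the wording of the truncation line is invented for the example.

```python
def infer_type(values):
    """Guess a column type; "1,204" counts as a number with a thousands separator."""
    def parses(value, cast):
        try:
            cast(value.replace(",", ""))
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v != ""]
    if non_empty and all(parses(v, int) for v in non_empty):
        return "int"
    if non_empty and all(parses(v, float) for v in non_empty):
        return "float"
    return "str"


def table_to_markdown(header, rows, max_rows=50):
    """Return (markdown, warnings) for a possibly ragged table."""
    warnings = []
    # Pad to the widest row so no cell is dropped while making it rectangular.
    width = max([len(header)] + [len(r) for r in rows])
    header = list(header) + [f"column_{i + 1}" for i in range(len(header), width)]
    fixed = []
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            warnings.append(f"row {number}: {len(row)} cells, padded to {width}")
        fixed.append(list(row) + [""] * (width - len(row)))

    types = [infer_type([r[c] for r in fixed]) for c in range(width)]
    lines = [
        "| " + " | ".join(f"{h} ({t})" for h, t in zip(header, types)) + " |",
        "|" + " --- |" * width,
    ]
    lines += ["| " + " | ".join(r) + " |" for r in fixed[:max_rows]]
    if len(fixed) > max_rows:
        # A blank line first, so the note is not parsed as another table row.
        lines += ["", f"Showing the first {max_rows} of {len(fixed)} rows."]
    return "\n".join(lines), warnings
```

The warnings list is returned alongside the Markdown rather than embedded in it, so a caller can route repairs into a separate report.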
3 · Figures: placeholders, never base64 walls
Embedded images are the silent token bomb. One notebook we tested carried a 515 KB base64 screenshot inside a markdown cell — pasted into a chat window, that's roughly 130,000 tokens of pure noise, a third of many context windows, spent on one image the model can't even decode from text. We extract every embedded image to a real file and leave an explicit, addressable marker in the text:
[Figure: cell_12_figure_1.png]
The model sees that a figure exists, where it sits in the document, and can refer to it by name. Your token budget goes to words.
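A minimal sketch of that extraction, assuming the images arrive as Markdown image links with inline `data:` URIs. The function name `extract_figures` is hypothetical, and the file-naming scheme simply imitates the marker shown above.

```python
import base64
import re

# Matches ![alt](data:image/<type>;base64,<payload>) for a few common types.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/(png|jpeg|gif);base64,([A-Za-z0-9+/=\s]+)\)"
)


def extract_figures(markdown: str, cell_index: int):
    """Replace inline base64 images with markers; return (text, files)."""
    files = {}

    def replace(match):
        ext = "jpg" if match.group(1) == "jpeg" else match.group(1)
        name = f"cell_{cell_index}_figure_{len(files) + 1}.{ext}"
        # Decoded bytes are kept by name so a caller can write them to disk.
        files[name] = base64.b64decode(match.group(2))
        return f"[Figure: {name}]"

    return DATA_URI_IMAGE.sub(replace, markdown), files
```

On the token arithmetic: at a rough four characters per token (an assumption, since tokenizers vary), 515 KB of base64 is about 515 × 1024 / 4 ≈ 132,000 tokens, consistent with the estimate above.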
4 · Notebooks: cells you can point at
A .ipynb is JSON wrapping code, outputs, images and metadata. Our
notebook parser gives every cell a stable address
(Cell [7]), flags execution-order anomalies — the
classic "ran cell 40 before cell 12" state that makes results
unreproducible — and adds approximate dependency hints between cells,
so a model can trace where a variable came from without you pasting
the whole notebook twice. The full build log, with real numbers from
a 100-cell notebook, is in
Making Jupyter
notebooks LLM-addressable.
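The execution-order check can be sketched against the notebook JSON directly. This assumes the nbformat 4 layout (a top-level `cells` list, with `execution_count` on code cells) and assumes cells are addressed by their position in the file; the function name is hypothetical and the dependency hints are not covered here.

```python
import json


def execution_order_anomalies(notebook_json: str):
    """List code cells that were executed before a cell placed above them."""
    notebook = json.loads(notebook_json)
    anomalies = []
    highest_count = 0
    highest_cell = None
    for position, cell in enumerate(notebook.get("cells", []), start=1):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            continue  # never executed, so there is no order to compare
        if count < highest_count:
            anomalies.append(
                f"Cell [{position}] ran as execution {count}, before "
                f"Cell [{highest_cell}] (execution {highest_count})"
            )
        else:
            highest_count, highest_cell = count, position
    return anomalies
```

A notebook run top to bottom yields an empty list; any entry means the saved outputs may not match what a fresh run would produce.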
5 · The fidelity report: "detected", never "preserved"
This is the part we consider non-negotiable, and the part most converters simply don't have. Every conversion here returns three panels: the original, the Markdown, and a fidelity report — counts of tables, figures, equations and code cells the parser detected, every warning it accumulated, and a weighted structural quality score.
The wording is deliberate. "Preserved" is a promise nobody converting real-world files can keep; "detected" is a measurement. If your document contains four tables and the report says two, you've learned something vital before the model hallucinated around the missing half. Silent loss becomes visible loss. That's the whole trust model of the product — and it's why the report sits on equal footing with the output itself.
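The four-tables-versus-two check is easy to picture as code. This sketch is purely illustrative: the `FidelityReport` shape and the `shortfalls` helper are assumptions, not the product's report format, and the weighted quality score is left out because its weights are not given here.

```python
from dataclasses import dataclass, field


@dataclass
class FidelityReport:
    # Hypothetical shape: counts of what the parser detected, plus warnings.
    detected: dict
    warnings: list = field(default_factory=list)


def shortfalls(report: FidelityReport, expected: dict) -> dict:
    """Return {kind: (detected, expected)} wherever detection fell short."""
    return {
        kind: (report.detected.get(kind, 0), want)
        for kind, want in expected.items()
        if report.detected.get(kind, 0) < want
    }
```

With a report of `{"tables": 2, "figures": 1}` and an expectation of four tables and one figure, `shortfalls` returns `{"tables": (2, 4)}`: the loss is named before any model reads the output.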
6 · What we refuse to do
- Invent text for scanned pages. A PDF with no text layer gets a clear flag and a suggested next step — not made-up sentences.
- Claim 100% anything. Layout, equations and complex formatting degrade in measurable ways; the report exists to measure them.
- Touch your files. Parsing runs entirely in your browser — the site works offline, which is only possible because nothing is uploaded. See privacy.
7 · One structure, many targets
Because every parser emits the same internal structure, the output presets are cheap and consistent: token-lean Chat paste, RAG with chunk boundaries and stable anchors, Obsidian with callouts and wikilinks, and a faithful Archive with full frontmatter. Same detected structure, four disciplines of output — pick per destination, not per file.
Drop a file and read its fidelity report. If the report surprises you, that's the point.