What a converter built for LLMs does differently

MakeItMarkdown · July 2026 · 8 min read

Search "convert docx to markdown" and you'll find dozens of tools. Most share an unstated goal: make the output look like the original, for a human eyeball. That goal is why their output quietly fails when the reader is a language model. A model doesn't see how the page looks. It sees a token stream — and it needs different things from that stream than your eyes need from a page.

MakeItMarkdown is built around that difference. Here's what changes, element by element.

1 · Headings: semantics from styles, not from font size

In a .docx, "Heading 2" is a style object, and we map style names directly to ## levels. Converters that render the page visually and then transcribe it tend to emit big bold paragraphs — which a model reads as ordinary text. The document's outline, the single most useful retrieval structure it has, evaporates. Style-aware mapping keeps it: your section tree arrives as a real #/##/### hierarchy.

What Word stores

"Results"
  style: Heading 2
  (bold · 14 pt · spacing before)

What the model needs

## Results
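A minimal sketch of style-aware heading mapping, assuming each paragraph arrives as a (style name, text) pair. The `paragraph_to_markdown` helper and the tuple shape are hypothetical, not our parser's actual interface. Word defines Heading 1 through Heading 9; this sketch only maps the six levels Markdown has and passes everything else through.

```python
import re

# Assumption: a .docx reader has already yielded (style_name, text) pairs.
# Only "Heading 1".."Heading 6" are mapped, since Markdown stops at ######.
HEADING_STYLE = re.compile(r"^Heading ([1-6])$")

def paragraph_to_markdown(style_name: str, text: str) -> str:
    """Turn a heading-styled paragraph into a Markdown heading.

    Font size and weight are ignored on purpose: only the style name counts.
    """
    match = HEADING_STYLE.match(style_name)
    if match:
        level = int(match.group(1))
        return "#" * level + " " + text.strip()
    return text

paragraphs = [
    ("Heading 1", "Study"),
    ("Heading 2", "Results"),
    ("Normal", "Revenue rose in Q3."),
]
markdown = "\n\n".join(paragraph_to_markdown(s, t) for s, t in paragraphs)
print(markdown)
```

A big bold paragraph in the Normal style stays a paragraph here, which is the point: the outline comes from what the author declared, not from how the page looks.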

2 · Tables: kept rectangular, typed, and honest about truncation

Tables are where converted documents lie most. Merged cells, header rows demoted to data, thousands of rows silently clipped. Our table handling does three unusual things:

Column types are annotated — a model told revenue (float) stops treating "1,204" as a string with a comma in it. We wrote up the details in Tables LLMs can actually read.
Truncation is explicit. A 10,000-row CSV becomes the first 50 rows plus a line stating exactly that, so the model knows it is looking at a sample — instead of confidently "summarizing" 0.5% of your data as if it were all of it.
Ragged rows are repaired and confessed — padded to rectangular, with a warning in the fidelity report.
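The three behaviours above can be sketched in a few lines. This is an illustration under stated assumptions, not our table code: `infer_type` and `table_to_markdown` are hypothetical names, the type guess is deliberately crude (int, float or str, with thousands separators stripped), and the wording of the truncation line is made up for the example.

```python
import csv
import io

def infer_type(values):
    """Guess int, float, or str from a column's non-empty cells."""
    cells = [v.replace(",", "") for v in values if v.strip()]
    if not cells:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for cell in cells:
                cast(cell)
            return name
        except ValueError:
            continue
    return "str"

def table_to_markdown(csv_text, max_rows=50):
    """Return (markdown, warnings) for a CSV string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "", ["empty table"]
    warnings = []
    # Repair ragged rows by padding to the widest row, and say so.
    width = max(len(row) for row in rows)
    ragged = sum(1 for row in rows if len(row) != width)
    if ragged:
        warnings.append(f"{ragged} ragged row(s) padded to {width} columns")
    rows = [row + [""] * (width - len(row)) for row in rows]
    header, body = rows[0], rows[1:]
    types = [infer_type([row[i] for row in body]) for i in range(width)]

    def line(cells):
        # Escape pipes so a cell cannot break the table grid.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    out = [line([f"{h} ({t})" for h, t in zip(header, types)]),
           line(["---"] * width)]
    out += [line(row) for row in body[:max_rows]]
    # Truncation is stated in the output, not just logged.
    if len(body) > max_rows:
        out += ["", f"Showing the first {max_rows} of {len(body)} rows."]
        warnings.append(f"table truncated to {max_rows} of {len(body)} rows")
    return "\n".join(out), warnings

md, warns = table_to_markdown('region,revenue\nEU,"1,204"\nUS\n', max_rows=1)
print(md)
print(warns)
```

The design choice worth noting is that both repairs produce a warning as well as a fix, so the padded row and the clipped rows show up in the report rather than disappearing into a tidy-looking table.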

3 · Figures: placeholders, never base64 walls

Embedded images are the silent token bomb. One notebook we tested carried a 515 KB base64 screenshot inside a markdown cell — pasted into a chat window, that's roughly 130,000 tokens of pure noise, a third of many context windows, spent on one image the model can't even decode from text. We extract every embedded image to a real file and leave an explicit, addressable marker in the text:

[Figure: cell_12_figure_1.png]

The model sees that a figure exists, where it sits in the document, and can refer to it by name. Your token budget goes to words.
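Here is a rough sketch of that swap for images inlined as base64 data URIs in Markdown text. The `extract_figures` and `estimate_tokens` helpers are hypothetical, the regex only covers PNG, JPEG and GIF, and the four-characters-per-token figure is a common rule of thumb rather than a property of any particular tokenizer.

```python
import base64
import re

# Matches ![alt](data:image/png;base64,....) style inline images.
DATA_URI_IMAGE = re.compile(
    r"!\[[^\]]*\]\(data:image/(?P<ext>png|jpeg|gif);base64,"
    r"(?P<data>[A-Za-z0-9+/=\s]+)\)"
)

def extract_figures(markdown, cell_index):
    """Replace inline base64 images with placeholders.

    Returns the rewritten text and a dict of filename -> decoded bytes,
    which a caller could write to disk.
    """
    files = {}

    def swap(match):
        name = f"cell_{cell_index}_figure_{len(files) + 1}.{match.group('ext')}"
        files[name] = base64.b64decode(match.group("data"))
        return f"[Figure: {name}]"

    return DATA_URI_IMAGE.sub(swap, markdown), files

def estimate_tokens(char_count, chars_per_token=4):
    """Back-of-envelope token estimate; the ratio is an assumption."""
    return char_count // chars_per_token

payload = base64.b64encode(b"fake image bytes").decode("ascii")
text, files = extract_figures(
    f"Before ![plot](data:image/png;base64,{payload}) after", 12
)
print(text)
print(estimate_tokens(515_000))
```

Under that rule of thumb, 515,000 characters of base64 come to about 128,750 tokens, which is where the "roughly 130,000" figure above comes from.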

4 · Notebooks: cells you can point at

A .ipynb is JSON wrapping code, outputs, images and metadata. Our notebook parser gives every cell a stable address (Cell [7]), flags execution-order anomalies — the classic "ran cell 40 before cell 12" state that makes results unreproducible — and adds approximate dependency hints between cells, so a model can trace where a variable came from without you pasting the whole notebook twice. The full build log, with real numbers from a 100-cell notebook, is in Making Jupyter notebooks LLM-addressable.
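To make those three ideas concrete, here is a small sketch that works on raw notebook JSON. It is not our parser: `analyze_notebook` is a hypothetical name, cells are addressed by their position in the file (an assumption about what the number in Cell [7] means), and the dependency hints come from a naive scan of assigned and read names, so they will include false positives.

```python
import ast
import json

def analyze_notebook(nb_json):
    """Return one record per code cell: address, order flag, dependency hints."""
    nb = json.loads(nb_json)
    records = []
    defined_in = {}  # variable name -> position of the cell that last assigned it
    highest_count = 0
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        # A cell whose execution count is lower than an earlier cell's
        # was run before it, despite appearing later in the file.
        count = cell.get("execution_count")
        out_of_order = count is not None and count < highest_count
        if count is not None:
            highest_count = max(highest_count, count)
        loads, stores = set(), set()
        try:
            for node in ast.walk(ast.parse(source)):
                if isinstance(node, ast.Name):
                    target = stores if isinstance(node.ctx, ast.Store) else loads
                    target.add(node.id)
        except SyntaxError:
            pass  # IPython magics such as %matplotlib are not valid Python
        depends = sorted({defined_in[n] for n in loads
                          if n in defined_in and defined_in[n] != index})
        for name in stores:
            defined_in[name] = index
        records.append({
            "address": f"Cell [{index}]",
            "out_of_order": out_of_order,
            "depends_on": [f"Cell [{i}]" for i in depends],
        })
    return records

nb = json.dumps({"cells": [
    {"cell_type": "code", "execution_count": 2, "source": ["x = 1\n"]},
    {"cell_type": "markdown", "source": ["notes"]},
    {"cell_type": "code", "execution_count": 1, "source": "y = x + 1"},
]})
report = analyze_notebook(nb)
print(report)
```

In the sample, the last cell reads `x` from Cell [0] but was executed first, so it is flagged as out of order and carries a hint pointing back at the cell it depends on.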

5 · The fidelity report: "detected", never "preserved"

This is the part we consider non-negotiable, and the part most converters simply don't have. Every conversion here returns three panels: the original, the Markdown, and a fidelity report — counts of tables, figures, equations and code cells the parser detected, every warning it accumulated, and a weighted structural quality score.

✓ Sectioned structure ×2: headings split the document into addressable sections
✓ Elements recovered ×2: 5 of 5 detected element(s) made it into the output
△ No loss reported: 1 warning(s) report dropped or truncated content (75%)

5 code cells detected
0 figures detected
0 tables detected
0 equations detected

The fidelity report with the QC breakdown expanded — detected counts below.

The wording is deliberate. "Preserved" is a promise nobody converting real-world files can keep; "detected" is a measurement. If your document contains four tables and the report says two, you've learned something vital before the model hallucinates around the missing half. Silent loss becomes visible loss. That's the whole trust model of the product, and it's why the report sits on equal footing with the output itself.
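As an illustration of what such a report could look like as data, here is a sketch with detected counts, accumulated warnings and a weighted score. Everything in it is an assumption for the sake of the example: the `Check` and `FidelityReport` classes, the weighted-average formula, and the weight of 1 on the third check are invented here, and the product's actual scoring is not described in this post.

```python
from dataclasses import dataclass, field

@dataclass
class Check:
    name: str
    weight: float  # relative importance (hypothetical values below)
    score: float   # 0.0 to 1.0

@dataclass
class FidelityReport:
    detected: dict                       # element kind -> count the parser found
    warnings: list = field(default_factory=list)
    checks: list = field(default_factory=list)

    def quality_score(self):
        """Weighted average of check scores; an assumed formula."""
        total = sum(c.weight for c in self.checks)
        if total == 0:
            return 0.0
        return sum(c.weight * c.score for c in self.checks) / total

    def shortfall(self, expected):
        """Compare counts you know to be true against what was detected."""
        return {kind: n - self.detected.get(kind, 0)
                for kind, n in expected.items()
                if n > self.detected.get(kind, 0)}

report = FidelityReport(
    detected={"code cells": 5, "figures": 0, "tables": 2, "equations": 0},
    warnings=["1 table truncated"],
    checks=[
        Check("Sectioned structure", weight=2, score=1.0),
        Check("Elements recovered", weight=2, score=1.0),
        Check("No loss reported", weight=1, score=0.75),
    ],
)
print(round(report.quality_score(), 2))
print(report.shortfall({"tables": 4}))
```

The `shortfall` helper mirrors the four-tables-versus-two case: the report never claims the tables survived, it states a count you can check against what you know is in the file.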


6 · What we refuse to do

Invent text for scanned pages. A PDF with no text layer gets a clear flag and a suggested next step — not made-up sentences.
Claim 100% anything. Layout, equations and complex formatting degrade in measurable ways; the report exists to measure them.
Touch your files. Parsing runs entirely in your browser — the site works offline, which is only possible because nothing is uploaded. See privacy.

7 · One structure, many targets

Because every parser emits the same internal structure, the output presets are cheap and consistent: token-lean Chat paste, RAG with chunk boundaries and stable anchors, Obsidian with callouts and wikilinks, and a faithful Archive with full frontmatter. Same detected structure, four disciplines of output — pick per destination, not per file.
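A toy version of that idea, assuming the shared structure is just a list of sections. The `Section` class, the two renderer functions and the HTML-comment anchor format are all hypothetical; they stand in for the real presets only to show that each target is a different rendering of the same data.

```python
import re
from dataclasses import dataclass

@dataclass
class Section:
    level: int
    title: str
    body: str

def slug(title):
    """Stable anchor derived from the section title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def render_chat(sections):
    """Token-lean output: headings and text, nothing else."""
    return "\n\n".join(f"{'#' * s.level} {s.title}\n{s.body}" for s in sections)

def render_rag(sections):
    """Same sections, with an explicit chunk boundary and anchor before each."""
    chunks = []
    for s in sections:
        marker = f"<!-- chunk: {slug(s.title)} -->"
        chunks.append(f"{marker}\n{'#' * s.level} {s.title}\n\n{s.body}")
    return "\n\n".join(chunks)

doc = [Section(1, "Study", "Overview."), Section(2, "Results", "Revenue rose.")]
print(render_chat(doc))
print(render_rag(doc))
```

Adding a preset in this shape means writing one more renderer, with no changes to any parser.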

Drop a file and read its fidelity report. If the report surprises you, that's the point.