A context workbench, not a file converter
MakeItMarkdown turns technical documents (Jupyter notebooks, Word files, spreadsheets, web articles, PDFs, subtitles) into Markdown that language models can actually use. Not merely "a Markdown file that opens without errors", but Markdown with real headings, tables that survive as tables, figures that become explicit placeholders instead of base64 noise, and a fidelity report that tells you what was detected and what to double-check.
The distinction matters because most conversion tools optimize for human readers — they try to make the output look like the original. A model doesn't care how it looks. It cares whether the structure is explicit, whether the tokens are spent on content or on formatting debris, and whether anything silently vanished. That is what we optimize for.
How it works
Every format gets its own parser, but they all emit the same internal structure: title, sections, tables, figures, references — plus a fidelity record of what was detected along the way. Output presets (chat paste, RAG, Obsidian, archive) are thin formatters on top of that shared structure, so improving a parser improves every output at once.
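To make the shared-structure idea concrete, here is a minimal TypeScript sketch built on a much-reduced document shape. `ParsedDocument`, `Preset`, `chatPaste` and `rag` are hypothetical names, not the project's actual types. A parser's only job is to fill the structure, and each preset is a pure function from that structure to a string.

```typescript
// A minimal sketch; every type and function name here is hypothetical.
interface Section { heading: string; level: number; body: string }
interface Figure { id: string; caption: string }
interface ParsedDocument {
  title: string;
  sections: Section[];
  figures: Figure[];
  warnings: string[]; // stands in for the fidelity record
}

// Any parser (ipynb, docx, pdf, ...) would produce a ParsedDocument.
// A preset is nothing more than a function from that structure to text.
type Preset = (doc: ParsedDocument) => string;

// Chat paste: one compact Markdown string; the title is H1, sections nest below it.
const chatPaste: Preset = (doc) =>
  [
    `# ${doc.title}`,
    ...doc.sections.map((s) => `${"#".repeat(s.level + 1)} ${s.heading}\n\n${s.body}`),
    ...doc.figures.map((f) => `[Figure ${f.id}: ${f.caption}]`),
  ].join("\n\n");

// RAG: one self-describing chunk per section, separated by a delimiter.
const rag: Preset = (doc) =>
  doc.sections
    .map((s) => `Source: ${doc.title} > ${s.heading}\n\n${s.body}`)
    .join("\n\n---\n\n");

const presets: Record<string, Preset> = { chatPaste, rag };

const doc: ParsedDocument = {
  title: "Report",
  sections: [
    { heading: "Intro", level: 1, body: "Hello." },
    { heading: "Method", level: 1, body: "We measured." },
  ],
  figures: [{ id: "1", caption: "Setup" }],
  warnings: [],
};

for (const [name, format] of Object.entries(presets)) {
  console.log(`--- ${name} ---\n${format(doc)}`);
}
```

In this shape, adding a preset means writing one more function, while a parser fix changes the `ParsedDocument` it emits, so every preset picks the fix up without being touched.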
Principles
Private by architecture, not by promise. The converter is a static site. There is no server to upload to: parsing runs in your browser's own JavaScript engine. You can verify this the blunt way: load the page once, turn off your network, and convert a file. It works offline, because nothing ever needed to leave your machine. Details are on the privacy page.
Honest about loss. Every conversion ships with a fidelity report: counts of detected tables, figures, equations and code cells, explicit warnings, and a weighted eight-check structural quality score. We say "detected", never "preserved" — a converter that claims perfection is describing its blind spots.
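As an illustration of how a weighted score of this kind can be computed, here is a minimal sketch. The eight check names and their weights are invented for the example; the project's actual checks and weighting are not described here. The score is the weighted share of passed checks, scaled to 0 to 100.

```typescript
// Sketch of a weighted structural score. The eight check names and
// weights below are hypothetical stand-ins, not MakeItMarkdown's own.
interface CheckResult {
  name: string;
  weight: number; // relative importance of the check
  passed: boolean;
}

// Score = 100 * (sum of weights of passed checks) / (sum of all weights).
function qualityScore(results: CheckResult[]): number {
  const total = results.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return 0; // no checks ran: report 0 rather than divide by zero
  const earned = results.reduce((sum, r) => sum + (r.passed ? r.weight : 0), 0);
  return Math.round((100 * earned) / total);
}

const results: CheckResult[] = [
  { name: "has a title", weight: 3, passed: true },
  { name: "heading levels do not skip", weight: 3, passed: true },
  { name: "tables have a header row", weight: 3, passed: false },
  { name: "no inline base64 images", weight: 3, passed: true },
  { name: "figures use placeholders", weight: 2, passed: true },
  { name: "code is fenced", weight: 2, passed: true },
  { name: "no empty sections", weight: 2, passed: false },
  { name: "no encoding debris", weight: 2, passed: true },
];

console.log(qualityScore(results)); // 75
```

Here the total weight is 20 and the two failed checks are worth 5, so the score is 100 × 15 / 20 = 75. Weighting lets a broken table cost more than a cosmetic issue, which a plain pass count would not.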
Free to run, free to use. The whole product is static files on a CDN. No accounts, no quotas, no file-size upsell. The libraries we build on are MIT, BSD or Apache-2.0 licensed and credited in THIRD_PARTY_LICENSES.txt.
What it handles today
- .ipynb — addressable cells, execution-order warnings, approximate dependency hints, embedded images extracted to placeholders
- .docx — style-aware headings, tables as GFM, images as figure placeholders
- .pptx — slides as an addressable outline, speaker notes kept, charts confessed
- .xlsx / .csv / .tsv — column types annotated, explicit truncation, dates normalized
- .html — article extraction (navigation and ads stripped), links and figures kept
- .pdf — per-page text with honest layout warnings; scanned pages flagged instead of faked
- .json / .jsonl — structural outline plus record tables
- .tex — LaTeX sources: outline, fenced math, keyed citations (no macro expansion — stated honestly)
- .eml / .mbox — decoded headers, quoted chains truncated explicitly, attachments listed
- .srt / .vtt — timestamped transcript sections
- .md / .txt — normalized structure and a token overview
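The notebook execution-order warning is the easiest of these to picture. The sketch below assumes the standard nbformat v4 JSON layout, in which each code cell carries an `execution_count` (null if it never ran). The function name, the rule it applies and the message wording are hypothetical; the project's real check may differ.

```typescript
// Minimal subset of the nbformat v4 cell shape; other fields omitted.
interface NotebookCell {
  cell_type: "code" | "markdown" | "raw";
  execution_count?: number | null;
}

// Walk cells in document order and flag code cells whose execution
// count is missing or not higher than the highest count seen above them.
function executionOrderWarnings(cells: NotebookCell[]): string[] {
  const warnings: string[] = [];
  let highest = 0;
  cells.forEach((cell, i) => {
    if (cell.cell_type !== "code") return;
    const n = cell.execution_count;
    if (n == null) {
      warnings.push(`Cell ${i + 1} was never executed.`);
    } else if (n <= highest) {
      warnings.push(
        `Cell ${i + 1} has execution count ${n}, not higher than ${highest} above it; ` +
          `a top-to-bottom run may not reproduce its output.`,
      );
    } else {
      highest = n;
    }
  });
  return warnings;
}

const cells: NotebookCell[] = [
  { cell_type: "markdown" },
  { cell_type: "code", execution_count: 1 },
  { cell_type: "code", execution_count: 3 },
  { cell_type: "code", execution_count: 2 },
  { cell_type: "code", execution_count: null },
];

executionOrderWarnings(cells).forEach((w) => console.log(w));
```

For this example the fourth cell is flagged because it ran before the cell above it, and the fifth because it never ran. A warning, rather than a silent reorder, matches the "honest about loss" principle: the outputs in the file are what they are, and the reader is told when they may not follow from the code above.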
Want another format? Ask for it — a small sample file is the fastest path to support.
Who makes this
MakeItMarkdown is an independent project, developed in the open at github.com/L-Sangmin/makeitmarkdown. The test suite runs every parser against deliberately messy fixtures — ragged CSVs, out-of-order notebooks, corrupt archives — because real files are never clean.
Drop a file and read its fidelity report — everything runs in your browser.