Where this goes next: a CLI, agent skills, and stronger local OCR

MakeItMarkdown · July 2026 · roadmap — plans, not promises

The point: the parsers are plain, dependency-light JavaScript that already runs in Node for every test — the browser is just their first surface. The next surfaces are your terminal, your agent, and better eyes for scans.

The order of operations — each surface reuses the parsers the tests already run in Node.

1 · An installable CLI

Every parser on this site runs in Node today (that's how the 99 tests work). Packaging them as a command-line tool is mostly plumbing, and it unlocks the batch cases a browser tab shouldn't own:

npx makeitmarkdown docs/*.pdf --preset rag --out docs-md/
# or: pip install makeitmarkdown  (a thin wrapper over the same engine)

Whole-folder conversion into the docs-md/ mirror pattern, scriptable in CI.
The same fidelity report, as JSON on stdout — pipelines can gate on QC score ("reject sources under 0.8").
Same honesty rules: detected-not-preserved, explicit truncation, no network calls.
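As a rough sketch of the QC gate described above: assuming the CLI prints one JSON report per line on stdout, each with a numeric `qcScore` field, a pipeline step in Node might look like the following. The line-per-report layout and the field names `source` and `qcScore` are assumptions for illustration, since the report format is not published here.

```javascript
// Hypothetical report shape: { source: string, qcScore: number }.
// These field names are assumptions, not the tool's actual schema.

/** Parse newline-delimited JSON (one report per line), skipping blank lines. */
function parseReports(stdoutText) {
  return stdoutText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

/** Split reports into accepted and rejected by a minimum QC score. */
function gateByQcScore(reports, minScore = 0.8) {
  const accepted = [];
  const rejected = [];
  for (const report of reports) {
    // A missing or non-numeric score counts as a failure, never as a pass.
    const score = typeof report.qcScore === "number" ? report.qcScore : NaN;
    (score >= minScore ? accepted : rejected).push(report);
  }
  return { accepted, rejected };
}
```

A CI wrapper could feed the CLI's stdout through `parseReports`, call `gateByQcScore`, and exit non-zero whenever `rejected` is non-empty.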

2 · An agent skill

Coding agents (Claude Code, Codex-class assistants) hit unreadable files mid-task constantly — a .docx spec, a .xlsx of test cases, a PDF datasheet. A packaged skill lets the agent convert in-session instead of giving up:

The agent detects an unreadable format, invokes the skill, and reads the Markdown with the fidelity warnings attached, so it knows what not to trust. Agents act on their input, so honest gaps matter more for them than for a human reader.
Same deterministic output as the site, so converted docs are diff-stable across sessions.
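The flow above could be sketched roughly as follows. This is a minimal illustration, not the packaged skill: the extension list, the `convert` callback, and the `{ markdown, warnings }` result shape are all hypothetical stand-ins for whatever the engine ends up exposing.

```javascript
// Formats the agent cannot read directly. The list is illustrative only.
const CONVERTIBLE = new Set([".docx", ".xlsx", ".pdf"]);

/** True when the file extension marks a format that needs conversion. */
function needsConversion(filePath) {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return false;
  return CONVERTIBLE.has(filePath.slice(dot).toLowerCase());
}

/**
 * Return Markdown for the agent, with fidelity warnings placed first so
 * they cannot be missed. `convert` is a hypothetical stand-in for the
 * engine: (path) => { markdown: string, warnings: string[] }.
 * Returns null when the file needs no conversion.
 */
function readForAgent(filePath, convert) {
  if (!needsConversion(filePath)) return null;
  const { markdown, warnings = [] } = convert(filePath);
  if (warnings.length === 0) return markdown;
  const header = "<!-- fidelity warnings:\n" +
    warnings.map((w) => "- " + w).join("\n") + "\n-->\n\n";
  return header + markdown;
}
```

Because the wrapper adds nothing time-dependent (no timestamps, no random identifiers), its output is only as stable as the converter's, which keeps the diff-stability property intact.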

3 · OCR beyond tesseract: local models, CPU and GPU

Today's opt-in OCR is tesseract, which is honest, local, and decent on clean print. The next step up is model-based recognition, and it can stay local:

Tier	Engine	Where it runs	Trade-off
Today	tesseract.js (LSTM)	Browser, CPU/WASM	~8 MB, seconds/page; weak on tables & handwriting
Next	Compact recognition models (ONNX)	Browser, WebGPU when available, WASM fallback	tens of MB; markedly better layout & table recovery
With the CLI	Full local models	Your machine's CPU/GPU, no size ceiling	Best quality; still zero upload

The rule that doesn't change: OCR output stays labelled approximate, opt-in, and QC-capped. Better eyes, same honesty.
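A minimal sketch of the "WebGPU when available, WASM fallback" choice and of the labelling rule, under stated assumptions: the cap value `0.7`, the result field names, and both function names are hypothetical. Only `navigator.gpu` is a real browser API (the WebGPU entry point).

```javascript
// Hypothetical cap: the roadmap says OCR is QC-capped but gives no number.
const OCR_QC_CAP = 0.7;

/**
 * Choose an execution backend from a navigator-like object.
 * `navigator.gpu` is undefined where WebGPU is unsupported. A production
 * check would also await `navigator.gpu.requestAdapter()`, which can
 * resolve to null even when the property exists.
 */
function pickOcrBackend(nav) {
  return nav && nav.gpu ? "webgpu" : "wasm";
}

/** Wrap OCR text so it is always labelled approximate and score-capped. */
function labelOcrResult(text, rawScore, backend) {
  return {
    text,
    backend,
    approximate: true, // OCR output is never presented as exact
    qcScore: Math.min(rawScore, OCR_QC_CAP),
  };
}
```

In a browser this would be called as `pickOcrBackend(navigator)`. Taking the object as a parameter keeps the function testable in Node, where WebGPU is absent.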

4 · More languages than English

The interface and the Context Lab ship in English today. The converter itself is already language-neutral — headings, tables and code fences work the same in any script — but two places still assume English, and both are on the roadmap:

The site. English stays the default; Korean comes next (the maintainer's mother tongue), with further languages as demand shows up.
OCR. The opt-in pass currently loads the English model only. The plan: detect the document's script, fetch the matching tesseract language pack, and record which language was used in the fidelity report — so a Korean scan isn't read with English eyes and quietly garbled.
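One way the script-detection step might work, sketched under assumptions: it supposes some sample text is already available (for example from an embedded text layer or a quick first pass), which the roadmap does not specify. The mapping is deliberately small, and script does not determine language (Latin script is not always English), so `eng` here is only a default. The codes `eng`, `kor`, and `jpn` are tesseract's standard language-pack names; the function and field names are hypothetical.

```javascript
// Unicode script patterns mapped to tesseract language-pack codes.
// Illustrative and incomplete; Latin maps to "eng" only as a default.
const SCRIPT_TO_LANG = [
  { lang: "kor", pattern: /\p{Script=Hangul}/gu },
  { lang: "jpn", pattern: /[\p{Script=Hiragana}\p{Script=Katakana}]/gu },
  { lang: "eng", pattern: /\p{Script=Latin}/gu },
];

/**
 * Pick the language pack whose script covers the most characters.
 * Falls back to "eng" when no listed script appears at all.
 */
function pickOcrLanguage(sampleText) {
  let best = { lang: "eng", count: 0 };
  for (const { lang, pattern } of SCRIPT_TO_LANG) {
    const count = (sampleText.match(pattern) || []).length;
    if (count > best.count) best = { lang, count };
  }
  // `detected` tells the fidelity report whether this was a guess.
  return { language: best.lang, detected: best.count > 0 };
}
```

The returned object is the kind of value that could be recorded in the fidelity report, so a reader can see which pack was used and whether it was detected or defaulted.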

5 · What will never change

Local first. Files don't leave your machine — browser, CLI or agent alike.
One schema. Every surface emits the same structure, so presets and fidelity reports carry over.
Detected, never "preserved". Structure is reported as detected in the source, never claimed as preserved.

Which surface would you use first — CLI, agent skill, or stronger OCR? Votes steer the order.

Tell us on GitHub →