Where this goes next: a CLI, agent skills, and stronger local OCR
The point: the parsers are plain, dependency-light JavaScript that already runs in Node for every test — the browser is just their first surface. The next surfaces are your terminal, your agent, and better eyes for scans.
1 · An installable CLI
Every parser on this site already runs in Node today (that's how the 99 tests run). Packaging them as a command-line tool is mostly plumbing, and it unlocks the batch cases a browser tab shouldn't own:
```shell
npx makeitmarkdown docs/*.pdf --preset rag --out docs-md/
# or: pip install makeitmarkdown (a thin wrapper over the same engine)
```
- Whole-folder conversion into the docs-md/ mirror pattern, scriptable in CI.
- The same fidelity report, as JSON on stdout — pipelines can gate on QC score ("reject sources under 0.8").
- Same honesty rules: detected-not-preserved, explicit truncation, no network calls.
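As a rough illustration of the QC gate described above, a pipeline step might look like the sketch below. The report shape is an assumption: the field names `source` and `qcScore` and the helper `gateOnQc` are hypothetical, since the CLI and its JSON format do not exist yet. Only the 0.8 threshold comes from the example above.

```javascript
// Hypothetical report shape: { source: string, qcScore: number between 0 and 1 }.
// The real fidelity report may use different field names.
function gateOnQc(reports, minScore = 0.8) {
  // A report with a missing or non-numeric score is rejected, not silently passed.
  const rejected = reports.filter(
    (r) => typeof r.qcScore !== "number" || r.qcScore < minScore
  );
  return { ok: rejected.length === 0, rejected: rejected.map((r) => r.source) };
}

// Example input, as a pipeline might parse it from the CLI's stdout.
const sampleReports = JSON.parse(
  '[{"source":"a.pdf","qcScore":0.93},{"source":"scan.pdf","qcScore":0.61}]'
);
const gateResult = gateOnQc(sampleReports);
console.log(gateResult);
```

In CI, a wrapper script could exit with a non-zero status whenever `ok` is false, so that low-confidence sources never reach the index.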
2 · An agent skill
Coding agents (Claude Code, Codex-class assistants) hit unreadable files mid-task constantly — a .docx spec, a .xlsx of test cases, a PDF datasheet. A packaged skill lets the agent convert in-session instead of giving up:
- The agent detects an unreadable format, invokes the skill, and reads the Markdown — with the fidelity warnings attached, so it knows what not to trust (agents act on their input; honest gaps matter more for them than for you).
- Same deterministic output as the site, so converted docs are diff-stable across sessions.
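The skill's two jobs, spotting a file the agent cannot read and handing back Markdown with its warnings attached, could be sketched like this. Everything here is hypothetical: the format list, the function names `needsConversion` and `attachWarnings`, and the blockquote style used for warnings are assumptions for illustration, not the packaged skill's interface.

```javascript
// Hypothetical list of formats an agent cannot read as plain text.
const UNREADABLE_FORMATS = new Set(["pdf", "docx", "xlsx"]);

// Decide by file extension whether the skill should be invoked.
function needsConversion(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return false;
  return UNREADABLE_FORMATS.has(filename.slice(dot + 1).toLowerCase());
}

// Put fidelity warnings ahead of the content so the agent reads them first.
function attachWarnings(markdown, warnings) {
  if (warnings.length === 0) return markdown;
  const header = warnings.map((w) => `> Fidelity warning: ${w}`).join("\n");
  return `${header}\n\n${markdown}`;
}

console.log(needsConversion("spec.DOCX"));
console.log(attachWarnings("# Datasheet", ["table detected, not preserved"]));
```

Both functions are pure, so the same input always yields the same text, which is what keeps converted documents diff-stable across sessions.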
3 · OCR beyond tesseract: local models, CPU and GPU
Today's opt-in OCR is tesseract — honest, local, decent on clean print. The ceiling above it is model-based recognition, and it can stay local:
| Tier | Engine | Where it runs | Trade-off |
|---|---|---|---|
| Today | tesseract.js (LSTM) | Browser, CPU/WASM | ~8 MB, seconds/page; weak on tables & handwriting |
| Next | Compact recognition models (ONNX) | Browser, WebGPU when available, WASM fallback | tens of MB; markedly better layout & table recovery |
| With the CLI | Full local models | Your machine's CPU/GPU, no size ceiling | Best quality; still zero upload |
The rule that doesn't change: OCR output stays labelled approximate, opt-in, and QC-capped. Better eyes, same honesty.
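A minimal sketch of how the tiers and the unchanged rule might fit together, under stated assumptions: the backend names, the `env` flags, and the cap value `0.7` are all hypothetical. In a browser, `hasWebGPU` would typically be derived from a check such as `"gpu" in navigator`.

```javascript
// Hypothetical cap: no OCR-derived page may report a QC score above this.
const OCR_QC_CAP = 0.7;

// Pick where recognition runs, mirroring the tiers in the table above.
function pickOcrBackend(env) {
  if (env.isCli) return "local-model"; // CLI: machine's CPU/GPU, no size ceiling
  return env.hasWebGPU ? "onnx-webgpu" : "onnx-wasm"; // browser: GPU, else WASM
}

// Whatever engine produced the text, the result stays labelled and capped.
function labelOcrResult(text, rawScore) {
  return { text, approximate: true, qcScore: Math.min(rawScore, OCR_QC_CAP) };
}

console.log(pickOcrBackend({ isCli: false, hasWebGPU: false }));
console.log(labelOcrResult("Invoice 2024", 0.95));
```

The design point is that engine choice and labelling are separate: a better backend can raise recognition quality but cannot lift the cap or clear the approximate flag.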
4 · More languages than English
The interface and the Context Lab ship in English today. The converter itself is already language-neutral — headings, tables and code fences work the same in any script — but two places still assume English, and both are on the roadmap:
- The site. English stays the default; Korean comes first after it (the maintainer's mother tongue), with further languages to follow as demand shows up.
- OCR. The opt-in pass currently loads the English model only. The plan: detect the document's script, fetch the matching tesseract language pack, and record which language was used in the fidelity report — so a Korean scan isn't read with English eyes and quietly garbled.
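The OCR plan above (detect the script, choose the language pack, record the choice) might be sketched as follows. This assumes a short text sample is already available, for example from a partial text layer or a first recognition pass; how the sample is obtained from a pure image scan is left open. The function name `pickOcrLanguage` and the returned field names are hypothetical, while `eng` and `kor` are the standard tesseract language codes.

```javascript
// Map Unicode scripts to tesseract language codes. Only two scripts are
// covered here; Latin is mapped to "eng" as a simplifying assumption.
const SCRIPTS = [
  { script: "Hangul", lang: "kor", pattern: /\p{Script=Hangul}/gu },
  { script: "Latin", lang: "eng", pattern: /\p{Script=Latin}/gu },
];

// Count characters per script and pick the most frequent one.
// Falls back to English when no known script is found.
function pickOcrLanguage(sample) {
  let best = { lang: "eng", script: "unknown", count: 0 };
  for (const { script, lang, pattern } of SCRIPTS) {
    const count = (sample.match(pattern) || []).length;
    if (count > best.count) best = { lang, script, count };
  }
  // The returned object is what would be recorded in the fidelity report.
  return { ocrLanguage: best.lang, detectedScript: best.script };
}

console.log(pickOcrLanguage("안녕하세요 PDF"));
console.log(pickOcrLanguage("Hello world"));
```

Recording `detectedScript` alongside the language makes the fallback visible: a report showing `unknown` tells the reader that English was a default, not a detection.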
5 · What will never change
- Local first. Files don't leave your machine — browser, CLI or agent alike.
- One schema. Every surface emits the same structure, so presets and fidelity reports carry over.
- Detected, never "preserved".
Which surface would you use first — CLI, agent skill, or stronger OCR? Votes steer the order.
Tell us on GitHub →