
Where this goes next: a CLI, agent skills, and stronger local OCR

The point: the parsers are plain, dependency-light JavaScript that already runs in Node for every test — the browser is just their first surface. The next surfaces are your terminal, your agent, and better eyes for scans.

[Figure: roadmap timeline. Today: browser tab + tesseract OCR. Next: CLI (npx / pip), fidelity JSON in CI. Later: agent skill, local OCR models on CPU/GPU. Same parsers, same honesty rules at every step.]
The order of operations — each surface reuses the parsers the tests already run in Node.

1 · An installable CLI

Every parser on this site already runs in Node today (that's how the 99 tests work). Packaging them as a command-line tool is mostly plumbing, and it unlocks the batch cases a browser tab shouldn't own:

npx makeitmarkdown docs/*.pdf --preset rag --out docs-md/
# or: pip install makeitmarkdown  (a thin wrapper over the same engine)
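
To make the "mostly plumbing" claim concrete, here is a minimal sketch of how a CLI entry point might turn the command above into a list of conversion jobs. It is not the project's code: planConversions, the "default" preset name, and the output-naming rule (same base name, .md extension, inside --out) are all assumptions for illustration. Only parseArgs from Node's util module is a real API.

```javascript
// Hypothetical sketch: these names are illustrative, not the project's API.
const { parseArgs } = require("node:util");
const path = require("node:path");

// Turn argv-style arguments into a conversion plan.
// The shell expands docs/*.pdf before Node sees it, so positionals are plain file paths.
function planConversions(args) {
  const { values, positionals } = parseArgs({
    args,
    options: {
      preset: { type: "string" },
      out: { type: "string" },
    },
    allowPositionals: true,
  });

  const preset = values.preset ?? "default"; // assumed fallback preset name
  const outDir = values.out ?? ".";          // assumed fallback: current directory

  // Map each input to a .md file with the same base name inside the output directory.
  const jobs = positionals.map((input) => {
    const base = path.basename(input, path.extname(input));
    return { input, output: path.join(outDir, `${base}.md`) };
  });

  return { preset, jobs };
}
```

Calling planConversions(process.argv.slice(2)) from a bin script would give the batch loop something to iterate over. The actual conversion step would hand each job to the same parsers the tests already exercise.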

2 · An agent skill

Coding agents (Claude Code, Codex-class assistants) constantly hit unreadable files mid-task: a .docx spec, a .xlsx of test cases, a PDF datasheet. A packaged skill lets the agent convert in-session instead of giving up.
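
As a rough illustration of what "convert in-session" could mean, the sketch below dispatches a file to a converter by extension and returns an explicit failure when none is registered, so the agent gets a clear answer either way. makeSkill, the registry shape, and the result object are hypothetical. The stub converter stands in for the real parsers.

```javascript
// Hypothetical sketch of a skill's dispatch step; not the project's actual interface.
const path = require("node:path");

// `converters` is a Map from lowercase extension (".docx") to a function
// that takes the file contents and returns Markdown.
function makeSkill(converters) {
  return function convert(filePath, contents) {
    const ext = path.extname(filePath).toLowerCase();
    const converter = converters.get(ext);
    if (!converter) {
      // Say plainly that the file is unsupported instead of guessing.
      return { ok: false, reason: `no converter registered for "${ext || "(no extension)"}"` };
    }
    return { ok: true, markdown: converter(contents) };
  };
}

// Usage with a stub converter standing in for a real parser:
const skill = makeSkill(new Map([[".docx", (text) => `# ${text}`]]));
console.log(skill("spec.docx", "Spec"));
console.log(skill("firmware.bin", ""));
```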

3 · OCR beyond tesseract: local models, CPU and GPU

Today's opt-in OCR is tesseract — honest, local, decent on clean print. The ceiling above it is model-based recognition, and it can stay local:

| Tier | Engine | Where it runs | Trade-off |
|---|---|---|---|
| Today | tesseract.js (LSTM) | Browser, CPU/WASM | ~8 MB, seconds/page; weak on tables & handwriting |
| Next | Compact recognition models (ONNX) | Browser, WebGPU when available, WASM fallback | tens of MB; markedly better layout & table recovery |
| With the CLI | Full local models | Your machine's CPU/GPU, no size ceiling | Best quality; still zero upload |
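
The tiers above amount to a small decision: which engine to use given where the code is running. One way that choice might be sketched is shown below. pickOcrBackend, the env fields, and the backend labels are all invented for illustration. The only real API referenced is navigator.gpu, the standard entry point for WebGPU feature detection in browsers.

```javascript
// Hypothetical tier picker. `env` is passed in so the logic can run and be tested outside a browser.
function pickOcrBackend(env) {
  if (env.isCli) return "full-local-model";              // CLI tier: no size ceiling
  if (!env.compactModelAvailable) return "tesseract-wasm"; // today's tier
  return env.hasWebGPU ? "onnx-webgpu" : "onnx-wasm";    // next tier, with WASM fallback
}

// In a browser, WebGPU support can be feature-detected through navigator.gpu.
// `compactModelAvailable` is an assumed flag for "the larger model has been downloaded".
function detectBrowserEnv(nav, compactModelAvailable) {
  return {
    isCli: false,
    compactModelAvailable,
    hasWebGPU: Boolean(nav && "gpu" in nav),
  };
}
```

Keeping detection separate from the decision means the fallback order can be checked in Node, the same way the parsers are.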

The rule that doesn't change: OCR output stays labelled approximate, opt-in, and QC-capped. Better eyes, same honesty.
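
A sketch of how that rule could be enforced at one choke point, regardless of which engine produced the text. This reads "QC-capped" as a ceiling on the quality score that OCR-derived output may report, which is an interpretation. The cap value of 0.8, the function name, and the result fields are assumptions, not the project's actual numbers or schema.

```javascript
// Assumed ceiling for any OCR-derived quality score; the real value is not stated here.
const OCR_QC_CAP = 0.8;

// Wrap raw OCR output so the three rules hold no matter which engine ran.
function labelOcrResult(text, rawQc, userOptedIn) {
  if (!userOptedIn) {
    // Opt-in: without explicit consent, no OCR result is produced at all.
    return null;
  }
  return {
    text,
    source: "ocr",
    approximate: true,               // always labelled approximate
    qc: Math.min(rawQc, OCR_QC_CAP), // a stronger engine cannot lift the score past the cap
  };
}
```

Because the cap lives in the wrapper, not in any engine, swapping tesseract for a model-based recognizer would change the text quality without changing what the output is allowed to claim about itself.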

4 · More languages than English

The interface and the Context Lab ship in English today. The converter itself is already language-neutral — headings, tables and code fences work the same in any script — but two places still assume English, and both are on the roadmap:

5 · What will never change

Which surface would you use first — CLI, agent skill, or stronger OCR? Votes steer the order.

Tell us on GitHub →