Token budgeting 101 for document context

MakeItMarkdown · July 2026 · 6 min read

Context windows are quoted in tokens; your documents are sized in pages and megabytes; the exchange rate between them decides whether your paste fits, what it costs, and how much room the model has left to think. Most people budget with folklore. Here's the practical version.

1 · The estimate, and how it lies

The folk rule — one token ≈ 4 characters of English — is a fine starting estimate, and it's what our size line shows by default (≈ 438 tokens). But technical text tokenizes denser: code, pipes, underscores, numbers and non-English words split into more tokens per character. Converting our sample notebook: the chars/4 estimate said 438; the real o200k count is 542 — 24% more. On a 100K-token paste, that error is a whole section that didn't fit.

That's why the Markdown pane has a count exactly button: it loads a real tokenizer (o200k, the encoding family used by current frontier models) in your browser, once, and replaces the estimate. Budget with the exact number when the paste is large or the window is tight.

The Markdown pane's size note, before and after count exactly — the ≈438 estimate becomes the true 542 (o200k).

2 · Where document tokens actually hide

Hiding place	Typical cost	Cure
Base64 images in notebooks	100K+ tokens each	Extracted to `[Figure: …]` automatically
Notebook JSON plumbing	2–5× the code itself	Conversion strips it
Repeated page headers/footers in PDFs	hundreds per long doc	Flagged by structure; trim on review
Full-precision dumps of long outputs	thousands	Truncated with an explicit note (Chat preset truncates harder)
Webpage chrome	often > the article	Readability extraction
JSON key repetition (JSONL)	15–40% of the data	Records table

3 · Budgets by destination

One-shot chat paste: spend at most ~half the window on the document — the model needs the rest for your conversation and its own reasoning. The Chat preset (hard output truncation + token estimate up top) is built for this shape.
Workspaces (Projects/Spaces): per-question retrieval budgets are small (a handful of chunks), so what matters is chunk density, not file totals — see the three workspace wins.
RAG pipelines: budget per chunk (embedding-model limits are typically 512–8K tokens); the chunk comments in the RAG preset give your splitter clean units to count.
Agents: assume every document is read multiple times; token waste compounds — the agent-context guide.

4 · A five-minute budgeting routine

Convert the document; read the size line.
Click count exactly if the number matters.
Over budget? Switch to the Chat preset and re-count — output truncation is usually the biggest single lever.
Still over? Feed by section (headings make clean cut points) and tell the model what you omitted — an honest gap beats a silent one.

Check a real number now — estimate first, exact on click.

Token budgeting 101 for document context

1 · The estimate, and how it lies

Markdown 1.8K chars ≈ 438 tokens

Markdown 1,751 chars · 542 tokens (o200k)

2 · Where document tokens actually hide

3 · Budgets by destination

4 · A five-minute budgeting routine