Token budgeting 101 for document context
Context windows are quoted in tokens; your documents are sized in pages and megabytes; the exchange rate between them decides whether your paste fits, what it costs, and how much room the model has left to think. Most people budget with folklore. Here's the practical version.
1 · The estimate, and how it lies
The folk rule — one token ≈ 4 characters of English — is a fine
starting estimate, and it's what our size line shows by default
(≈ 438 tokens). But technical text tokenizes
denser: code, pipes, underscores, numbers and non-English words
split into more tokens per character. Converting our sample
notebook: the chars/4 estimate said 438; the real
o200k count is 542 — 24% more. On a 100K-token
paste, that error is a whole section that didn't fit.
That's why the Markdown pane has a count exactly button: it loads a real tokenizer (o200k, the encoding family used by current frontier models) in your browser, once, and replaces the estimate. Budget with the exact number when the paste is large or the window is tight.
2 · Where document tokens actually hide
| Hiding place | Typical cost | Cure |
|---|---|---|
| Base64 images in notebooks | 100K+ tokens each | Extracted to [Figure: …] automatically |
| Notebook JSON plumbing | 2–5× the code itself | Conversion strips it |
| Repeated page headers/footers in PDFs | hundreds per long doc | Flagged by structure; trim on review |
| Full-precision dumps of long outputs | thousands | Truncated with an explicit note (Chat preset truncates harder) |
| Webpage chrome | often > the article | Readability extraction |
| JSON key repetition (JSONL) | 15–40% of the data | Records table |
3 · Budgets by destination
- One-shot chat paste: spend at most ~half the window on the document — the model needs the rest for your conversation and its own reasoning. The Chat preset (hard output truncation + token estimate up top) is built for this shape.
- Workspaces (Projects/Spaces): per-question retrieval budgets are small (a handful of chunks), so what matters is chunk density, not file totals — see the three workspace wins.
- RAG pipelines: budget per chunk (embedding-model limits are typically 512–8K tokens); the chunk comments in the RAG preset give your splitter clean units to count.
- Agents: assume every document is read multiple times; token waste compounds — the agent-context guide.
4 · A five-minute budgeting routine
- Convert the document; read the size line.
- Click count exactly if the number matters.
- Over budget? Switch to the Chat preset and re-count — output truncation is usually the biggest single lever.
- Still over? Feed by section (headings make clean cut points) and tell the model what you omitted — an honest gap beats a silent one.
Check a real number now — estimate first, exact on click.