Advanced · 04

Token budgeting 101 for document context

Context windows are quoted in tokens; your documents are sized in pages and megabytes; the exchange rate between them decides whether your paste fits, what it costs, and how much room the model has left to think. Most people budget with folklore. Here's the practical version.

1 · The estimate, and how it lies

The folk rule — one token ≈ 4 characters of English — is a fine starting estimate, and it's what our size line shows by default (≈ 438 tokens). But technical text tokenizes denser: code, pipes, underscores, numbers and non-English words split into more tokens per character. Converting our sample notebook: the chars/4 estimate said 438; the real o200k count is 542 — 24% more. On a 100K-token paste, that error is a whole section that didn't fit.

That's why the Markdown pane has a count exactly button: it loads a real tokenizer (o200k, the encoding family used by current frontier models) in your browser, once, and replaces the estimate. Budget with the exact number when the paste is large or the window is tight.

The Markdown pane's size note, before and after count exactly — the ≈438 estimate becomes the true 542 (o200k).

2 · Where document tokens actually hide

Hiding placeTypical costCure
Base64 images in notebooks100K+ tokens eachExtracted to [Figure: …] automatically
Notebook JSON plumbing2–5× the code itselfConversion strips it
Repeated page headers/footers in PDFshundreds per long docFlagged by structure; trim on review
Full-precision dumps of long outputsthousandsTruncated with an explicit note (Chat preset truncates harder)
Webpage chromeoften > the articleReadability extraction
JSON key repetition (JSONL)15–40% of the dataRecords table

3 · Budgets by destination

4 · A five-minute budgeting routine

  1. Convert the document; read the size line.
  2. Click count exactly if the number matters.
  3. Over budget? Switch to the Chat preset and re-count — output truncation is usually the biggest single lever.
  4. Still over? Feed by section (headings make clean cut points) and tell the model what you omitted — an honest gap beats a silent one.

Check a real number now — estimate first, exact on click.