Your notebook is too big to paste: beating the token limit
The point: your notebook isn't too big — its embedded output images are; the text a model can actually use fits comfortably.
"Message exceeds the maximum length." "File too large." Or the upload works and the model clearly saw only the first third. Jupyter notebooks hit input limits long before their content justifies it, and the reason is what a .ipynb file actually stores.
1 · Where the megabytes live
A notebook file is JSON wrapping four things: your code, your prose, every output the cells ever produced, and metadata. The killers are outputs: each rendered chart is a base64-encoded PNG embedded as text — hundreds of KB each — and dataframe previews, progress bars and tracebacks pile up behind them. Metadata adds widget state and execution bookkeeping the model has no use for. In the notebook we use as a stress test, the meaningful text is about 7% of the bytes:
| | Raw .ipynb | Converted Markdown |
|---|---|---|
| Size | 3,465 KB | 261 KB |
| Fits a chat input? | No | Yes (~64K tokens) |
| Cell addresses | — | kept (Cell [7]) |
| Figures | base64 walls | [Figure: cell_5_output_1.png] placeholders |
One markdown cell in that notebook contained a single pasted screenshot worth ~130,000 tokens of base64 — a third of a large context window, spent on characters no model can even decode back into an image.
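To see where the bytes sit in one of your own notebooks, a rough breakdown like the following can help. It is a minimal sketch, assuming the nbformat 4 JSON layout (`cells`, `source`, `outputs`, `metadata`, and `attachments` for images pasted into markdown cells); the function name `byte_breakdown` and the synthetic `demo` notebook are illustrative and are not part of the converter described here.

```python
import base64
import json
from collections import Counter


def byte_breakdown(nb: dict) -> Counter:
    """Count serialized bytes per bucket for a parsed nbformat-4 notebook."""

    def size(obj) -> int:
        # Empty or missing fields cost nothing; otherwise measure the JSON text.
        return len(json.dumps(obj).encode("utf-8")) if obj else 0

    totals = Counter()
    totals["metadata"] += size(nb.get("metadata"))
    for cell in nb.get("cells", []):
        # Anything that is not a code cell (markdown, raw) is counted as prose.
        kind = "code" if cell.get("cell_type") == "code" else "prose"
        totals[kind] += size(cell.get("source"))
        totals["metadata"] += size(cell.get("metadata"))
        # Images pasted into markdown cells are stored as base64 attachments.
        totals["attachments"] += size(cell.get("attachments"))
        for output in cell.get("outputs", []):
            totals["outputs"] += size(output)
    return totals


# Synthetic stand-in for a real file: one prose cell, one code cell with a fake PNG.
fake_png = base64.b64encode(b"\x00" * 3000).decode("ascii")
demo = {
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sales analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["df.plot()"],
            "outputs": [{"output_type": "display_data", "data": {"image/png": fake_png}}],
        },
    ],
}

totals = byte_breakdown(demo)
for bucket, n_bytes in totals.most_common():
    print(f"{bucket:12s} {n_bytes:8d} bytes")
```

For a real notebook, replace `demo` with `json.load(open(path, encoding="utf-8"))`, where `path` points at your own file. Screenshots embedded as inline data URIs inside a markdown cell's source would show up under prose rather than attachments in this sketch.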
2 · What conversion keeps (this is the part that matters)
Shrinking is easy: `jupyter nbconvert --to script` strips outputs too, but it loses everything else you care about. What matters is what survives:
- Cell addresses: every cell keeps a stable label (Cell [12]), so you and the model can point at code precisely;
- Outputs, truncated honestly: the first lines of each output stay, with an explicit truncation note (a model that sees the shape of a result reasons better than one that sees nothing);
- Execution-order warnings: if you ran cell 40 before cell 12, the conversion says so, which is often the very bug you're pasting the notebook to ask about;
- Dependency hints: "depends on df (defined in Cell [2])", so questions about one cell don't require the model to re-derive the whole notebook.

How that analysis works: Making Jupyter notebooks LLM-addressable.
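The first three behaviours in that list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the converter's actual implementation: it assumes nbformat 4 JSON, labels cells by their position in the file, numbers figure placeholders from zero, and flags a cell as out of order when its `execution_count` is lower than that of a cell above it. The name `to_markdown` and the `max_lines` parameter are hypothetical, and dependency hints are left out.

```python
FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def to_markdown(nb: dict, max_lines: int = 5) -> str:
    """Render a parsed notebook as addressable, output-truncated Markdown."""
    parts = []
    last_count = 0
    for index, cell in enumerate(nb.get("cells", [])):
        label = f"Cell [{index}]"
        source = "".join(cell.get("source", ""))
        if cell.get("cell_type") != "code":
            parts.append(f"{label} (text)\n{source}")
            continue

        # Execution-order warning: this cell was run before a cell above it.
        count = cell.get("execution_count")
        if count is not None:
            if count < last_count:
                parts.append(
                    f"Warning: {label} was last run (execution {count}) "
                    f"before a cell above it (execution {last_count})."
                )
            last_count = max(last_count, count)

        parts.append(f"{label}\n{FENCE}python\n{source}\n{FENCE}")

        for n, output in enumerate(cell.get("outputs", [])):
            data = output.get("data", {})
            if "image/png" in data:
                # Replace the base64 payload with a short placeholder.
                parts.append(f"[Figure: cell_{index}_output_{n}.png]")
                continue
            text = "".join(output.get("text", data.get("text/plain", "")))
            lines = text.splitlines()
            if not lines:
                continue
            shown = lines[:max_lines]
            if len(lines) > max_lines:
                shown.append(f"[... {len(lines) - max_lines} more lines truncated]")
            parts.append("\n".join(shown))
    return "\n\n".join(parts)


# Synthetic notebook: the third cell was run before the second one.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "source": ["df.plot()"],
            "outputs": [{"output_type": "display_data", "data": {"image/png": "iVBORw0KGgo="}}],
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "source": ["print(df)"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["a\n", "b\n", "c\n"]}],
        },
    ]
}

markdown = to_markdown(demo, max_lines=2)
print(markdown)
```

Keeping the truncation note explicit, rather than silently cutting the output, is what lets the model know there was more to see.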
3 · The fix
- Drop the .ipynb here — conversion is local, your unpublished analysis stays on your machine.
- Still tight on budget? Switch the preset to Chat: outputs truncate harder and a token estimate appears at the top. The Markdown pane's "count exactly" button gives you a real o200k token count before you paste.
- Paste, and ask your question with cell addresses ("why does Cell [12] change the result of Cell [40]?").
Try it on the sample notebook — or the giant one that keeps failing.