Making Jupyter notebooks LLM-addressable
"Fix the error in cell 5" is how people actually talk about notebooks. Yet almost every notebook-to-text converter throws the cell structure away, flattening 60 cells into one scroll of code the model can neither address nor navigate. This article is a build log: what a notebook really contains, what LLMs need from it, and two things we got wrong before measuring.
1 · What a .ipynb actually is
A notebook is JSON: a list of cells, each with a type (`code`, `markdown`, or `raw`) and a source. Code cells also carry their outputs and, critically, an `execution_count`: the `In[n]` number you see in Jupyter. That number is the notebook's honest confession. It records the order in which cells were last run, which after a normal working session is usually not the order they appear in. A cell reading `In[2]` sitting below `In[5]` means the state your outputs reflect never existed top-to-bottom.
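Spotting that confession needs nothing beyond the raw JSON. Below is a minimal sketch that reads the `cells` list and each code cell's `execution_count` (stored as `null` for a cell that never ran). The function names and the 1-based positions are our own assumptions, not the converter's code.

```python
import json


def load_notebook(path):
    """Parse a .ipynb file; it is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def out_of_order_cells(nb):
    """Return (position, execution_count) for code cells whose In[n] is lower
    than that of some code cell above them. Positions are 1-based."""
    anomalies, highest = [], 0
    for position, cell in enumerate(nb["cells"], start=1):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")  # None if the cell never ran
        if count is None:
            continue
        if count < highest:
            anomalies.append((position, count))
        highest = max(highest, count)
    return anomalies
```

Calling `out_of_order_cells(load_notebook("analysis.ipynb"))` (the file name is hypothetical) lists every cell that ran before something sitting above it.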
Our converter emits every cell under an addressable header and keeps those confessions visible:
## Cell [5] · type:code · id:d4e5f6
> ⚠️ depends on: `df` (defined in Cell [3])
```python
summary = df.describe()
```
The dependency line comes from static analysis of assignments and imports — deliberately labeled an approximate hint, because regexes cannot see through re-assignment in branches or dynamically built code. Honesty about the method matters more than the method: an LLM told "hints, not dataflow" treats them accordingly.
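A regex-based hint pass can stay very small. Here is a minimal sketch. It assumes cells are numbered by notebook position and that only plain top-level assignments count as definitions (imports are left out). All helper names are hypothetical. The comments mark where it goes wrong, which is exactly why its output is labeled a hint.

```python
import keyword
import re

# Plain top-level assignments such as "x = ..." or "a, b = ..."
# (not "==", not augmented "+=", not indented lines inside blocks).
ASSIGN_RE = re.compile(r"^([A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*)\s*=(?!=)", re.MULTILINE)
# Identifiers not preceded by "." so attribute names like .describe are skipped.
NAME_RE = re.compile(r"(?<![\w.])[A-Za-z_]\w*")
FENCE = "`" * 3


def assigned_names(source):
    names = set()
    for match in ASSIGN_RE.finditer(source):
        names.update(n.strip() for n in match.group(1).split(","))
    return names


def used_names(source):
    code = re.sub(r"#.*", "", source)  # crude: also cuts "#" inside strings
    return {n for n in NAME_RE.findall(code) if not keyword.iskeyword(n)}


def dependency_hints(sources):
    """Per cell, sorted (name, defining_cell) pairs pointing at the nearest
    earlier cell (1-based notebook position) that assigns the name."""
    last_def, hints = {}, []
    for position, source in enumerate(sources, start=1):
        defined_here = assigned_names(source)
        # Names a cell rebinds itself are skipped, so "df = df.dropna()" is
        # missed: one reason these are hints, not dataflow.
        hints.append(sorted(
            (name, last_def[name])
            for name in used_names(source) - defined_here
            if name in last_def
        ))
        for name in defined_here:
            last_def[name] = position
    return hints


def render_cell(position, cell_type, cell_id, source, hints):
    """Emit the addressable header, the hint lines, then the fenced source."""
    lines = [f"## Cell [{position}] · type:{cell_type} · id:{cell_id}"]
    lines += [f"> ⚠️ depends on: `{name}` (defined in Cell [{src}])" for name, src in hints]
    lines += [FENCE + "python", source, FENCE]
    return "\n".join(lines)
```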
2 · Two things measurement changed
1. Imports are ambient, not dataflow. Our first pass drew a dependency edge for every name, which meant the import cell pointed at nearly every other cell. A real 110-cell notebook produced 341 edges and 227 warnings, mostly about `np` and `torch`. Treating imported names as ambient (listed once, excluded from edges) cut that to 169 edges and 13 warnings, and every survivor was a genuine "this value came from a previous run" signal.
2. The token wall was hiding in a markdown cell. The same notebook converted to 771,000 characters, about 196K tokens and too large for many models' context windows. Profiling the output found a single markdown cell containing a pasted image as a base64 data URI: 515KB of `iVBORw0KGgo…` (the base64 form of the PNG file signature) masquerading as documentation. Extracting embedded images into real files, with `[Figure: …]` placeholders left in their place, cut the output to 256K characters, a 67% reduction from one fix. If your notebook "doesn't fit," check for pasted screenshots before blaming the code.
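The first fix, imports as ambient names, amounts to one extra set subtraction. The sketch below is minimal and makes several assumptions. The regexes only see one-line, top-level `import` / `from … import` statements and plain assignments. Edges are `(name, from_cell, to_cell)` tuples with 1-based positions. `ambient_imports=False` reproduces the naive first pass. None of these names come from the converter itself.

```python
import keyword
import re

# Hypothetical regexes: top-level, one-line statements only.
ASSIGN_RE = re.compile(r"^([A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*)\s*=(?!=)", re.MULTILINE)
IMPORT_RE = re.compile(r"^(?:from\s+[\w.]+\s+)?import\s+(.+)$", re.MULTILINE)
NAME_RE = re.compile(r"(?<![\w.])[A-Za-z_]\w*")


def strip_comments(source):
    return re.sub(r"#.*", "", source)  # crude: also cuts "#" inside strings


def imported_names(source):
    """'import numpy as np' -> np, 'import os.path' -> os, 'from torch import nn' -> nn."""
    names = set()
    for match in IMPORT_RE.finditer(strip_comments(source)):
        for part in match.group(1).strip("() ").split(","):
            words = part.split()
            if len(words) == 3 and words[1] == "as":
                names.add(words[2])
            elif words:
                names.add(words[0].split(".")[0])
    return names


def assigned_names(source):
    names = set()
    for match in ASSIGN_RE.finditer(source):
        names.update(n.strip() for n in match.group(1).split(","))
    return names


def used_names(source):
    return {n for n in NAME_RE.findall(strip_comments(source)) if not keyword.iskeyword(n)}


def dependency_edges(sources, ambient_imports=True):
    """Return (edges, sorted ambient names). With ambient_imports=False an
    import is a definition like any other and draws an edge to every user."""
    ambient = set()
    if ambient_imports:
        ambient = set().union(*(imported_names(s) for s in sources))
    last_def, edges = {}, []
    for position, source in enumerate(sources, start=1):
        defined = assigned_names(source)
        if not ambient_imports:
            defined |= imported_names(source)
        for name in sorted(used_names(source) - defined - ambient):
            if name in last_def:
                edges.append((name, last_def[name], position))
        for name in defined:
            last_def[name] = position
    return edges, sorted(ambient)
```

On a four-cell toy notebook (imports, then three cells using `np` and `torch`), the naive pass yields five edges, three of them from the import cell. The ambient pass keeps only the two real value-to-value edges and lists `np` and `torch` once.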
| Stage | Output size | ≈ tokens |
|---|---|---|
| Naive conversion | 771K chars | ~196K |
| + import-aware hints | 771K chars | ~196K (size unchanged; warnings 227 → 13, 94% fewer) |
| + embedded-image extraction | 256K chars | ~64K |
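The embedded-image extraction stage can be sketched with one regex substitution. This is a minimal version under stated assumptions. It handles only Markdown image syntax whose target is a PNG, JPEG, or GIF `data:` URI, and it does not touch images stored in a cell's separate `attachments` field. The file naming and the text that replaces the `…` in the placeholder are our own hypothetical choices.

```python
import base64
import re
from pathlib import Path

# A Markdown image whose target is an inline base64 data URI.
DATA_URI_IMG = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"
    r"\(data:image/(?P<ext>png|jpeg|gif);base64,(?P<data>[A-Za-z0-9+/=\s]+)\)"
)


def extract_images(markdown, out_dir, prefix="figure"):
    """Write each embedded image to out_dir; return (new_text, written_paths).

    The "[Figure: <alt text or file name>]" wording is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    def replace(match):
        ext = "jpg" if match["ext"] == "jpeg" else match["ext"]
        path = out_dir / f"{prefix}_{len(written)}.{ext}"
        # Drop any line breaks inside the base64 payload before decoding.
        path.write_bytes(base64.b64decode("".join(match["data"].split())))
        written.append(path)
        return f"[Figure: {match['alt'].strip() or path.name}]"

    return DATA_URI_IMG.sub(replace, markdown), written
```

The image survives as a real file for anyone who opens it, while the text the model reads shrinks by the full length of the base64 payload.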
3 · The dependency mini-map
Cell dependencies also render as a small SVG arc diagram — nodes are cells in notebook order labeled with their execution counts, arcs carry the variable names, and out-of-order cells get an amber mark. Dense notebooks cap the view at the busiest 16 cells and say so. It exists because "your notebook ran out of order" lands harder as a picture than as a warning list.
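A renderer of this kind needs only string formatting. The sketch below is minimal and assumes "busiest" means the cells touched by the most arcs. It also assumes edges arrive as `(name, from_cell, to_cell)` tuples and execution counts as a `{cell: count or None}` map. The layout constants, the amber hex value, and the function names are all hypothetical.

```python
from collections import Counter
from xml.sax.saxutils import escape

AMBER = "#f5a623"  # hypothetical colour for out-of-order cells


def out_of_order(exec_counts):
    """Cells (1-based positions) whose execution count is lower than one above them."""
    flagged, highest = set(), 0
    for cell in sorted(exec_counts):
        count = exec_counts[cell]
        if count is None:  # never executed
            continue
        if count < highest:
            flagged.add(cell)
        highest = max(highest, count)
    return flagged


def arc_diagram_svg(edges, exec_counts, max_nodes=16, gap=48):
    degree = Counter()
    for _, a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Keep the busiest cells, then lay them out left to right in notebook order.
    shown = sorted(cell for cell, _ in degree.most_common(max_nodes))
    x = {cell: gap * (i + 1) for i, cell in enumerate(shown)}
    baseline = gap * max(len(shown) - 1, 0) / 2 + 30  # room for the widest arc
    flagged = out_of_order(exec_counts)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{gap * (len(shown) + 1)}" height="{baseline + 40}">']
    if len(degree) > len(shown):  # say so when the view is capped
        parts.append(f'<text x="4" y="14">showing the {len(shown)} busiest '
                     f'of {len(degree)} connected cells</text>')
    for name, a, b in edges:
        if a not in x or b not in x:
            continue  # the arc would touch a hidden cell
        left, right = sorted((x[a], x[b]))
        r = (right - left) / 2
        # Elliptical arc from left to right node; sweep-flag 1 bends it upward.
        parts.append(f'<path d="M {left} {baseline} A {r} {r} 0 0 1 {right} {baseline}" '
                     'fill="none" stroke="grey"/>')
        parts.append(f'<text x="{left + r}" y="{baseline - r - 4}" '
                     f'text-anchor="middle">{escape(name)}</text>')
    for cell in shown:
        count = exec_counts.get(cell)
        fill = AMBER if cell in flagged else "white"
        parts.append(f'<circle cx="{x[cell]}" cy="{baseline}" r="6" fill="{fill}" stroke="black"/>')
        label = "" if count is None else count
        parts.append(f'<text x="{x[cell]}" y="{baseline + 24}" text-anchor="middle">[{label}]</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Arcs whose endpoints fall outside the capped view are dropped rather than drawn to the edge, and the caption says how many cells were left out.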
4 · Checklist for feeding notebooks to a model
- Keep cell addresses (`Cell [n]`) so instructions and answers can point somewhere.
- Surface execution-order anomalies: they explain "why doesn't this reproduce" more often than the code does.
- Extract pasted images; they are token walls with no information for a text model.
- Truncate long outputs, never source. The model needs the code; it rarely needs 400 rows of a printed dataframe.
- State your analysis limits in the document itself.
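"Truncate long outputs, never source" translates directly into code over nbformat cells. In this minimal sketch, stream outputs keep their text under `text`, and rich outputs keep a plain-text rendering under `data["text/plain"]`. Either may be a single string or a list of lines. The 20-line default, the marker wording, and the function names are assumptions.

```python
def _as_text(value):
    # nbformat stores multi-line strings either as one str or as a list of lines.
    return "".join(value) if isinstance(value, list) else value


def truncate_text(text, max_lines=20):
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    hidden = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"... [{hidden} of {len(lines)} lines truncated]"])


def truncate_outputs(cell, max_lines=20):
    """Return a copy of a code cell with long text outputs cut; source is never touched."""
    outputs = []
    for out in cell.get("outputs", []):
        out = dict(out)  # shallow copy so the input cell is not mutated
        if out.get("output_type") == "stream":
            out["text"] = truncate_text(_as_text(out.get("text", "")), max_lines)
        elif "text/plain" in out.get("data", {}):
            data = dict(out["data"])
            data["text/plain"] = truncate_text(_as_text(data["text/plain"]), max_lines)
            out["data"] = data
        outputs.append(out)
    return {**cell, "outputs": outputs}
```

Keeping the marker in the text matters: the model learns that rows exist without paying for them.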
Drop a notebook and read its fidelity report — cell map, dependency hints and all, locally in your browser.