Making Jupyter notebooks LLM-addressable

MakeItMarkdown · July 2026 · 8 min read

"Fix the error in cell 5" is how people actually talk about notebooks. Yet almost every notebook-to-text converter throws the cell structure away, flattening 60 cells into one scroll of code the model can neither address nor navigate. This article is a build log: what a notebook really contains, what LLMs need from it, and two things we got wrong before measuring.

1 · What a .ipynb actually is

A notebook is JSON: a list of cells, each with a cell_type (code, markdown, or raw) and a source. Code cells also carry outputs and, critically, an execution_count: the In[n] number you see in Jupyter. That number is the notebook's honest confession. It records the order in which cells actually ran, which after a normal working session is not the order they appear in. A cell reading In[2] sitting below In[5] means the state your outputs reflect never existed top-to-bottom.
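Reading that confession takes only a few lines. Here is a minimal sketch that loads notebook JSON and flags every code cell whose In[n] is lower than that of a code cell above it. The function name, the 1-based Cell [n] numbering, and the warning wording are illustrative assumptions, not the converter's actual code.

```python
import json


def execution_order_report(nb: dict) -> list[str]:
    """Flag code cells whose execution_count is lower than that of a code cell above them."""
    warnings = []
    highest = None  # (execution_count, cell number) of the highest In[n] seen so far
    for number, cell in enumerate(nb.get("cells", []), start=1):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no execution_count
        count = cell.get("execution_count")
        if count is None:
            continue  # a code cell that was never run
        if highest is not None and count < highest[0]:
            warnings.append(
                f"Cell [{number}] ran as In[{count}] but sits below "
                f"Cell [{highest[1]}] (In[{highest[0]}])"
            )
        if highest is None or count > highest[0]:
            highest = (count, number)
    return warnings


# A tiny inline notebook; for a real file, json.load() an opened .ipynb instead.
sample = json.loads("""
{"cells": [
  {"cell_type": "markdown", "source": ["# Demo"]},
  {"cell_type": "code", "execution_count": 1, "source": ["import pandas as pd"], "outputs": []},
  {"cell_type": "code", "execution_count": 5, "source": ["df = pd.DataFrame()"], "outputs": []},
  {"cell_type": "code", "execution_count": 2, "source": ["summary = df.describe()"], "outputs": []}
]}
""")

for warning in execution_order_report(sample):
    print(warning)  # → Cell [4] ran as In[2] but sits below Cell [3] (In[5])
```

Comparing against the highest count seen so far, rather than only the previous cell, catches a low In[n] even when unexecuted or markdown cells sit in between.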

Our converter emits every cell under an addressable header and keeps those confessions visible:

## Cell [5] · type:code · id:d4e5f6

> ⚠️ depends on: `df` (defined in Cell [3])

```python
summary = df.describe()
```

The dependency line comes from static analysis of assignments and imports — deliberately labeled an approximate hint, because regexes cannot see through re-assignment in branches or dynamically built code. Honesty about the method matters more than the method: an LLM told "hints, not dataflow" treats them accordingly.
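A regex pass in that spirit might look like the sketch below: it records simple `name = ...` assignments per code cell, finds later cells that mention those names, and renders the header and hint line in the format shown above. The helper names and regexes are hypothetical, and this sketch tracks only assignments at the start of a line (imports are left out). Like any regex approach it misses tuple unpacking, `for` targets, and dynamically built names, and it can match identifiers inside strings or attribute access.

```python
import re

ASSIGN = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*=(?!=)", re.MULTILINE)  # `name = ...` at line start
NAME = re.compile(r"\b[A-Za-z_]\w*\b")  # every identifier-like token, strings and comments included
FENCE = "`" * 3


def cell_source(cell: dict) -> str:
    """nbformat stores source either as one string or as a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def dependency_hints(cells: list[dict]) -> dict[int, dict[str, int]]:
    """Approximate hints: cell number -> {name: cell where it was last assigned above}."""
    defined_in: dict[str, int] = {}
    hints: dict[int, dict[str, int]] = {}
    for number, cell in enumerate(cells, start=1):
        if cell.get("cell_type") != "code":
            continue
        src = cell_source(cell)
        assigned = set(ASSIGN.findall(src))
        mentioned = set(NAME.findall(src)) - assigned  # skip names this cell binds itself
        hints[number] = {n: defined_in[n] for n in sorted(mentioned) if n in defined_in}
        for name in assigned:
            defined_in[name] = number
    return hints


def render_cell(number: int, cell: dict, deps: dict[str, int]) -> str:
    """Emit an addressable header, an optional hint line, and the untouched source."""
    lines = [f"## Cell [{number}] · type:{cell['cell_type']} · id:{cell.get('id', '?')}", ""]
    if deps:
        listed = ", ".join(f"`{n}` (defined in Cell [{c}])" for n, c in deps.items())
        lines += [f"> ⚠️ depends on: {listed}", ""]
    lines += [FENCE + "python", cell_source(cell).rstrip("\n"), FENCE]
    return "\n".join(lines)


cells = [
    {"cell_type": "code", "id": "a1b2c3", "source": ["import pandas as pd\n"]},
    {"cell_type": "code", "id": "b2c3d4", "source": ["df = pd.read_csv('data.csv')\n"]},
    {"cell_type": "code", "id": "d4e5f6", "source": "summary = df.describe()\n"},
]
hints = dependency_hints(cells)
print(render_cell(3, cells[2], hints[3]))
```

Only definitions from cells above count, so the hint mirrors the top-to-bottom reading a model will do, which is exactly the reading that out-of-order execution breaks.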

2 · Two things measurement changed

1. Imports are ambient, not dataflow. Our first pass drew a dependency edge for every name, so the import cell pointed at nearly every other cell. A real 110-cell notebook produced 341 edges and 227 warnings, most of them about np and torch. Treating imported names as ambient (listed once, excluded from edges) dropped that to 169 edges and 13 warnings, and every survivor was a genuine "this value came from a previous run" signal.

2. The token wall was hiding in a markdown cell. The same notebook converted to 771,000 characters, about 196K tokens, which is too large for most models' context windows. Profiling the output found a single markdown cell containing a pasted image as a base64 data URI: 515KB of iVBORw0KGgo… masquerading as documentation. Extracting embedded images into real files with [Figure: …] placeholders cut the output to 256K characters, a 67% reduction from one fix. If your notebook "doesn't fit," check for pasted screenshots before blaming the code.
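The first lesson in the list above can be sketched by adding an import collector to a regex pass. In "ambient" mode, imported names are gathered into one notebook-wide list and never become edge sources, so only values produced by earlier cells generate edges. The function names and regexes are hypothetical, multi-line parenthesized imports are not handled, and the three-cell notebook is synthetic, not the 110-cell one measured above.

```python
import re

IMPORT = re.compile(r"^[ \t]*import[ \t]+(.+)$", re.MULTILINE)
FROM_IMPORT = re.compile(r"^[ \t]*from[ \t]+\S+[ \t]+import[ \t]+(.+)$", re.MULTILINE)
ASSIGN = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*=(?!=)", re.MULTILINE)
NAME = re.compile(r"\b[A-Za-z_]\w*\b")


def imported_names(src: str) -> set[str]:
    """Names bound by single-line imports: `import numpy as np` -> np, `from x import a, b` -> a, b."""
    names = set()
    for clause in IMPORT.findall(src) + FROM_IMPORT.findall(src):
        clause = clause.split("#")[0].strip("() \t")  # drop trailing comments and parentheses
        for item in clause.split(","):
            words = item.split()
            if len(words) == 3 and words[1] == "as":
                names.add(words[2])
            elif words:
                names.add(words[0].split(".")[0])  # `import os.path` binds `os`
    return names


def edges(cells: list[dict], ambient_imports: bool):
    """Return (defining cell, using cell, name) edges plus the ambient import names."""
    defined_in: dict[str, int] = {}
    ambient: set[str] = set()
    found = []
    for number, cell in enumerate(cells, start=1):
        if cell.get("cell_type") != "code":
            continue
        src = cell["source"] if isinstance(cell["source"], str) else "".join(cell["source"])
        imports = imported_names(src)
        bound = set(ASSIGN.findall(src)) | imports
        mentioned = set(NAME.findall(src)) - bound
        for name in sorted(mentioned & set(defined_in)):
            found.append((defined_in[name], number, name))
        if ambient_imports:
            ambient |= imports  # listed once for the whole notebook...
            bound -= imports    # ...and never recorded as an edge source
        for name in bound:
            defined_in[name] = number
    return found, sorted(ambient)


cells = [
    {"cell_type": "code", "source": "import numpy as np\nimport torch\n"},
    {"cell_type": "code", "source": "x = np.zeros(3)\n"},
    {"cell_type": "code", "source": "t = torch.tensor(x)\n"},
]
naive, _ = edges(cells, ambient_imports=False)
aware, ambient = edges(cells, ambient_imports=True)
print(naive)    # → [(1, 2, 'np'), (1, 3, 'torch'), (2, 3, 'x')]
print(aware)    # → [(2, 3, 'x')]
print(ambient)  # → ['np', 'torch']
```

Even on three cells, two of the three naive edges are import noise; the one that survives is a value computed by an earlier cell.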

Stage	Output size	≈ tokens
Naive conversion	771K chars	~196K
+ import-aware hints	771K chars	~196K (but 94% fewer false warnings)
+ embedded-image extraction	256K chars	~64K
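The table's last stage, embedded-image extraction, can be sketched as a regex over markdown image syntax: decode each base64 payload to a file and leave a short placeholder behind. The regex, the file naming, and filling the placeholder with the alt text (falling back to the file name) are assumptions for illustration. This sketch ignores HTML `<img>` tags and images stored in a cell's `attachments` field.

```python
import base64
import re
import tempfile
from pathlib import Path

# A markdown image whose target is an inline base64 data URI.
DATA_URI = re.compile(
    r"!\[(?P<alt>[^\]]*)\]\(data:image/(?P<ext>png|jpeg|gif|webp);base64,(?P<data>[A-Za-z0-9+/=\s]+)\)"
)


def extract_images(markdown: str, out_dir: str, stem: str = "cell") -> tuple[str, int]:
    """Write each embedded image to out_dir and replace it with a short text placeholder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0

    def to_placeholder(match: re.Match) -> str:
        nonlocal written
        written += 1
        ext = "jpg" if match["ext"] == "jpeg" else match["ext"]
        path = out / f"{stem}-img{written}.{ext}"
        path.write_bytes(base64.b64decode("".join(match["data"].split())))
        return f"[Figure: {match['alt'] or path.name}]"

    return DATA_URI.sub(to_placeholder, markdown), written


# Synthetic cell: a PNG signature plus padding stands in for a pasted screenshot.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(16)
cell_md = "Loss over time:\n![loss curve](data:image/png;base64," + base64.b64encode(fake_png).decode() + ")\n"
with tempfile.TemporaryDirectory() as tmp:
    cleaned, n_images = extract_images(cell_md, tmp)
    files = sorted(p.name for p in Path(tmp).iterdir())
print(repr(cleaned))  # → 'Loss over time:\n[Figure: loss curve]\n'
print(files)          # → ['cell-img1.png']
```

A base64 payload whose first characters are iVBORw0KGgo decodes to the PNG signature, which makes it easy to spot when profiling output.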

3 · The dependency mini-map

Cell dependencies also render as a small SVG arc diagram: nodes are cells in notebook order labeled with their execution counts, arcs carry the variable names, and out-of-order cells get an amber mark. For dense notebooks the view is capped at the 16 busiest cells, and the diagram says so. It exists because "your notebook ran out of order" lands harder as a picture than as a warning list.

The dependency mini-map for the out-of-order sample: the out-of-order cell flagged amber, variables riding the arcs.
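An arc diagram like this takes little code. The sketch below keeps the 16 busiest cells by edge count (then restores notebook order), draws each edge as a half-circle labeled with its variable, colors out-of-order cells amber, and adds a note when the view is capped. The colors, sizes, and the `arc_svg` signature are assumptions; unlike the diagram described above, this sketch only draws cells that touch at least one edge.

```python
from collections import Counter
from html import escape

AMBER, BLUE = "#f5a623", "#4a90d9"


def arc_svg(edges, exec_counts, out_of_order, limit=16, step=40):
    """Render (defining cell, using cell, variable) edges as an SVG arc diagram."""
    degree = Counter()
    for src, dst, _ in edges:
        degree[src] += 1
        degree[dst] += 1
    # Keep the `limit` busiest cells, then put them back in notebook order.
    cells = sorted(cell for cell, _ in degree.most_common(limit))
    x = {cell: 20 + i * step for i, cell in enumerate(cells)}
    shown = [(s, d, name) for s, d, name in edges if s in x and d in x]
    max_r = max((abs(x[d] - x[s]) / 2 for s, d, _ in shown), default=0)
    base = max_r + 20  # node baseline, leaving room for the tallest arc and its label
    width = max(160, 40 + (len(cells) - 1) * step)
    height = base + 40
    parts = []
    for s, d, name in shown:
        left, right = sorted((x[s], x[d]))
        r = (right - left) / 2
        # Sweep flag 1 bends the half-circle upward, since SVG's y axis points down.
        parts.append(f'<path d="M {left} {base} A {r} {r} 0 0 1 {right} {base}" fill="none" stroke="#999"/>')
        parts.append(f'<text x="{left + r}" y="{base - r - 3}" font-size="9" text-anchor="middle">{escape(name)}</text>')
    for cell in cells:
        count = exec_counts.get(cell)
        fill = AMBER if cell in out_of_order else BLUE
        parts.append(f'<circle cx="{x[cell]}" cy="{base}" r="6" fill="{fill}"/>')
        parts.append(f'<text x="{x[cell]}" y="{base + 18}" font-size="9" text-anchor="middle">'
                     f'[{count if count is not None else " "}]</text>')
    if len(degree) > len(cells):  # say so when the view is capped
        parts.append(f'<text x="4" y="{height - 4}" font-size="9">showing {len(cells)} of {len(degree)} cells</text>')
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            + "".join(parts) + "</svg>")


demo_edges = [(1, 2, "raw"), (2, 4, "df"), (3, 4, "model")]
demo_counts = {1: 1, 2: 3, 3: 2, 4: 4}  # Cell [3] shows In[2] below Cell [2]'s In[3]
svg = arc_svg(demo_edges, demo_counts, out_of_order={3})
capped = arc_svg(demo_edges, demo_counts, out_of_order={3}, limit=2)
```

Writing `svg` to a `.svg` file and opening it in a browser is enough to check the layout.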

4 · Checklist for feeding notebooks to a model

Keep cell addresses (Cell [n]) so instructions and answers can point somewhere.
Surface execution-order anomalies — they explain "why doesn't this reproduce" more often than the code does.
Extract pasted images; they are token walls with no information for a text model.
Truncate long outputs, never source. The model needs the code; it rarely needs 400 rows of a printed dataframe.
State your analysis limits in the document itself.
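For the "truncate long outputs, never source" item, a minimal sketch: cap each output at a line budget, keep its head and tail, and leave a marker that says how much was cut. The limits (20 lines, last 5 kept) and the helper names are arbitrary assumptions; rich outputs fall back to their `text/plain` representation, and the cell's source is never passed through the truncation.

```python
def truncate_output(text: str, max_lines: int = 20, keep_tail: int = 5) -> str:
    """Keep the first and last lines of a long output and say how much was cut.

    Assumes keep_tail < max_lines.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = lines[: max_lines - keep_tail]
    tail = lines[len(lines) - keep_tail:]
    omitted = len(lines) - len(head) - len(tail)
    return "\n".join(head + [f"... [{omitted} lines truncated] ..."] + tail)


def output_texts(cell: dict, **limits) -> list[str]:
    """Collect a code cell's text outputs, each capped. The source never goes through here."""
    texts = []
    for out in cell.get("outputs", []):
        kind = out.get("output_type")
        if kind == "stream":
            text = out.get("text", "")
        elif kind == "error":
            text = "\n".join(out.get("traceback", []))
        else:  # execute_result / display_data: use the plain-text representation
            text = out.get("data", {}).get("text/plain", "")
        if isinstance(text, list):
            text = "".join(text)
        if text:
            texts.append(truncate_output(text, **limits))
    return texts


rows = "\n".join(f"row {i}" for i in range(400))
demo_cell = {
    "cell_type": "code",
    "source": "print(df)",
    "outputs": [{"output_type": "stream", "name": "stdout", "text": [rows]}],
}
capped = output_texts(demo_cell, max_lines=10, keep_tail=3)
print(capped[0].splitlines()[7])  # → ... [390 lines truncated] ...
```

Keeping the tail as well as the head matters for logs and tracebacks, where the last lines usually carry the verdict.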

Drop a notebook and read its fidelity report — cell map, dependency hints and all, locally in your browser.