Feeding Jupyter notebooks to an LLM

MakeItMarkdown · July 2026

The notebook is the hardest general format we handle, and the one where conversion adds the most: a .ipynb is a JSON container in which your actual reasoning — code and prose — is buried under everything the notebook ever displayed.

What's inside a .ipynb

Four layers: code cells (source + an execution_count recording the order in which you ran it), markdown cells (prose, sometimes with pasted images embedded as base64), outputs (every chart as an embedded PNG, every dataframe preview, every traceback), and metadata (kernel info, widget state). In real notebooks the outputs dominate: our stress-test file is 3.5 MB, of which about 7% is meaningful text.
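To see how the four layers weigh up in one of your own files, a rough measurement is enough. The sketch below is not our converter. It assumes a notebook already parsed with json.load, and the names layer_sizes and sample are hypothetical, made up for illustration. It counts characters rather than tokens, and it files pasted-image attachments under the markdown layer.

```python
import json


def _joined(value) -> str:
    """Notebook JSON stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else (value or "")


def layer_sizes(nb: dict) -> dict:
    """Approximate character count per layer of a parsed .ipynb (hypothetical helper)."""
    sizes = {"code": 0, "markdown": 0, "outputs": 0, "metadata": 0}
    if nb.get("metadata"):
        sizes["metadata"] += len(json.dumps(nb["metadata"]))
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind in ("code", "markdown"):
            sizes[kind] += len(_joined(cell.get("source")))
        if cell.get("attachments"):  # pasted images live here as base64
            sizes["markdown"] += len(json.dumps(cell["attachments"]))
        if cell.get("outputs"):
            sizes["outputs"] += len(json.dumps(cell["outputs"]))
        if cell.get("metadata"):
            sizes["metadata"] += len(json.dumps(cell["metadata"]))
    return sizes


# Made-up sample: one prose cell and one plotting cell with a fake image payload.
sample = {
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Sales\n", "Quick look."]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "source": ["df.plot()"],
            "outputs": [
                {"output_type": "display_data", "data": {"image/png": "A" * 5000}}
            ],
        },
    ],
}
sizes = layer_sizes(sample)
```

For a real file, load it with json.load on a file opened with encoding="utf-8" and pass the resulting dict to layer_sizes.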

What breaks if you paste it raw

The JSON wrapper spends tokens on "cell_type": plumbing around every line of your code.
Base64 images are token bombs — one pasted screenshot ≈ 130,000 junk tokens.
Cells have no names, so neither you nor the model can point at one.
Out-of-order execution — the notebook's most famous bug — is invisible in the raw file unless you know to compare execution counts.
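The execution-count comparison is cheap to do by hand. A minimal sketch, assuming a notebook parsed with json.load: out_of_order_cells is a hypothetical helper, not the converter's actual check. It flags a code cell whenever its execution_count is lower than that of the previous executed code cell, and skips never-run cells.

```python
def out_of_order_cells(nb: dict) -> list[tuple[int, int, int]]:
    """Return (position, count, previous_count) for each code cell that ran
    earlier than the executed code cell above it (hypothetical helper)."""
    flagged = []
    previous = None
    for position, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:  # never-run cell: nothing to compare
            continue
        if previous is not None and count < previous:
            flagged.append((position, count, previous))
        previous = count
    return flagged


# Made-up sample: df.head() was run (count 3) after the cell below it (count 2).
sample = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "import pandas as pd"},
        {"cell_type": "markdown", "source": "## Load"},
        {"cell_type": "code", "execution_count": 3, "source": "df.head()"},
        {"cell_type": "code", "execution_count": 2, "source": "df = pd.read_csv('sales.csv')"},
        {"cell_type": "code", "execution_count": None, "source": "df.describe()"},
    ]
}
flagged = out_of_order_cells(sample)
```

Positions here are zero-based indexes into the cells list. A decreasing count does not prove the notebook is broken, only that top-to-bottom is not the order in which it was run.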

The element mapping

In the notebook	In the Markdown
Code cell, run as `In [7]`	`## Cell [7] · type:code · id:…` + fenced ```python block
Never-run cell	`Cell [p12]` (position-based address)
Markdown cell	Verbatim prose under its own cell header
Text output	`Output:` block, truncated at 30 lines with an explicit note
Image output / pasted image	`[Figure: cell_7_output_1.png]` placeholder; the image itself is extracted into the .zip
Decreasing execution counts	An execution-order warning in the fidelity report
Variable reuse across cells	"depends on `df` (defined in Cell [2])" annotations
Kernel/widget metadata	Dropped (one-line overview keeps language + cell counts)
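The image row of the table can be sketched with the standard library alone. This is an illustration under stated assumptions, not our implementation: extract_figures is a hypothetical name, the file-name pattern is copied from the placeholder shown in the table, and the 1-based output index and the p-prefixed label for never-run cells are guesses about details the table does not spell out. Only PNG outputs of code cells are handled.

```python
import base64
import io
import zipfile


def extract_figures(nb: dict, archive: zipfile.ZipFile) -> dict:
    """Write each PNG output into `archive` and return
    {(cell_position, output_index): placeholder} (hypothetical helper)."""
    placeholders = {}
    for position, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        label = f"p{position}" if count is None else str(count)
        for index, output in enumerate(cell.get("outputs", []), start=1):
            png = output.get("data", {}).get("image/png")
            if png is None:
                continue
            if isinstance(png, list):  # some writers split long strings into lines
                png = "".join(png)
            name = f"cell_{label}_output_{index}.png"
            archive.writestr(name, base64.b64decode(png))
            placeholders[(position, index)] = f"[Figure: {name}]"
    return placeholders


# Made-up sample: the PNG signature alone stands in for a real chart.
png_bytes = b"\x89PNG\r\n\x1a\n"
sample = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {"image/png": base64.b64encode(png_bytes).decode("ascii")},
                }
            ],
        }
    ]
}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    placeholders = extract_figures(sample, archive)
```

The point of the design is that the Markdown keeps a short, addressable placeholder while the bytes travel separately, so nothing is lost and nothing is pasted as base64.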

The dependency annotations come from static analysis of assignments and imports, and they're labelled as what they are: approximate cell dependency hints. Imports are treated as ambient (a notebook that uses pd in 60 cells doesn't need 60 arrows). The design decisions and their failure cases are covered in Making Jupyter notebooks LLM-addressable.
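To make "approximate" concrete, here is a deliberately small regex-based sketch of the idea. It is not the analysis we ship: dependency_hints, ASSIGN and NAME are hypothetical names, only unindented `name = value` lines count as definitions, and any identifier that appears anywhere in a later cell (including inside strings and comments) counts as a use.

```python
import re

# Top-level "name = ..." lines; the lookahead keeps "==" comparisons out.
ASSIGN = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)", re.MULTILINE)
# Any identifier-shaped token.
NAME = re.compile(r"\b[A-Za-z_]\w*\b")


def dependency_hints(cells: list[tuple[str, str]]) -> dict[str, list[str]]:
    """`cells` is [(address, source)] in notebook order.
    Returns address -> list of hint strings (hypothetical helper)."""
    defined_in: dict[str, str] = {}
    hints: dict[str, list[str]] = {}
    for address, source in cells:
        used = set(NAME.findall(source))
        hints[address] = [
            f"depends on `{name}` (defined in Cell [{defined_in[name]}])"
            for name in sorted(used)
            if name in defined_in and defined_in[name] != address
        ]
        # Import lines never match ASSIGN, so imported names such as pd
        # are never recorded and stay ambient.
        for name in ASSIGN.findall(source):
            defined_in[name] = address
    return hints


# Made-up sample in notebook order.
cells = [
    ("2", 'import pandas as pd\ndf = pd.read_csv("sales.csv")\ndf.shape'),
    ("3", "total = df.amount.sum()"),
    ("4", "print(total, pd.__version__)"),
]
hints = dependency_hints(cells)
```

A cell that reassigns a name before using it would still be reported as depending on the earlier definition, which is exactly the kind of false positive the word "hints" is meant to signal.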

Before → after

In the file

{"cell_type":"code","execution_count":2,
 "source":["df = pd.read_csv(\"sales.csv\")\n","df.shape"],
 "outputs":[{"data":{"text/plain":["(120, 4)"]}}],
 "metadata":{"scrolled":true,"tags":[]}}

In the Markdown

## Cell [2] · type:code · id:bb22cc33

```python
df = pd.read_csv("sales.csv")
df.shape
```

**Output:**
```
(120, 4)
```
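The transformation above is mechanical enough to sketch in a few lines. Treat this as an illustration under assumptions rather than our converter: render_code_cell is a hypothetical name, the id is assumed to come from the cell's own "id" field (the JSON excerpt above does not show one), the wording of the truncation note is invented, and only text outputs are handled. The 30-line limit is the one stated in the mapping table.

```python
FENCE = "`" * 3          # built at runtime so this example stays easy to embed
MAX_OUTPUT_LINES = 30    # truncation limit from the mapping table


def _joined(value) -> str:
    """Notebook JSON stores text either as one string or as a list of lines."""
    return "".join(value) if isinstance(value, list) else (value or "")


def output_text(output: dict) -> str:
    """Text of a stream output, or the text/plain form of a result."""
    if "text" in output:
        return _joined(output["text"])
    return _joined(output.get("data", {}).get("text/plain"))


def render_code_cell(cell: dict, position: int) -> str:
    """Render one code cell as addressable Markdown (hypothetical helper)."""
    count = cell.get("execution_count")
    address = f"p{position}" if count is None else str(count)
    parts = [
        f"## Cell [{address}] · type:code · id:{cell.get('id', '')}",
        "",
        FENCE + "python",
        _joined(cell.get("source")).rstrip("\n"),
        FENCE,
    ]
    for output in cell.get("outputs", []):
        lines = output_text(output).splitlines()
        if not lines:
            continue
        hidden = len(lines) - MAX_OUTPUT_LINES
        if hidden > 0:
            # Invented note wording; the real converter's text may differ.
            lines = lines[:MAX_OUTPUT_LINES] + [f"[truncated: {hidden} more lines]"]
        parts += ["", "**Output:**", FENCE, *lines, FENCE]
    return "\n".join(parts)


# The cell from the example above, with an assumed "id" field added.
cell = {
    "cell_type": "code",
    "execution_count": 2,
    "id": "bb22cc33",
    "source": ['df = pd.read_csv("sales.csv")\n', "df.shape"],
    "outputs": [{"data": {"text/plain": ["(120, 4)"]}}],
}
markdown = render_code_cell(cell, position=1)
```

Everything the model does not need (the scrolled flag, the empty tags list, the JSON quoting) simply has no place in the output, which is where most of the token savings come from.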

Honest limits

Dependency hints are regex-based, not real dataflow — branchy reassignments can fool them (the output says so).
Dependency analysis is Python-only; other-language notebooks convert fine but without hints.
Interactive widget state is dropped — it has no textual meaning.

FAQ

Does it work on non-Python notebooks? Yes — the structure conversion is language-agnostic; only the dependency hints are Python-specific.

My notebook has cleared outputs — still worth converting? Yes: you keep addresses, structure and the much smaller paste; there are just no output blocks.

My analysis is unpublished. Where does the file go? Nowhere. Parsing runs in your browser — the site works offline. See privacy.

See the mapping live on a sample notebook — cell addresses, dependency hints, and the fidelity report.