Format guide · .ipynb

Feeding Jupyter notebooks to an LLM

The notebook is the hardest general format we handle, and the one where conversion adds the most: a .ipynb is a JSON container in which your actual reasoning — code and prose — is buried under everything the notebook ever displayed.

What's inside a .ipynb

Four layers: code cells (source + an execution_count recording the order in which the cell ran, not a timestamp), markdown cells (prose, sometimes with pasted images embedded as base64), outputs (every chart as an embedded PNG, every dataframe preview, every traceback), and metadata (kernel info, widget state). In real notebooks the outputs dominate: our stress-test file is 3.5 MB, of which about 7% is meaningful text.
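To see how the four layers split in one of your own notebooks, a rough byte count per layer is enough. The sketch below is illustrative and not the converter's code: it assumes the nbformat 4 layout (a top-level `cells` list), and `layer_sizes` is a hypothetical helper name. Sizes of outputs and metadata are measured on their re-serialized JSON, so they are approximate.

```python
import json


def layer_sizes(nb: dict) -> dict:
    """Approximate bytes per layer of a parsed .ipynb (nbformat 4 assumed)."""
    sizes = {"code": 0, "markdown": 0, "outputs": 0, "metadata": 0}

    # Notebook-level metadata: kernel info, widget state, and so on.
    if nb.get("metadata"):
        sizes["metadata"] += len(json.dumps(nb["metadata"]))

    for cell in nb.get("cells", []):
        # "source" is stored either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)

        kind = cell.get("cell_type")
        if kind in ("code", "markdown"):
            sizes[kind] += len(source.encode("utf-8"))

        # Outputs carry the embedded PNGs, dataframe previews and tracebacks.
        if cell.get("outputs"):
            sizes["outputs"] += len(json.dumps(cell["outputs"]))
        if cell.get("metadata"):
            sizes["metadata"] += len(json.dumps(cell["metadata"]))

    return sizes
```

Typical use would be `layer_sizes(json.load(open("analysis.ipynb", encoding="utf-8")))`, where the filename is a placeholder. Images pasted into markdown cells live under a separate `attachments` key and are not counted here.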

What breaks if you paste it raw

The element mapping

| In the notebook | In the Markdown |
| --- | --- |
| Code cell, run as In [7] | ## Cell [7] · type:code · id:… + fenced ```python block |
| Never-run cell | Cell [p12] (position-based address) |
| Markdown cell | Verbatim prose under its own cell header |
| Text output | **Output:** block, truncated at 30 lines with an explicit note |
| Image output / pasted image | [Figure: cell_7_output_1.png] placeholder; the image itself is extracted into the .zip |
| Decreasing execution counts | An execution-order warning in the fidelity report |
| Variable reuse across cells | "depends on df (defined in Cell [2])" annotations |
| Kernel/widget metadata | Dropped (one-line overview keeps language + cell counts) |
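Three of the rows above are simple enough to sketch directly: cell addressing, output truncation, and the execution-order check. This is a minimal illustration under stated assumptions, not the converter's actual code. The function names are hypothetical, positions are assumed to be 1-based, and the wording of the truncation note is invented for the example; only the 30-line limit and the `[7]` / `[p12]` address shapes come from the table.

```python
from typing import List, Optional

MAX_OUTPUT_LINES = 30  # limit stated in the mapping table


def cell_address(execution_count: Optional[int], position: int) -> str:
    """Executed cells are addressed by execution count, others by position."""
    if execution_count is not None:
        return f"[{execution_count}]"
    return f"[p{position}]"  # assumption: position is 1-based


def truncate_output(text: str, limit: int = MAX_OUTPUT_LINES) -> str:
    """Keep the first `limit` lines and say explicitly how many were cut."""
    lines = text.splitlines()
    if len(lines) <= limit:
        return text
    dropped = len(lines) - limit
    # Hypothetical note wording; the real note may read differently.
    return "\n".join(lines[:limit]) + f"\n[... {dropped} more lines truncated]"


def has_decreasing_counts(counts: List[Optional[int]]) -> bool:
    """True if any executed cell has a lower count than the executed cell before it."""
    seen = [c for c in counts if c is not None]  # ignore never-run cells
    return any(later < earlier for earlier, later in zip(seen, seen[1:]))
```

A decreasing pair only shows that cells were run out of document order. It does not by itself prove the results are stale, which is presumably why the report issues a warning rather than an error.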

The dependency annotations come from static analysis of assignments and imports, and they're labelled as what they are: approximate cell dependency hints. Imports are treated as ambient (a notebook that uses pd in 60 cells doesn't need 60 arrows). The design decisions and their failure cases are covered in Making Jupyter notebooks LLM-addressable.
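A minimal sketch of how such hints could be derived with Python's standard `ast` module follows. It is an approximation of the idea described above, not the tool's implementation: `dependency_hints` is a hypothetical name, the input is assumed to be an ordered mapping of cell address to source, and cells that fail to parse (for example, those containing IPython magics) simply get no hints.

```python
import ast
from typing import Dict, List, Tuple


def dependency_hints(cells: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each cell address to (name, defining cell) pairs it appears to use."""
    defined_in: Dict[str, str] = {}  # variable name -> last cell that assigned it
    ambient = set()                  # imported names, never reported as arrows
    hints: Dict[str, List[Tuple[str, str]]] = {}

    for address, source in cells.items():
        try:
            tree = ast.parse(source)
        except SyntaxError:
            hints[address] = []  # e.g. "%matplotlib inline" is not valid Python
            continue

        loaded, stored = set(), set()
        for node in ast.walk(tree):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    # "import pandas as pd" binds pd; "import os.path" binds os.
                    ambient.add((alias.asname or alias.name).split(".")[0])
            elif isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stored.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)

        # Compare against definitions from EARLIER cells only.
        hints[address] = sorted(
            (name, defined_in[name])
            for name in loaded
            if name in defined_in and name not in ambient
        )
        for name in stored:
            defined_in[name] = address

    return hints
```

The analysis ignores scoping, attribute mutation, and execution order, so a name reused inside a function body or reassigned in an out-of-order cell can produce a misleading arrow. That is the sense in which the hints are approximate.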

Before → after

In the file

{"cell_type":"code","execution_count":2,
 "source":["df = pd.read_csv(\"sales.csv\")\n","df.shape"],
 "outputs":[{"data":{"text/plain":["(120, 4)"]}}],
 "metadata":{"scrolled":true,"tags":[]}}

In the Markdown

## Cell [2] · type:code · id:bb22cc33

```python
df = pd.read_csv("sales.csv")
df.shape
```

**Output:**
```
(120, 4)
```
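
The transformation in this example can be sketched in a few lines. The code below is an illustration only, with assumptions labelled: `render_code_cell` is a hypothetical helper, the cell id is read from an `id` field (the JSON excerpt above does not show one), and only `text/plain` outputs are handled. Truncation, images, and never-run cells are left out.

```python
FENCE = "`" * 3  # built this way so the example nests cleanly inside a fence


def _as_text(value) -> str:
    """nbformat stores text either as one string or as a list of lines."""
    return value if isinstance(value, str) else "".join(value)


def render_code_cell(cell: dict, language: str = "python") -> str:
    """Render one executed code cell as a Markdown section."""
    header = (
        f"## Cell [{cell['execution_count']}] · type:code"
        f" · id:{cell.get('id', 'unknown')}"  # assumption: id lives in cell['id']
    )
    source = _as_text(cell.get("source", "")).rstrip("\n")
    parts = [header, "", FENCE + language, source, FENCE]

    # Collect plain-text outputs; other MIME types are ignored in this sketch.
    texts = [
        _as_text(out["data"]["text/plain"]).rstrip("\n")
        for out in cell.get("outputs", [])
        if "text/plain" in out.get("data", {})
    ]
    if texts:
        parts += ["", "**Output:**", FENCE, "\n".join(texts), FENCE]

    return "\n".join(parts)
```

Called on the cell shown under "In the file", with an `id` of `bb22cc33` added, this produces the Markdown shown above. The `scrolled` and `tags` metadata are never read, which matches the "Dropped" row in the mapping.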

Honest limits

FAQ

Does it work on non-Python notebooks? Yes — the structure conversion is language-agnostic; only the dependency hints are Python-specific.

My notebook has cleared outputs — still worth converting? Yes: you keep addresses, structure and the much smaller paste; there are just no output blocks.

My analysis is unpublished. Where does the file go? Nowhere. Parsing runs in your browser — the site works offline. See privacy.

See the mapping live on a sample notebook — cell addresses, dependency hints, and the fidelity report.