Format guide · .xlsx

Feeding Excel workbooks to an LLM

Spreadsheets look like tables but store something stranger: typed cells whose displayed value and stored value differ. That gap — dates stored as day-counts, formulas stored as expressions, layout stored as merges — is where models get systematically misled.

What breaks if you hand it over raw

The element mapping

| In the workbook | In the Markdown |
| --- | --- |
| Each sheet | Its own ## section, in workbook order |
| Data range | GFM table with type-annotated headers, e.g. units (int), month (date) |
| Date cells | ISO dates (2025-09-22), never serials |
| Formula cells | The cached computed value, with a note that values are cached |
| Rows beyond 50 | Truncated with an explicit "first 50 of N" sentence |
| Ragged/merged regions | Padded to rectangular + a fidelity warning |
| Empty sheets | Named and flagged, not silently skipped |
| Charts, pivot caches, styling | Not extracted; values only (the report says what was detected) |
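To make the mapping concrete, here is a minimal sketch of how one sheet could be rendered under those rules. It is not the converter's actual code: `render_sheet`, `MAX_ROWS`, and the wording of the warning and truncation sentences are hypothetical names and strings chosen for illustration. Only the 50-row cap, the padding, and the empty-sheet flag come from the mapping above.

```python
from typing import Sequence

MAX_ROWS = 50  # row cap taken from the mapping; the constant name is hypothetical


def _cell(value: object) -> str:
    """Flatten one cell to table-safe text (newlines and pipes break GFM rows)."""
    if value is None:
        return ""
    return str(value).replace("\n", " ").replace("|", "\\|")


def render_sheet(name: str, headers: Sequence[object],
                 rows: Sequence[Sequence[object]]) -> str:
    """Render one sheet as a '## name' section holding a GFM table."""
    lines = [f"## {name}", ""]
    if not headers and not rows:
        # Empty sheets are named and flagged, not silently skipped.
        lines.append("This sheet is empty.")
        return "\n".join(lines)

    width = max([len(headers)] + [len(r) for r in rows])
    ragged = len(headers) != width or any(len(r) != width for r in rows)

    def row_line(cells: Sequence[object]) -> str:
        padded = [_cell(c) for c in cells] + [""] * (width - len(cells))
        return "| " + " | ".join(padded) + " |"

    lines.append(row_line(headers))
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines.extend(row_line(r) for r in rows[:MAX_ROWS])

    if len(rows) > MAX_ROWS:
        lines += ["", f"Showing the first {MAX_ROWS} of {len(rows)} rows."]
    if ragged:
        lines += ["", "Warning: some rows were padded to make the table rectangular."]
    return "\n".join(lines)
```

The blank line before each trailing sentence is deliberate: in GFM, a text line placed directly under a table is parsed as another table row.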

Why the type annotations matter more than anything else here: Tables LLMs can actually read.
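As an illustration of what a type-annotated header involves, the sketch below infers one label per column from already-decoded cell values. The function names and the exact label set are assumptions: the guide itself only shows int, date, and float, so "bool", "text", and "empty" are placeholders here, not documented output.

```python
from datetime import date, datetime
from typing import List, Sequence


def infer_type(values: Sequence[object]) -> str:
    """Pick a single type label for a column, ignoring blank cells."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "empty"
    # bool is a subclass of int in Python, so it has to be tested first.
    if all(isinstance(v, bool) for v in present):
        return "bool"
    if all(isinstance(v, (date, datetime)) for v in present):
        return "date"
    if any(isinstance(v, bool) for v in present):
        return "text"  # booleans mixed with other values: no safe numeric label
    if all(isinstance(v, int) for v in present):
        return "int"
    if all(isinstance(v, (int, float)) for v in present):
        return "float"
    return "text"


def annotate_headers(headers: Sequence[str],
                     rows: Sequence[Sequence[object]]) -> List[str]:
    """Return headers in the 'name (type)' form used in the Markdown tables."""
    annotated = []
    for i, header in enumerate(headers):
        column = [row[i] if i < len(row) else None for row in rows]
        annotated.append(f"{header} ({infer_type(column)})")
    return annotated
```

A column that mixes whole numbers and decimals is labeled float rather than int, so the header never promises more precision than the data has.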

Before → after

In the file

A2: 45922            (formatted as 2025-09-22)
B2: =SUM(B3:B14)     (cached value 79)
C1: "Revenue
     (EUR)"           (merged across C1:D1)

In the Markdown

| month (date) | total (int) | revenue_eur (float) |
| --- | --- | --- |
| 2025-09-22 | 79 | 96432.10 |
Values are cached formula results from the last save.
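The date in A2 is the step that most often goes wrong, so here is a small sketch of the serial-to-ISO conversion. It assumes the workbook uses Excel's default 1900 date system (not the 1904 system) and only handles serials from 61 upward, because Excel counts a nonexistent 1900-02-29 below that point. `serial_to_iso` and `EXCEL_EPOCH_1900` are illustrative names, not part of any tool described here.

```python
from datetime import date, timedelta

# Day zero that makes 1900-system serials >= 61 line up with real calendar dates.
EXCEL_EPOCH_1900 = date(1899, 12, 30)


def serial_to_iso(serial: float) -> str:
    """Convert an Excel 1900-system date serial to an ISO 8601 date string."""
    if serial < 61:
        # Serial 60 is Excel's fictitious 1900-02-29; earlier serials are off by one.
        raise ValueError(f"serial {serial} falls in the 1900 leap-year bug range")
    # The fractional part of a serial encodes time of day; it is dropped here.
    return (EXCEL_EPOCH_1900 + timedelta(days=int(serial))).isoformat()


print(serial_to_iso(45922))  # 2025-09-22
```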

Honest limits

FAQ

Legacy .xls? Yes — both the modern ZIP format and the legacy binary are read (each is verified by its magic bytes first, so mislabeled files fail loudly, not weirdly).
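A magic-byte check of that kind can be sketched in a few lines. The two signatures are the standard ones for ZIP archives and for OLE2 compound files (the container used by legacy .xls); the function name and error message are hypothetical. Note that a ZIP signature only identifies the container, so a real reader still has to confirm that workbook parts are inside.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # local file header of a ZIP archive
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")     # OLE2 compound file header


def sniff_workbook_container(head: bytes) -> str:
    """Classify a file by its first bytes, ignoring the file extension."""
    if head.startswith(ZIP_MAGIC):
        return "zip"    # candidate .xlsx
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # candidate legacy .xls
    # Fail loudly instead of guessing from the file name.
    raise ValueError("not a ZIP or OLE2 container; refusing to parse as a workbook")
```

In use, the first eight bytes of the file are enough: `sniff_workbook_container(open(path, "rb").read(8))`, where `path` is whatever file the user supplied.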

Several sheets, one question? Every sheet is a titled section, so you can tell the model "use the Summary section".

Financial data privacy? Local conversion, nothing uploaded — verifiably.

Convert a workbook and inspect the typed headers and the cached-value note.