Tables LLMs can actually read

MakeItMarkdown · July 2026 · 6 min read

The point: a model can only be honest about a table it actually received — typed headers and explicit truncation make that possible.

Spreadsheets look like the easy case — they're already structured. Yet "summarize this CSV" produces some of the most confident hallucinations a model can make: invented columns, averaged text, totals of truncated data presented as totals of everything. Each of those failures traces to information the file had but the paste lost.

1 · The model never saw your schema

Raw CSV carries no types. 1188.00 might be revenue, a zip code, or an ID; 2026-01-05 might be a date or a version string. Humans infer from headers; models do too, and sometimes infer wrongly. The fix costs a few lines: annotate every column with its observed type before the data appears.

What the model received

North 1204 998
South 872 914

With the schema attached

| region (str) | q1_units (int) | q2_units (int) |
| --- | --- | --- |
| North | 1204 | 998 |
| South | 872 | 914 |

The same annotation can also be listed as a Columns section. For a sales sheet with order_id, date, region, and revenue columns:

## Columns
- `order_id` — int
- `date` — date
- `region` — str
- `revenue` — float (2 empty)

Now "average revenue by region" has an anchor. The annotation also surfaces dirty data honestly: a mixed-type column is labeled as mixed (for example, mostly int) rather than passed off as clean, and empty-cell counts stop the model from averaging blanks as zeros.
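
One way the annotation step could look, as a minimal sketch rather than the tool's actual implementation. The function names `cell_type`, `describe_columns`, and `typed_header` are hypothetical, and the inference rules (try int, then float, then ISO date, else str) are an assumption about what "observed type" means here:

```python
import csv
import io
from collections import Counter
from datetime import date


def cell_type(value: str) -> str:
    """Classify one raw CSV cell as empty, int, float, date, or str."""
    text = value.strip()
    if not text:
        return "empty"
    # Try the narrowest type first; the first parser that accepts the text wins.
    for kind, parse in (("int", int), ("float", float), ("date", date.fromisoformat)):
        try:
            parse(text)
            return kind
        except ValueError:
            continue
    return "str"


def describe_columns(csv_text: str) -> list[tuple[str, str, int]]:
    """Return (name, observed type, empty-cell count) for each column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    described = []
    for i, name in enumerate(header):
        # Short rows count as empty cells instead of raising IndexError.
        kinds = Counter(cell_type(row[i]) if i < len(row) else "empty" for row in data)
        empties = kinds.pop("empty", 0)
        if not kinds:
            label = "empty"
        elif len(kinds) == 1:
            label = next(iter(kinds))
        elif set(kinds) == {"int", "float"}:
            label = "float"  # a column of ints and floats is treated as float
        else:
            # Genuinely mixed column: name the most common type, flagged as partial.
            label = "mostly " + kinds.most_common(1)[0][0]
        described.append((name, label, empties))
    return described


def typed_header(described: list[tuple[str, str, int]]) -> str:
    """Build a Markdown header row in the `| name (type) |` style."""
    return "| " + " | ".join(f"{name} ({label})" for name, label, _ in described) + " |"
```

The empty-cell count is returned separately so the caller can decide where to print it, for instance next to the type in a Columns list. Note that Python's `float()` is permissive (it accepts strings such as `nan`), so a stricter parser may be worth swapping in for real data.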

2 · Silent truncation

Every chat interface has a length limit, and a paste that exceeds it gets cut somewhere. The model then totals what survived and calls it the total. If you must cut (and for large sheets you must), cut explicitly:

| 1049 | 2026-03-30 | South | 897.75 |

… 250 more rows omitted (kept the first 50).

A model that reads "250 more rows omitted" answers "based on the first 50 rows…" — which is the correct answer. The information about what's missing is as valuable as the data that's present.
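
Explicit truncation is only a few lines. The sketch below assumes rows are already parsed into lists; `truncate_rows` and its `keep` parameter are hypothetical names, and the wording of the note copies the example above:

```python
from typing import Optional


def truncate_rows(rows: list[list[str]], keep: int = 50) -> tuple[list[list[str]], Optional[str]]:
    """Keep the first `keep` rows and describe what was dropped, if anything."""
    kept = rows[:keep]
    omitted = len(rows) - len(kept)
    if omitted == 0:
        # Nothing was cut, so there is nothing to disclose.
        return kept, None
    note = f"… {omitted} more rows omitted (kept the first {len(kept)})."
    return kept, note
```

Printing the note directly under the last kept row puts the disclosure where the model will read it, right where the data stops.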

3 · Excel is not CSV with extra steps

Workbooks add three traps of their own:

- Dates are serial numbers. Naive extraction yields 46027 where you saw 2026-01-05. Convert to ISO strings or the model will treat your dates as quantities.
- Formulas vs. values. =SUM(B2:B40) means nothing without the sheet. Emit the cached calculated value, and say once, up front, that formulas appear as their last computed results.
- Multi-sheet blindness. A workbook's meaning is often split across sheets. One section per sheet, including an explicit "(empty sheet)" note, keeps the model from conflating Orders with Summary.
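
The date conversion can be sketched in plain Python. This assumes the workbook uses Excel's default 1900 date system (some older Mac workbooks use a 1904 base instead); `serial_to_iso` is a hypothetical helper name:

```python
from datetime import date, timedelta

# Day zero for the 1900 date system. Using 1899-12-30 absorbs Excel's
# nonexistent 1900-02-29, so serials from March 1900 onward map correctly.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_iso(serial: float) -> str:
    """Convert an Excel date serial to an ISO 8601 date string.

    The fractional part of a serial encodes time of day and is dropped here.
    Serials below 61 (January and February 1900) come out one day off.
    """
    return (EXCEL_EPOCH + timedelta(days=int(serial))).isoformat()
```

With this helper, `serial_to_iso(46027)` returns `"2026-01-05"`, the date from the example above. For the formula trap, a library reader can usually hand back cached results instead of formula text; openpyxl, for instance, does so when a workbook is loaded with `data_only=True`.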

4 · Markdown pipes, the boring detail that breaks everything

GFM tables delimit cells with |. Any cell containing a literal pipe (pipe|in|note happens constantly in log exports) splits into extra cells, pushing every subsequent cell one column to the right per pipe, and GFM discards whatever spills past the header's column count. Escape each pipe as \|, flatten newlines inside cells, and your table survives; skip it and the corruption is invisible until an answer is wrong.
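
A minimal sketch of the escaping step, with hypothetical helper names `escape_cell` and `render_row`. It assumes cells arrive as plain strings that do not already contain backslash-escaped pipes:

```python
import re


def escape_cell(value: str) -> str:
    """Make one cell safe for a GFM table row."""
    # Collapse each run of line breaks into a single space so the row stays on one line.
    flat = re.sub(r"[\r\n]+", " ", value)
    # Escape literal pipes so they are not read as cell delimiters.
    return flat.replace("|", "\\|")


def render_row(cells: list[str]) -> str:
    """Join escaped cells into one Markdown table row."""
    return "| " + " | ".join(escape_cell(cell) for cell in cells) + " |"
```

Escaping at the cell level, before joining, keeps the delimiters that `render_row` adds distinct from the pipes that came from the data.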


5 · The checklist

- Types annotated per column, empties counted
- Row counts stated; truncation explicit, never silent
- Dates as ISO strings, formulas as cached values (and say so)
- One section per sheet; empty sheets noted
- Pipes escaped, in-cell newlines flattened
- A fidelity line at the end: what was detected, what was cut

Drop a .csv or .xlsx and get exactly this shape — typed columns, honest truncation, per-sheet sections. Locally, in your browser.