Tables LLMs can actually read
The point: a model can only be honest about a table it actually received — typed headers and explicit truncation make that possible.
Spreadsheets look like the easy case — they're already structured. Yet "summarize this CSV" produces some of the most confident hallucinations a model can make: invented columns, averaged text, totals of truncated data presented as totals of everything. Each of those failures traces to information the file had but the paste lost.
1 · The model never saw your schema
Raw CSV carries no types. 1188.00 might be revenue,
a zip code, or an ID; 2026-01-05 might be a date or a
version string. Humans infer types from headers; models do too, and
often infer wrongly. The fix costs a few lines: annotate every column with its
observed type before the data appears.
What the model received
North 1204 998
South 872 914
With the schema attached
| region (str) | q1_units (int) | q2_units (int) |
| --- | --- | --- |
| North | 1204 | 998 |
| South | 872 | 914 |
## Columns
- `order_id` — int
- `date` — date
- `region` — str
- `revenue` — float (2 empty)
Now "average revenue by region" has an anchor. The annotation also
surfaces dirty data honestly: mixed-type columns come out as
mostly int, and empty-cell counts stop the model from
averaging blanks as zeros.
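One way to produce that annotation, as a minimal sketch. The helper names `classify`, `describe_columns`, and `typed_header` are hypothetical, and the type rules (try int, then float, then ISO date, else str) are an assumption, not necessarily how the examples above were generated. Every cell is read as a string, each column is labelled by what was observed, and empties are counted separately so blanks never pass as zeros.

```python
import csv
import io
from collections import Counter
from datetime import date


def classify(cell):
    """Label one non-empty cell as int, float, date, or str."""
    for label, parse in (("int", int), ("float", float), ("date", date.fromisoformat)):
        try:
            parse(cell)
            return label
        except ValueError:
            continue
    return "str"


def describe_columns(csv_text):
    """Return (name, type_label, empty_count) per column, from observed cells only."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    described = []
    for i, name in enumerate(header):
        # Short rows are padded with empty cells rather than raising.
        cells = [row[i].strip() if i < len(row) else "" for row in body]
        filled = [c for c in cells if c]
        kinds = Counter(classify(c) for c in filled)
        if not kinds:
            label = "empty"
        elif set(kinds) == {"int", "float"}:
            label = "float"  # whole and fractional numbers mixed: still numeric
        else:
            top, count = kinds.most_common(1)[0]
            label = top if count == len(filled) else f"mostly {top}"
        described.append((name, label, len(cells) - len(filled)))
    return described


def typed_header(described):
    """Build the GFM header row and delimiter row, with the type in each header."""
    head = " | ".join(f"{name} ({label})" for name, label, _ in described)
    rule = " | ".join("---" for _ in described)
    return f"| {head} |\n| {rule} |"
```

The labels describe what the cells look like, not what they mean: a column of zip codes without leading zeros would still come out as int, so treat the annotation as a hint for the model and for you, not as a verified schema.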
2 · Silent truncation
Every chat interface truncates long pastes somewhere — the model then totals what survived and calls it the total. If you must cut (and for large sheets you must), cut explicitly:
| 1049 | 2026-03-30 | South | 897.75 |
… 250 more rows omitted (kept the first 50).
A model that reads "250 more rows omitted" answers "based on the first 50 rows…" — which is the correct answer. The information about what's missing is as valuable as the data that's present.
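Explicit truncation can be sketched in a few lines. The function name `truncate_rows` and the default of 50 rows are assumptions for illustration; the note mirrors the wording of the example above.

```python
def truncate_rows(rows, keep=50):
    """Return (kept_rows, note). note is None when nothing was cut."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    kept = rows[:keep]
    omitted = len(rows) - len(kept)
    if omitted == 0:
        return kept, None
    noun = "row" if omitted == 1 else "rows"
    # The note states both what is missing and what survived.
    return kept, f"… {omitted} more {noun} omitted (kept the first {len(kept)})."
```

Append the note as the last line of the table whenever it is not `None`, so the model sees the cut in the same place it sees the data.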
3 · Excel is not CSV with extra steps
Workbooks add three traps of their own:
- Dates are serial numbers. Naive extraction yields 46027 where you saw 2026-01-05. Convert to ISO strings or the model will treat your dates as quantities.
- Formulas vs. values. =SUM(B2:B40) means nothing without the sheet. Emit the cached calculated value, and say once, up front, that formulas appear as their last computed results.
- Multi-sheet blindness. A workbook's meaning is often split across sheets. One section per sheet, including an explicit "(empty sheet)" note, keeps the model from conflating Orders with Summary.
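Two of these fixes are small enough to sketch with the standard library alone. The names `excel_serial_to_iso` and `render_sheets`, and the `## Sheet:` heading format, are hypothetical. The conversion assumes the workbook uses Excel's default 1900 date system; workbooks saved with the 1904 system need a different epoch. Reading cached formula values is left to whatever workbook library you use.

```python
from datetime import date, timedelta

# In the 1900 date system Excel counts a nonexistent 1900-02-29 (serial 60),
# so this epoch is only correct for serials from 61 (1900-03-01) onward.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_iso(serial):
    """Convert an Excel date serial to an ISO date string (time of day dropped)."""
    if serial < 61:
        raise ValueError("serials below 61 are not handled by this sketch")
    return (EXCEL_EPOCH + timedelta(days=int(serial))).isoformat()


def render_sheets(sheets):
    """sheets maps sheet name -> list of already-formatted table lines."""
    # Say once, up front, how formulas were treated.
    parts = ["Formulas appear as their last computed values."]
    for name, lines in sheets.items():
        parts.append(f"## Sheet: {name}")
        parts.extend(lines if lines else ["(empty sheet)"])
    return "\n".join(parts)
```

With these, `excel_serial_to_iso(46027)` gives back the `2026-01-05` you saw in the spreadsheet, and a workbook with an empty Summary sheet still gets a Summary section that says so.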
4 · Markdown pipes, the boring detail that breaks everything
GFM tables delimit cells with |. Any cell containing
a literal pipe (pipe|in|note happens constantly in log
exports) shears the row: each stray pipe starts a new cell, pushing every
later cell one column to the right, and GFM drops whatever overflows the
header's width. Escape pipes, flatten newlines inside cells, and your table
survives; skip it and the corruption is invisible until an answer is
wrong.
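A minimal sketch of that cell cleanup, assuming hypothetical helpers `escape_cell` and `to_row`. It only handles the two problems named above, pipes and in-cell line breaks, and treats `None` as an empty cell.

```python
def escape_cell(value):
    """Make one cell safe for a GFM table row."""
    if value is None:
        return ""
    # Flatten in-cell line breaks into single spaces, dropping blank lines.
    parts = (part.strip() for part in str(value).splitlines())
    text = " ".join(part for part in parts if part)
    # Escape literal pipes so they are not read as cell delimiters.
    return text.replace("|", "\\|")


def to_row(cells):
    """Join escaped cells into one GFM table row."""
    return "| " + " | ".join(escape_cell(c) for c in cells) + " |"
```

For example, `to_row(["1049", "pipe|in|note"])` keeps the note in a single cell, written as `pipe\|in\|note`, instead of spilling it across three columns.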
[Figure placeholder: /assets/media/blog/tables/serials-vs-typed.png. Side-by-side shot of the same sales.xlsx pasted raw into a chat (dates as serials) and the converted Markdown with the Columns section.]
5 · The checklist
- Types annotated per column, empties counted
- Row counts stated; truncation explicit, never silent
- Dates as ISO strings, formulas as cached values (and say so)
- One section per sheet; empty sheets noted
- Pipes escaped, in-cell newlines flattened
- A fidelity line at the end: what was detected, what was cut
Drop a .csv or .xlsx and get exactly this shape — typed columns, honest truncation, per-sheet sections. Locally, in your browser.