When the model invents numbers from your data
The point: models don't confess to missing rows — silent truncation and headerless values invite confident invention.
You ask for the total, the model gives one — and it's wrong. Not wildly wrong, which would be easy to catch, but plausibly wrong: close to the real figure, confidently stated, sometimes with a fake breakdown attached. Before blaming the model's arithmetic, check what it was actually shown. Two input defects cause most invented numbers.
1 · Silent truncation
Upload paths and extractors cap what they pass to the model — by characters, rows, or pages — and rarely say so. The model receives 400 of your 4,812 rows with no marker that anything is missing, so when you ask for "total revenue" it sums what it has, or worse, senses the data looks partial and extrapolates. Both produce a specific, wrong number.
The cure is stating the cut in the text itself:
Showing the first 50 of 4,812 rows.
A model that knows it has a sample answers like it has a sample: "based on the first 50 rows…" — which is exactly the honesty you want, and it will usually ask for the rest or suggest computing it properly. Every table MakeItMarkdown truncates carries this sentence automatically.
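A minimal sketch of stating the cut in the text, assuming rows are already parsed into lists. The function name `to_markdown_with_notice` and the `max_rows` default are illustrative, not MakeItMarkdown's actual implementation:

```python
def to_markdown_with_notice(header, rows, max_rows=50):
    """Render a Markdown table and state any truncation in the text itself."""
    total = len(rows)
    shown = rows[:max_rows]
    lines = []
    if total > max_rows:
        # The notice travels with the table, so the model sees it too.
        lines.append(f"Showing the first {len(shown):,} of {total:,} rows.")
        lines.append("")
    lines.append("| " + " | ".join(header) + " |")
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in shown:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical data: 4,812 rows, of which only 50 fit the cap.
rows = [[i, i * 2] for i in range(4812)]
print(to_markdown_with_notice(["id", "value"], rows).splitlines()[0])
```

Putting the notice above the table rather than in metadata matters: whatever the upload path strips, the sentence is part of the content the model reads.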
2 · Values with no header context
When a paste or extraction drops the header row (merged title cells are a common cause; see why tables mangle in chat), the model faces bare columns of numbers and infers what they mean from magnitude and position. Serial dates (45922) get read as amounts; IDs get averaged; units and revenue swap. The inventions here aren't arithmetic errors at all: they're semantic guesses that happen to be numbers.
Typed headers close the gap:
| month (date) | units (int) | revenue (float) |
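One way to produce a header like this is to infer a coarse type per column before rendering. The sketch below assumes every cell arrives as a string and that dates are ISO-formatted. The helper names are hypothetical, and the serial-date conversion assumes the spreadsheet's 1900 date system:

```python
from datetime import date, timedelta


def infer_type(values):
    """Guess a coarse type label for a column of string cells."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if not values:
        return "text"
    # Order matters: every int also parses as a float.
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    if all_parse(date.fromisoformat):
        return "date"
    return "text"


def typed_header(names, rows):
    """Build a Markdown header row with a type label on every column."""
    columns = list(zip(*rows))
    cells = [f"{name} ({infer_type(col)})" for name, col in zip(names, columns)]
    return "| " + " | ".join(cells) + " |"


def excel_serial_to_date(serial):
    """Convert a spreadsheet serial day number to a date (1900 date system assumed)."""
    return date(1899, 12, 30) + timedelta(days=serial)


rows = [("2025-01-01", "120", "1830.50"), ("2025-02-01", "95", "1412.00")]
print(typed_header(["month", "units", "revenue"], rows))
print(excel_serial_to_date(45922))
```

Converting serials to ISO dates before the type check keeps a value like 45922 from being labelled `int` and then read as an amount.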
3 · The two-minute audit
- Convert the actual file with the converter and look at the Markdown pane: is every table rectangular, headed, typed, and explicitly truncated?
- Check the fidelity report: tables detected should match what you know the document contains. A missing table means the model would have answered numeric questions from prose — a classic invention source. The report's weighted QC breakdown shows exactly which structural check failed.
- Re-ask with the Markdown as input, and ask the model to cite the rows it used.
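The first check in the list can be partly automated. This is a rough sketch, not the converter's fidelity report: `audit_table` is a hypothetical helper that handles only a single simple pipe table (no escaped pipes inside cells) and assumes the typed-header and truncation-notice formats shown earlier:

```python
import re

TYPED = re.compile(r".+ \((date|int|float|text)\)$")
NOTICE = re.compile(r"Showing the first [\d,]+ of [\d,]+ rows\.")


def split_row(line):
    """Split a pipe-table line into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def audit_table(markdown, expected_rows=None):
    """Return a list of structural problems found in one Markdown table."""
    table = [l for l in markdown.splitlines() if l.lstrip().startswith("|")]
    # A headed table needs a header line followed by a separator line.
    if len(table) < 2 or not set(table[1]) <= set("|-: "):
        return ["no header row followed by a separator line"]
    header = split_row(table[0])
    body = [split_row(line) for line in table[2:]]
    problems = []
    if any(len(row) != len(header) for row in body):
        problems.append("not rectangular")
    if not all(TYPED.match(cell) for cell in header):
        problems.append("untyped header")
    if (expected_rows is not None and len(body) < expected_rows
            and not NOTICE.search(markdown)):
        problems.append("truncated without notice")
    return problems


bad = "| 45922 | 120 |\n| --- | --- |\n| 45923 | 95 | 7 |"
print(audit_table(bad, expected_rows=4812))
```

Pass `expected_rows` from what you know about the source file; without it the sketch cannot tell a short table from a truncated one.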
4 · What this can't fix
Clean input removes the induced hallucinations. Models still make genuine arithmetic slips on long columns — for real computation, ask for the method ("write the pandas expression") rather than the result, or use a tool-enabled assistant. Structure gets you truthful reading; it doesn't make a language model a calculator.
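As an illustration of asking for the method rather than the result, the answer you want back looks like code you can run over the full file, not a figure. With pandas that would be an expression such as `df["revenue"].sum()`; the sketch below does the same with only the standard library, assuming a CSV with a `revenue` column (the sample data is made up):

```python
import csv
import io
from decimal import Decimal


def column_total(csv_text, column):
    """Sum one column exactly, reading every row rather than a sample."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Decimal avoids float rounding drift on long currency columns.
    return sum((Decimal(row[column]) for row in reader), Decimal("0"))


data = "month,units,revenue\n2025-01-01,120,1830.50\n2025-02-01,95,1412.00\n"
print(column_total(data, "revenue"))
```

The model only has to get the expression right; the arithmetic is done by the interpreter on all rows.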
Audit the file the model keeps misquoting — typed tables, explicit truncation, and a report of what was detected.