Format guide · .csv / .tsv

Feeding CSV files to an LLM

CSV is the format everyone assumes is trivial — it's "just commas". In the wild it's a family of dialects: semicolons in European locales, tabs from database exports, quoted fields with embedded commas and newlines, and rows whose column counts drift. Models read messy CSV the way you'd expect: confidently and inconsistently.

What breaks in a raw paste

The element mapping

| In the file | In the Markdown |
| --- | --- |
| Delimiter (comma/semicolon/tab/pipe) | Auto-sniffed, then parsed with real RFC 4180 quoting rules |
| Header row | GFM header with inferred per-column types: region (str) · units (int) · revenue (float) |
| Rows | First 50 as a GFM table + an explicit "first 50 of N" sentence |
| Ragged rows | Padded to rectangular; counted in a fidelity warning |
| Pipes in values | Escaped so the table can't shear |
| Overview | Row/column counts up front, so the model knows the dataset's true size |
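
The mapping above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the converter's actual code: the function names (`csv_to_markdown`, `infer_type`, `escape_cell`), the type-inference rules, and the wording of the overview and warning sentences are all hypothetical, and the delimiter is passed in rather than sniffed.

```python
import csv
import io
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+\.\d+")


def infer_type(values):
    """Label a column int, float, or str from its non-empty values (assumed rules)."""
    present = [v for v in values if v != ""]
    if present and all(INT_RE.fullmatch(v) for v in present):
        return "int"
    if present and all(INT_RE.fullmatch(v) or FLOAT_RE.fullmatch(v) for v in present):
        return "float"
    return "str"


def escape_cell(value):
    """Keep a value inside one table cell: escape pipes, flatten newlines."""
    return " ".join(value.replace("|", "\\|").splitlines())


def csv_to_markdown(text, delimiter=",", limit=50):
    # newline="" lets the csv module handle newlines inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [r for r in reader if r]  # skip blank lines
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = max(len(r) for r in rows)
    # A row is "ragged" when its field count differs from the header's.
    ragged = sum(1 for r in body if len(r) != len(header))
    header = header + [""] * (width - len(header))
    body = [r + [""] * (width - len(r)) for r in body]
    types = [infer_type([r[i] for r in body]) for i in range(width)]

    lines = [f"{len(body)} rows x {width} columns.", ""]
    typed = [f"{escape_cell(h)} ({t})" for h, t in zip(header, types)]
    lines.append("| " + " | ".join(typed) + " |")
    lines.append("| " + " | ".join(["---"] * width) + " |")
    for r in body[:limit]:
        lines.append("| " + " | ".join(escape_cell(c) for c in r) + " |")
    if len(body) > limit:
        lines += ["", f"Showing the first {limit} of {len(body)} rows."]
    if ragged:
        lines += ["", f"{ragged} ragged row(s) were padded."]
    return "\n".join(lines)
```

Calling `csv_to_markdown(text, delimiter=";")` on a semicolon-separated file yields a typed header line followed by at most `limit` rows. Putting the counts first and the truncation sentence last is one way to make sure the model never mistakes the preview for the whole dataset.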

Before → after

In the file

region;units;"revenue, gross";signup
North;12;"1.204,50";45922
South;;"980,00";45923

In the Markdown

| region (str) | units (int) | revenue, gross (str) | signup (int) |
| --- | --- | --- | --- |
| North | 12 | 1.204,50 | 45922 |
| South |  | 980,00 | 45923 |
1 ragged row was padded (see fidelity warnings).

Honest limits

FAQ

.tsv? Yes — tabs are one of the sniffed delimiters.
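
For readers who want to try delimiter sniffing themselves, Python's standard library ships `csv.Sniffer`. The sketch below restricts it to the four delimiters named in this guide; the wrapper name `sniff_delimiter` and the comma fallback are assumptions for illustration, and this is not necessarily how the converter sniffs.

```python
import csv


def sniff_delimiter(sample, candidates=",;\t|"):
    """Return the delimiter csv.Sniffer guesses, falling back to a comma."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot decide; the fallback is an assumption.
        return ","


# Typical use: sniff on the first few kilobytes, then parse the whole text.
# text = open("export.tsv", newline="", encoding="utf-8").read()
# rows = list(csv.reader(text.splitlines(), delimiter=sniff_delimiter(text[:4096])))
```

`csv.Sniffer` is a heuristic and can misjudge short or irregular samples, so it is worth surfacing the detected delimiter to the user rather than trusting it silently.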

Huge files? Parsing happens in your browser's memory; hundreds of MB may be slow. The output stays small by design.

Why not JSON output for data? Per-row key repetition costs ~15–40% more tokens for flat tables, as measured in "LLMs speak Markdown".
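
The key-repetition effect is easy to see on a toy table. The sketch below serializes the same synthetic rows both ways and compares character counts. Characters are only a rough proxy for tokens, the data is invented for the comparison, and the helper names are hypothetical, so the ratio it prints should not be expected to match the measured token figure.

```python
import json


def as_json(header, rows):
    """Serialize rows as a list of objects, repeating every key per row."""
    return json.dumps([dict(zip(header, r)) for r in rows])


def as_markdown(header, rows):
    """Serialize the same rows as a GFM table, naming each column once."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(v) for v in r) + " |" for r in rows]
    return "\n".join(lines)


# Synthetic placeholder data, invented for this comparison only.
header = ["region", "units", "revenue"]
rows = [[f"R{i}", i, i * 1.5] for i in range(200)]

json_chars = len(as_json(header, rows))
md_chars = len(as_markdown(header, rows))
print(f"JSON: {json_chars} chars, Markdown: {md_chars} chars")
```

For a real measurement, swap `len` for a count from the tokenizer of the model you actually use.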

Drop a CSV and check the typed header line — it's the piece models thank you for.