Markdown is the best upload format for AI workspaces
Claude Projects. Perplexity Spaces. Custom assistants with knowledge files. Notebook-style research tools. Different products, same architecture: you upload reference files, the tool splits them into chunks, and when you ask a question it searches those chunks and feeds the best matches to the model alongside your prompt.
That architecture has three bottlenecks — storage caps, token budgets, and retrieval accuracy. The format you upload in touches all three, and Markdown wins all three. Not by a little.
The point: storage caps, token budgets, retrieval accuracy — three bottlenecks, one format change, all three measured below.
1 · Storage — same knowledge, a fraction of the cap
Workspace storage is capped (per-file limits, per-project totals). PDFs and Office files spend most of their bytes on things retrieval can't use: embedded fonts, themes, XML plumbing, image previews. Measured on our own built-in samples and test files:
| File | Original | As Markdown | Ratio |
|---|---|---|---|
| 2-page report (.pdf) | 26.6 KB | 0.9 KB | ~30× |
| Operations review (.docx) | 36.4 KB | 0.9 KB | ~40× |
| Data-analysis notebook (.ipynb, 100+ cells) | 3,465 KB | 261 KB | ~13× |
Tiny office files exaggerate the ratio — container overhead is fixed — but the direction holds at every size we've measured, because Markdown is the text minus the plumbing. In practice a workspace that was straining its cap with PDFs drops to a few percent after conversion, with the identical textual content.
/assets/media/blog/workspaces/01-space-storage.png ·
screenshot pair (or one combined image) of a real workspace's
storage/capacity indicator before and after replacing PDFs with the
converted .md files. This box is replaced by the image once the
file lands.2 · Tokens — denser chunks, more of them
When the workspace answers a question, it can only afford to stuff a limited number of tokens of retrieved context into the prompt. Every token spent on formatting debris — reconstructed line breaks, page headers repeated forty times, "Page 12 of 96" — is a token of your actual content that didn't make it in.
Markdown chunks are nearly all signal. The practical effects stack: each retrieved chunk carries more content, more distinct chunks fit per question, and long documents that flatly exceeded per-file token limits start fitting at all. Our stress-test notebook is the sharpest example: as raw .ipynb JSON it's roughly 3.5 MB and simply can't be attached to most chats; as Markdown it's ~64,000 tokens — large but usable, with cell addresses intact.
3 · Retrieval — headings are search handles
Chunk quality decides answer quality. Retrieval works best when
each chunk is a coherent topic with a descriptive label — and that is
precisely what Markdown headings produce. A chunk that begins
## Week 5 · Convergence conditions practically indexes
itself: the words your question will use are sitting in the chunk's
own header.
PDF-extracted text gives the chunker none of that. Splits happen at arbitrary character counts, mid-table and mid-sentence; the chunk that matches your question may begin halfway through a definition whose first half lives in another chunk. The model receives two half-thoughts and quotes neither correctly. Structure in, structure out — we saw this shift a small model from vague to precise in the lecture-notes case study.
/assets/media/blog/workspaces/02-cited-answer.png ·
screenshot of a workspace answer that cites/quotes a converted .md
file (source popover or inline citation visible), showing it pulled
the right section.4 · The recipe
- Convert: drag your files — multiple at once — into MakeItMarkdown. Conversion runs in your browser; nothing is uploaded to us.
- Check: skim each fidelity report. Warnings (truncated tables, scanned pages, layout loss) tell you what to double-check before you rely on a file.
- Upload: download the .zip and add the .md files to your Project / Space / knowledge base. Keep filenames stable so citations stay meaningful.
- Replace, don't duplicate: remove the original PDFs from the workspace — if both versions stay, retrieval sometimes surfaces the bad one.
5 · When to leave a format alone
Structured data your assistant should compute over — spreadsheets it will run code on, JSON an integration consumes — can stay native; convert a Markdown copy only for the reading-and-citing use case. And truly visual documents (design mockups, scanned forms) need their images, which text-first conversion won't carry. For everything meant to be read, quoted and searched, Markdown is the format your workspace wishes you'd uploaded.
Convert your workspace files in one drop — batch conversion, fidelity report per file, .zip out.