Real-world workflows · 02

Markdown is the best upload format for AI workspaces

Claude Projects. Perplexity Spaces. Custom assistants with knowledge files. Notebook-style research tools. Different products, same architecture: you upload reference files, the tool splits them into chunks, and when you ask a question it searches those chunks and feeds the best matches to the model alongside your prompt.
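The upload, chunk, search, and prompt loop described above can be sketched in a few lines. This is a toy illustration, not how any of the named products work internally: the function names are hypothetical, and plain word overlap stands in for whatever embedding search a real workspace uses.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def split_into_chunks(text: str) -> list[str]:
    """Toy chunker: one chunk per blank-line-separated block."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many question words they share and keep the top k."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks ahead of the user's question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical reference file, invented for this example.
reference = """## Refund policy
Refunds are issued within 14 days of purchase.

## Shipping
Orders ship within 2 business days.

## Support hours
Support is available on weekdays."""

question = "How many days do refunds take?"
chunks = split_into_chunks(reference)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Only the chunks that survive `retrieve` reach the model, which is why the quality of the split and the size of each chunk matter in the sections that follow.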

That architecture has three bottlenecks — storage caps, token budgets, and retrieval accuracy. The format you upload in touches all three, and Markdown wins all three. Not by a little.

The point: storage caps, token budgets, retrieval accuracy — three bottlenecks, one format change, all three measured below.

1 · Storage — same knowledge, a fraction of the cap

Workspace storage is capped (per-file limits, per-project totals). PDFs and Office files spend most of their bytes on things retrieval can't use: embedded fonts, themes, XML plumbing, image previews. Measured on our own built-in samples and test files:

| File | Original | As Markdown | Ratio |
| --- | --- | --- | --- |
| 2-page report (.pdf) | 26.6 KB | 0.9 KB | ~30× |
| Operations review (.docx) | 36.4 KB | 0.9 KB | ~40× |
| Data-analysis notebook (.ipynb, 100+ cells) | 3,465 KB | 261 KB | ~13× |

Tiny office files exaggerate the ratio because container overhead is fixed, but the direction holds at every size we've measured: Markdown is the text minus the plumbing. In practice, a workspace that was straining its cap with PDFs drops to a few percent of its former footprint after conversion, with identical textual content.
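For readers who want to check the arithmetic, here is a small sketch that recomputes the ratios from the sizes reported above. The helper names are illustrative; the numbers are the ones in the measurements.

```python
# Sizes in KB, copied from the measurements above: (original, as Markdown).
samples = {
    "2-page report (.pdf)": (26.6, 0.9),
    "Operations review (.docx)": (36.4, 0.9),
    "Data-analysis notebook (.ipynb)": (3465.0, 261.0),
}

def shrink_ratio(original_kb: float, markdown_kb: float) -> float:
    """How many times smaller the Markdown version is."""
    return original_kb / markdown_kb

def share_of_original(original_kb: float, markdown_kb: float) -> float:
    """Markdown size as a percentage of the original size."""
    return 100.0 * markdown_kb / original_kb

for name, (original, markdown) in samples.items():
    ratio = shrink_ratio(original, markdown)
    share = share_of_original(original, markdown)
    print(f"{name}: {ratio:.1f}x smaller, {share:.1f}% of the original")
```

The two office files land at roughly 2 to 3 percent of their original size, and the notebook at under 8 percent.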

[Figure: a workspace's storage/capacity indicator before and after replacing PDFs with the converted .md files.]

2 · Tokens — denser chunks, more of them

When the workspace answers a question, it can only afford to stuff a limited number of tokens of retrieved context into the prompt. Every token spent on formatting debris — reconstructed line breaks, page headers repeated forty times, "Page 12 of 96" — is a token of your actual content that didn't make it in.

Markdown chunks are nearly all signal. The practical effects stack: each retrieved chunk carries more content, more distinct chunks fit per question, and long documents that flatly exceeded per-file token limits start fitting at all. Our stress-test notebook is the sharpest example: as raw .ipynb JSON it's roughly 3.5 MB and simply can't be attached to most chats; as Markdown it's ~64,000 tokens — large but usable, with cell addresses intact.
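A rough way to see the stacking effect is to model each chunk as a fixed amount of real content inflated by formatting debris. Everything numeric in this sketch is a hypothetical assumption chosen for illustration (the 8,000-token budget, 400 content tokens per chunk, and the 25% and 2% debris fractions); none of it is measured.

```python
import math

def chunks_that_fit(budget: int, content_tokens: int, debris_fraction: float) -> int:
    """Number of chunks that fit in the retrieval budget.

    Each chunk carries `content_tokens` of real content; debris inflates
    its total size to content / (1 - debris_fraction).
    """
    chunk_size = math.ceil(content_tokens / (1.0 - debris_fraction))
    return budget // chunk_size

BUDGET = 8000    # hypothetical retrieval budget, in tokens
CONTENT = 400    # hypothetical real content per chunk, in tokens

pdf_chunks = chunks_that_fit(BUDGET, CONTENT, debris_fraction=0.25)  # assumed debris share
md_chunks = chunks_that_fit(BUDGET, CONTENT, debris_fraction=0.02)   # assumed debris share

print(f"PDF-extracted: {pdf_chunks} chunks, {pdf_chunks * CONTENT} content tokens")
print(f"Markdown:      {md_chunks} chunks, {md_chunks * CONTENT} content tokens")

# Sanity check on the notebook figures quoted above, assuming 1 KB = 1,000 bytes:
bytes_per_token = 261 * 1000 / 64_000
print(f"Notebook as Markdown: about {bytes_per_token:.1f} bytes per token")
```

Under these assumptions the same budget carries 14 chunks of PDF-extracted text but 19 chunks of Markdown. The real gap depends on how much debris a given file actually contains.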

3 · Retrieval — headings are search handles

Chunk quality decides answer quality. Retrieval works best when each chunk is a coherent topic with a descriptive label — and that is precisely what Markdown headings produce. A chunk that begins ## Week 5 · Convergence conditions practically indexes itself: the words your question will use are sitting in the chunk's own header.

PDF-extracted text gives the chunker none of that. Splits happen at arbitrary character counts, mid-table and mid-sentence; the chunk that matches your question may begin halfway through a definition whose first half lives in another chunk. The model receives two half-thoughts and quotes neither correctly. Structure in, structure out — we saw this shift a small model from vague to precise in the lecture-notes case study.
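The contrast between the two splitting strategies can be sketched as follows. This is a minimal illustration, not the chunker of any particular workspace product: the function names and the sample notes are invented, and the heading test is deliberately naive (it would also split on a `# comment` line inside a fenced code block).

```python
import re

def chunk_by_heading(markdown: str) -> list[str]:
    """Start a new chunk at every line that looks like a Markdown heading."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

def chunk_by_size(text: str, size: int) -> list[str]:
    """Fixed-width splitting: the fallback when the text has no structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical lecture notes, invented for this example.
notes = """## Week 4 · Sequences
A sequence is an ordered list of numbers.

## Week 5 · Convergence conditions
A series converges when its partial sums approach a finite limit.
"""

by_heading = chunk_by_heading(notes)

# Emulate PDF-style extraction: headings and line breaks flattened away.
flat = " ".join(notes.replace("#", "").split())
by_size = chunk_by_size(flat, 60)

for chunk in by_heading:
    print("HEADING CHUNK:", chunk.splitlines()[0])
for chunk in by_size:
    print("SIZE CHUNK:   ", repr(chunk))
```

Each heading-based chunk opens with its own label, while the fixed-width pieces start and stop wherever the character count happens to fall.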

[Figure: a workspace answer that cites a converted .md file, with the source popover or inline citation visible, showing it pulled the right section.]

4 · The recipe

  1. Convert: drag your files — multiple at once — into MakeItMarkdown. Conversion runs in your browser; nothing is uploaded to us.
  2. Check: skim each fidelity report. Warnings (truncated tables, scanned pages, layout loss) tell you what to double-check before you rely on a file.
  3. Upload: download the .zip and add the .md files to your Project / Space / knowledge base. Keep filenames stable so citations stay meaningful.
  4. Replace, don't duplicate: remove the original PDFs from the workspace — if both versions stay, retrieval sometimes surfaces the bad one.
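Step 4 is easy to forget once a workspace holds dozens of files. The sketch below flags originals that still sit next to a converted copy. It assumes the converted file keeps the original's base name (for example `report.pdf` becoming `report.md`), which is an assumption about naming rather than documented converter behavior; the function name and suffix list are likewise hypothetical.

```python
from pathlib import PurePath

# Original formats to look for; extend as needed (illustrative list).
ORIGINAL_SUFFIXES = {".pdf", ".docx", ".ipynb"}

def originals_to_remove(filenames: list[str]) -> list[str]:
    """Return originals whose base name also exists as a .md file."""
    md_stems = {
        PurePath(name).stem
        for name in filenames
        if PurePath(name).suffix.lower() == ".md"
    }
    return sorted(
        name
        for name in filenames
        if PurePath(name).suffix.lower() in ORIGINAL_SUFFIXES
        and PurePath(name).stem in md_stems
    )

# Hypothetical workspace listing.
workspace = ["report.pdf", "report.md", "review.docx", "notes.md"]
print(originals_to_remove(workspace))
```

Here `report.pdf` is flagged because `report.md` exists, while `review.docx` is left alone until its converted copy has been uploaded.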

5 · When to leave a format alone

Structured data your assistant should compute over (spreadsheets it will run code on, JSON an integration consumes) can stay native; make a Markdown copy only for the reading-and-citing use case. Truly visual documents (design mockups, scanned forms) need their images, which text-first conversion won't carry. For everything meant to be read, quoted and searched, Markdown is the format your workspace wishes you'd uploaded.

Convert your workspace files in one drop — batch conversion, fidelity report per file, .zip out.