When your AI workspace keeps missing half a document

MakeItMarkdown · July 2026 · 5 min read

The file is uploaded. The assistant reads it — sometimes. It answers from section 2 and acts like section 5 doesn't exist; it finds the topic when you use the document's exact words and misses it when you paraphrase. This isn't the model forgetting. It's retrieval returning the wrong chunks, and it has three usual causes.

1 · Chunks that don't align with topics

Workspaces split each file into chunks and search those, not the whole document. When the text arrives structureless (a PDF extraction, a headingless wall of prose), the splitter cuts by length — mid-topic, mid-table — and your question's best match is a fragment that starts halfway through the relevant idea. The model answers from the fragment; the rest of the topic sits in a neighboring chunk that didn't score high enough to be fetched.

Chunk windows that straddle topic boundaries return halves of two topics — neither answers the question.

Fix: give the splitter boundaries. Converted Markdown carries the document's own headings, and heading-aligned chunks are whole topics with searchable names. (Full mechanics: RAG-ready Markdown — the same logic applies inside hosted workspaces.)
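
To make the difference concrete, here is a minimal sketch of both strategies. It is not how any particular workspace splits files: the function names are hypothetical, and the heading splitter assumes Markdown with ATX headings (lines starting with `#`) and no `#` lines inside code fences.

```python
import re

# Assumption: headings are ATX style ("# Title"), one to six '#' then a space.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_by_length(text: str, size: int) -> list[str]:
    """Naive splitter: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_by_heading(markdown: str) -> list[str]:
    """Start a new chunk at every heading, so each chunk is one whole topic."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


doc = (
    "# Convergence\n"
    "A sequence converges if its terms approach a limit.\n"
    "\n"
    "# Divergence\n"
    "A sequence diverges if no such limit exists.\n"
)

length_chunks = split_by_length(doc, 40)   # cuts land mid-sentence
heading_chunks = split_by_heading(doc)     # one chunk per heading
```

With the length splitter, the second chunk begins partway through the convergence sentence. With the heading splitter, each chunk opens with its own title, which is what gives retrieval a name to match against.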

2 · The duplicate-copy trap

You uploaded the PDF, later added a cleaner version, and never deleted the original. Now retrieval draws from both, and the glyph-soup copy sometimes outranks the clean one for exactly the queries you care about. Symptoms look random because which copy wins changes from query to query.

Fix: one canonical version per document. If you convert, replace — upload the .md, remove the source file from the workspace (keep it in your own archive, of course).
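
If you can export a list of the workspace's filenames, a quick check like the one below can surface leftover pairs. This is a rough sketch under one assumption: the converted file keeps its source's name and differs only in extension. The function name and sample filenames are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePath


def find_duplicate_copies(filenames: list[str]) -> dict[str, list[str]]:
    """Group files that share a stem (the name without its extension)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        # Compare case-insensitively so "Notes.pdf" and "notes.md" still pair up.
        groups[PurePath(name).stem.lower()].append(name)
    # Only stems that appear more than once are potential duplicate copies.
    return {stem: sorted(names) for stem, names in groups.items() if len(names) > 1}


# Hypothetical workspace listing.
workspace = ["week-05-convergence.pdf", "week-05-convergence.md", "syllabus.md"]
duplicates = find_duplicate_copies(workspace)
```

Any stem that shows up in the result has more than one copy in the workspace; keep the .md and remove the rest.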

3 · Ambiguous file identity

Ten files with names like notes.pdf, notes(1).pdf, and final_v2.pdf give retrieval no way to honor an instruction such as "in the week-5 notes…", and citations come back unintelligible.

Fix: descriptive, stable filenames (week-05-convergence.md). Batch-converting with MakeItMarkdown keeps each output named after its source, so a disciplined source folder produces a disciplined workspace.
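
A small lint pass over the source folder can catch the worst offenders before you convert. The heuristics below are assumptions chosen for illustration (a short list of generic stems, plus copy and version suffixes), not rules any tool enforces; adjust the list to your own habits.

```python
import re
from pathlib import PurePath

# Hypothetical list of stems that say nothing about the content.
GENERIC_STEMS = {"notes", "final", "draft", "untitled", "document", "scan"}
COPY_SUFFIX = re.compile(r"\s*\(\d+\)$")      # "notes(1)" -> "notes"
VERSION_SUFFIX = re.compile(r"[_\- ]?v\d+$")  # "final_v2" -> "final"


def is_ambiguous(filename: str) -> bool:
    """Flag names that reduce to a generic word once suffixes are stripped."""
    stem = PurePath(filename).stem.lower()
    stem = COPY_SUFFIX.sub("", stem)
    stem = VERSION_SUFFIX.sub("", stem)
    return stem in GENERIC_STEMS


names = ["notes.pdf", "notes(1).pdf", "final_v2.pdf", "week-05-convergence.md"]
flagged = [n for n in names if is_ambiguous(n)]
```

Here the three generic names are flagged and week-05-convergence.md passes, because its stem still describes the content after the suffix patterns are applied.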

4 · The five-minute overhaul

Batch-drop the workspace's documents into the converter (multiple files at once; all local).
Check each fidelity report: a "sections detected" count of 3 or more is the number that predicts good chunking, and the QC breakdown shows it explicitly.
Upload the .md files; delete the originals from the workspace.
Re-ask the question that failed, paraphrased. Watch whether the citation lands in the right section.
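
For a rough second opinion on the section count in the steps above, you can count headings in a converted file yourself. This sketch is a stand-in, not the converter's fidelity report: it assumes ATX headings and simply applies the same threshold of 3. The function names and sample strings are hypothetical.

```python
import re

# Assumption: a "section" is any ATX heading line ("# Title" through "###### Title").
HEADING = re.compile(r"^#{1,6} \S", re.MULTILINE)
MIN_SECTIONS = 3  # threshold taken from the checklist above


def count_sections(markdown: str) -> int:
    """Count heading lines in a Markdown string."""
    return len(HEADING.findall(markdown))


def needs_review(markdown: str) -> bool:
    """True when the file has too few headings to chunk along topic lines."""
    return count_sections(markdown) < MIN_SECTIONS


structured = "# Intro\ntext\n\n## Method\ntext\n\n## Results\ntext\n"
flat = "One long wall of prose with no headings at all.\n"

structured_ok = not needs_review(structured)
flat_flagged = needs_review(flat)
```

A file that gets flagged is worth opening before you upload it: the headings may have been lost in conversion, or the source may never have had any.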

This overhaul is exactly what turned a struggling small model into a precise one in our lecture-notes case study — retrieval quality was the whole story.
