Feeding slide decks to an LLM
Slide decks are the most common "please summarize this" upload
after PDFs — and they're PDFs' close cousin in hostility: a
.pptx is a ZIP of XML where meaning lives in visual
arrangement, not document structure. What can be recovered
is the deck's skeleton: titles, bullet hierarchies, tables, and the
one part most people forget exists — speaker notes, which often
contain the actual argument the slides only gesture at.
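Because a .pptx is an ordinary ZIP archive, the parts that hold slide text and speaker notes can be located with the standard library alone. The sketch below is a minimal illustration, not the converter's actual code. The function names are hypothetical, and it assumes the conventional part names `ppt/slides/slideN.xml` and `ppt/notesSlides/notesSlideN.xml`. File-name order usually matches presentation order, but the authoritative order lives in `ppt/presentation.xml`.

```python
import re
import zipfile

# Conventional part names (an assumption; the package format does not mandate them).
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")
NOTES_RE = re.compile(r"ppt/notesSlides/notesSlide(\d+)\.xml$")


def deck_parts(names):
    """Split archive member names into (slide parts, notes parts), sorted by number."""
    def numbered(pattern):
        hits = []
        for name in names:
            match = pattern.match(name)
            if match:
                hits.append((int(match.group(1)), name))
        # Sort numerically so slide10 comes after slide2.
        return [name for _, name in sorted(hits)]

    return numbered(SLIDE_RE), numbered(NOTES_RE)


def list_deck_parts(path):
    """Open a .pptx file and return its slide and notes part names."""
    with zipfile.ZipFile(path) as archive:
        return deck_parts(archive.namelist())
```

Sorting on the parsed integer rather than the raw string avoids the `slide10` before `slide2` ordering that a plain alphabetical sort would produce.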
What breaks if you upload a deck raw
- Platform extractors flatten slides into an unordered word soup — titles indistinguishable from bullets, tables destroyed.
- Slide numbers, footers and dates repeat across every slide and pollute the token budget.
- Speaker notes are usually dropped entirely.
The element mapping
| In the deck | In the Markdown |
|---|---|
| Each slide | ## Slide N · {title} — an addressable outline |
| Bullet text | Markdown lists with indent levels kept |
| Tables | GFM pipe tables |
| Images | [Figure: slide_3_image_1.png] placeholder; the image file is extracted into the .zip |
| Speaker notes | Kept — **Speaker notes:** … under the slide |
| Charts | Counted and confessed — chart data lives in embedded worksheets and is not extracted (an honest warning, not fake numbers) |
| Slide numbers, footers, dates | Dropped — furniture, not content |
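Part of the mapping above can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the converter's implementation: `slide_to_markdown`, `FURNITURE` and `TITLES` are hypothetical names, only titles, bullets and furniture placeholders are handled, and shapes are read in file order.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
FURNITURE = {"sldNum", "ftr", "dt"}  # slide number, footer, date placeholders
TITLES = {"title", "ctrTitle"}


def paragraph_text(para):
    """Join every text run inside one <a:p> paragraph."""
    return "".join(t.text or "" for t in para.iterfind(".//a:t", NS)).strip()


def slide_to_markdown(slide_xml, number):
    """Render one slide's XML as a heading plus an indented bullet list."""
    root = ET.fromstring(slide_xml)
    title, bullets = "", []
    for shape in root.iterfind(".//p:sp", NS):  # file order, not visual order
        ph = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
        ph_type = ph.get("type") if ph is not None else None
        if ph_type in FURNITURE:
            continue  # dropped: furniture, not content
        paras = shape.findall("p:txBody/a:p", NS)
        if ph_type in TITLES:
            title = " ".join(filter(None, map(paragraph_text, paras)))
            continue
        for para in paras:
            text = paragraph_text(para)
            if not text:
                continue
            ppr = para.find("a:pPr", NS)
            level = int(ppr.get("lvl", "0")) if ppr is not None else 0
            bullets.append("  " * level + "- " + text)
    heading = f"## Slide {number} · {title}" if title else f"## Slide {number}"
    return "\n".join([heading, *bullets])


# A hand-written stand-in for a real slide part, used only for illustration.
SAMPLE = (
    f'<p:sld xmlns:a="{NS["a"]}" xmlns:p="{NS["p"]}"><p:cSld><p:spTree>'
    '<p:sp><p:nvSpPr><p:nvPr><p:ph type="title"/></p:nvPr></p:nvSpPr>'
    "<p:txBody><a:p><a:r><a:t>Priorities</a:t></a:r></a:p></p:txBody></p:sp>"
    '<p:sp><p:nvSpPr><p:nvPr><p:ph idx="1"/></p:nvPr></p:nvSpPr><p:txBody>'
    "<a:p><a:r><a:t>Ship the vendor consolidation</a:t></a:r></a:p>"
    '<a:p><a:pPr lvl="1"/><a:r><a:t>Finish contract review</a:t></a:r></a:p>'
    "</p:txBody></p:sp>"
    '<p:sp><p:nvSpPr><p:nvPr><p:ph type="sldNum"/></p:nvPr></p:nvSpPr>'
    "<p:txBody><a:p><a:fld><a:t>2</a:t></a:fld></a:p></p:txBody></p:sp>"
    "</p:spTree></p:cSld></p:sld>"
)

print(slide_to_markdown(SAMPLE, 2))
```

The `lvl` attribute is zero-based and absent for top-level bullets, so the sketch treats a missing value as level 0 and indents two spaces per level.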
Before → after
In the file
<p:sp><p:nvSpPr><p:nvPr><p:ph type="title"/></p:nvPr>…
<a:p><a:r><a:t>Priorities</a:t></a:r></a:p>
<a:p><a:pPr lvl="1"/><a:r><a:t>Finish contract review</a:t>…
In the Markdown
## Slide 2 · Priorities
- Ship the vendor consolidation
  - Finish contract review
**Speaker notes:** Emphasize the onboarding metric.
Honest limits
- Within-slide reading order is approximate. Shapes are read in file order, which usually — not always — matches visual order. The fidelity report says so on every deck.
- Diagram-as-shapes decks (boxes and arrows drawn in the editor) flatten to their text fragments; the spatial relationships are the casualty.
- .ppt (legacy binary) isn't supported — resave as .pptx first.
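To see whether a particular slide is one where file order and visual order diverge, shape offsets can be compared directly. The check below is a hypothetical helper, not something the converter is stated to do. It assumes each shape carries its own `<a:off>` position; placeholders that inherit their position from the layout have none and are skipped, and shapes nested in groups use group-relative coordinates, so the result is only a rough signal.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def file_order_matches_visual(slide_xml):
    """Return True if shapes already appear top-to-bottom, then left-to-right."""
    root = ET.fromstring(slide_xml)
    positions = []
    for shape in root.iterfind(".//p:sp", NS):
        off = shape.find("p:spPr/a:xfrm/a:off", NS)
        if off is None:
            continue  # position inherited from the layout; cannot be compared here
        positions.append((int(off.get("y")), int(off.get("x"))))
    # File order equals visual order when the list is already sorted by (y, x).
    return positions == sorted(positions)
```

A `False` result is a hint to read that slide's Markdown with extra care, since the extracted text may not follow the order a viewer would see.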
FAQ
Why do speaker notes matter so much? Slides are prompts for a talk; notes are the talk. For "summarize this deck" questions, the notes routinely carry more answerable content than the slides.
Charts? You get an explicit [Chart on slide N — data not extracted] marker. If the numbers matter, export the chart's source table to .xlsx and convert that alongside.
Confidential decks? Local conversion, nothing uploaded — verifiably.
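On the chart question above: counting charts without reading their data is straightforward, because a chart appears in the slide XML as a graphic frame whose `graphicData` element carries the chart namespace URI. The following is a minimal sketch with a hypothetical function name, not the converter's code. It only counts; one marker per chart can then be emitted for that slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
CHART_URI = "http://schemas.openxmlformats.org/drawingml/2006/chart"


def count_charts(slide_xml):
    """Count graphic frames on a slide whose payload is a chart."""
    root = ET.fromstring(slide_xml)
    frames = root.iterfind(".//p:graphicFrame/a:graphic/a:graphicData", NS)
    # Tables use the same frame element with a different URI, so filter on it.
    return sum(1 for data in frames if data.get("uri") == CHART_URI)
```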
Try the sample deck — outline, table, figure and speaker notes in one conversion.