Format guide · .docx

Feeding Word documents to an LLM

A .docx is not a document — it's a ZIP archive of XML files describing one: content in one part, styles in another, images in a media folder, plus themes, fonts and settings. That architecture is why a 3-page memo weighs 36 KB and why naive extraction loses precisely the things a model needs.
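
To make the archive structure concrete, here is a minimal Python sketch that uses only the standard-library `zipfile` module. The helper names `list_parts` and `build_demo_archive` are illustrative assumptions, and the stand-in archive only mimics typical part names; it is not a valid Word file. With a real document you would pass the file's bytes to `list_parts` instead.

```python
import io
import zipfile


def list_parts(docx_bytes: bytes) -> list[tuple[str, int]]:
    """Return (part name, uncompressed size in bytes) for each archive entry."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def build_demo_archive() -> bytes:
    """Build a stand-in archive with typical part names (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<w:document/>")  # body content
        archive.writestr("word/styles.xml", "<w:styles/>")      # style definitions
        archive.writestr("word/media/image1.png", b"\x89PNG")   # embedded image
    return buffer.getvalue()


if __name__ == "__main__":
    for name, size in list_parts(build_demo_archive()):
        print(f"{size:>6}  {name}")
```

Listing the parts first is a cheap way to see where content, styles and images live before deciding what an extractor should keep.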

What breaks if you paste or extract it raw

The element mapping

| In the .docx | In the Markdown |
| --- | --- |
| Title / Heading 1–4 styles | `#` / `##` / `###`: a real outline, from the style objects |
| Bold / italic runs | `**bold**` / `*italic*` |
| Bulleted / numbered lists | Markdown lists, nesting kept |
| Tables | GFM pipe tables; first row promoted to a real header when Word didn't mark one |
| Embedded images | `[Figure: …]` placeholder in place; image file extracted into the .zip's figures/ |
| Hyperlinks | Inline Markdown links |
| Headers, footers, page numbers | Dropped: page furniture, not content |
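
The table rule in the mapping (first row promoted to a header) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the converter's actual code: the function name `to_pipe_table` is hypothetical, and it assumes cell text has already been extracted into a list of rows.

```python
def to_pipe_table(rows: list[list[str]]) -> str:
    """Render rows as a GFM pipe table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(cells: list[str]) -> str:
        # Pad short rows and escape literal pipes so they don't split a cell.
        padded = cells + [""] * (width - len(cells))
        escaped = [cell.replace("|", "\\|").strip() for cell in padded]
        return "| " + " | ".join(escaped) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([render(header), separator, *(render(row) for row in body)])


if __name__ == "__main__":
    print(to_pipe_table([
        ["Region", "Units shipped", "Avg delivery (days)"],
        ["North", "18,420", "2.1"],
    ]))
```

A GFM table needs a header row and a delimiter row to render at all, which is why promoting the first row is a reasonable fallback when Word didn't mark one.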

Before → after

In the file

<w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
  <w:r><w:t>Regional performance</w:t></w:r></w:p>
<w:tbl><w:tr><w:tc><w:p><w:r><w:t>Region</w:t></w:r>…

In the Markdown

## Regional performance

| Region | Units shipped | Avg delivery (days) |
| --- | --- | --- |
| North | 18,420 | 2.1 |
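
The heading half of this before-and-after can be reproduced with Python's standard-library XML parser. The sketch below is a rough illustration, not the converter's implementation: the function name `paragraph_to_markdown` is hypothetical, and it assumes style IDs of the form `Heading1`, `Heading2` and so on, mapped to the same number of `#` characters as in the example above. Localized or custom templates may use different style IDs.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_to_markdown(paragraph: ET.Element) -> str:
    """Turn one w:p element into a Markdown line, using its style as the level."""
    # Concatenate the text of every run in the paragraph.
    text = "".join(t.text or "" for t in paragraph.iterfind(".//w:t", NS))
    style = paragraph.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W_NS}}}val", "") if style is not None else ""
    # Assumption: "HeadingN" maps to N hash marks; anything else is body text.
    match = re.fullmatch(r"Heading([1-6])", style_id)
    if match:
        return "#" * int(match.group(1)) + " " + text
    return text


SAMPLE = f"""
<w:p xmlns:w="{W_NS}">
  <w:pPr><w:pStyle w:val="Heading2"/></w:pPr>
  <w:r><w:t>Regional performance</w:t></w:r>
</w:p>
"""

if __name__ == "__main__":
    # Prints: ## Regional performance
    print(paragraph_to_markdown(ET.fromstring(SAMPLE.strip())))
```

Reading the level from the style object rather than from font size is what keeps the outline intact when a heading has been restyled visually.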

Honest limits

FAQ

Where do the images go? Into the downloaded .zip, next to the Markdown, named as the placeholders say — workspace uploads can include or skip them.

Does it read password-protected files? No — remove protection first; the parser sees only encrypted bytes.
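
One way to see why protected files fail early is to check the file's first bytes. A password-encrypted Word document is typically stored in an OLE compound file rather than a ZIP archive, so a ZIP-based parser has nothing to open. The following Python sketch is a heuristic under that assumption; the function name `classify_upload` and its return labels are hypothetical, and legacy .doc files share the same signature.

```python
import io
import zipfile

# Signature of an OLE compound file, the container used for encrypted Office files.
CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def classify_upload(data: bytes) -> str:
    """Roughly classify file bytes before handing them to a .docx parser."""
    if data.startswith(CFB_MAGIC):
        # Could also be a legacy .doc; either way it is not a readable .docx ZIP.
        return "compound-file (possibly encrypted)"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "unknown"
```

A check like this lets a tool report "remove protection first" instead of a generic parse error.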

Confidential documents? Conversion runs locally in your browser; nothing is uploaded. To verify, disconnect from the network and convert a file. See privacy.
