Feeding emails to an LLM
Drag a message out of most mail clients and you get a
.eml — the raw wire format. It's text, technically.
It's also transfer encodings, MIME boundaries, an HTML copy of the
plain copy, and — the real token sink — the entire thread quoted
again inside every reply. A ten-message thread pastes as the first
message repeated ten times.
What's in the raw file
- Encoded headers — =?UTF-8?B?…?= subjects that arrive as gibberish without RFC 2047 decoding.
- MIME multipart plumbing — boundaries, content types, and base64/quoted-printable encodings wrapping the actual words.
- Two copies of the body — plain text and HTML, so a raw paste carries the same content twice.
- Quoted reply pyramids — > > > chains that dwarf the new content.
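To make the first two items concrete, here is a minimal sketch of decoding an encoded subject and a quoted-printable body with Python's standard `email` package. It illustrates the decoding steps only and is not necessarily how this converter is implemented; the `RAW` sample is assembled from the example on this page, with `MIME-Version` and `Content-Type` headers added as an assumption so the charset is explicit.

```python
from email import message_from_string
from email.policy import default

# Sample message (assumed headers added so the charset is explicit).
RAW = (
    "Subject: =?UTF-8?B?UmU6IFZlbmRvciDigJQgbmV4dCBzdGVwcw==?=\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Agreed =E2=80=94 let's lock it Friday.\n"
)

# policy=default selects the modern API, which decodes RFC 2047
# encoded-words when a header is read.
msg = message_from_string(RAW, policy=default)

subject = str(msg["Subject"])   # encoded-word decoded to readable text
body = msg.get_content()        # quoted-printable and charset both undone

print(subject)
print(body)
```

With the older `compat32` policy the subject would come back still encoded, which is why the policy argument matters here.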
The element mapping
| In the .eml | In the Markdown |
|---|---|
| From / To / Cc / Date / Subject | A compact Message section, encoded-words decoded |
| plain + HTML alternatives | The plain part, once (HTML converted only when it's all there is) |
| base64 / quoted-printable bodies | Decoded to readable text (em-dashes stop being =E2=80=94) |
| Long quoted chains | First lines kept, the rest cut with an explicit > … (N more quoted lines truncated) — the newest message is never touched |
| Attachments | Listed by name and size, honestly marked not extracted — convert the attachment file itself separately |
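The mapping above can be approximated in a few lines of standard-library Python. This is a sketch under stated assumptions, not the converter's actual code: `eml_to_markdown` and the `## Attachments` heading are hypothetical names, quote truncation is left out, and an HTML-only body is passed through unconverted.

```python
from email import message_from_bytes
from email.policy import default

HEADERS = ("From", "To", "Cc", "Date", "Subject")


def eml_to_markdown(raw: bytes) -> str:
    """Hypothetical helper: render one .eml as compact Markdown."""
    msg = message_from_bytes(raw, policy=default)

    # Header section; policy=default decodes encoded-words on access.
    lines = ["## Message"]
    for name in HEADERS:
        if msg[name] is not None:
            lines.append(f"- **{name}:** {msg[name]}")

    # Prefer the plain part; fall back to HTML only when no plain part exists.
    # (A real converter would turn that HTML into text; this sketch does not.)
    lines += ["", "## Body"]
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        lines.append(body.get_content().strip())

    # List attachments by name and decoded size without extracting contents.
    attachments = list(msg.iter_attachments())
    if attachments:
        lines += ["", "## Attachments"]
        for part in attachments:
            name = part.get_filename() or "(unnamed)"
            size = len(part.get_payload(decode=True) or b"")
            lines.append(f"- {name} ({size} bytes, not extracted)")

    return "\n".join(lines)
```

Usage would look like `print(eml_to_markdown(open("message.eml", "rb").read()))`, where `message.eml` is a placeholder path.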
Before → after
In the file
Subject: =?UTF-8?B?UmU6IFZlbmRvciDigJQgbmV4dCBzdGVwcw==?=
Content-Transfer-Encoding: quoted-printable
Agreed =E2=80=94 let's lock it Friday.
> On Tue, Jordan wrote:
> > On Mon, Sam wrote:
> > > (…the whole thread again…)
In the Markdown
## Message
- **Subject:** Re: Vendor — next steps
## Body
Agreed — let's lock it Friday.
> On Tue, Jordan wrote:
> … (23 more quoted lines truncated)
Honest limits
- One message per .eml — mailbox archives (.mbox) aren't split yet; request it if you need it.
- Attachment contents aren't parsed from inside the email — drop the attachment file itself into the converter instead.
- Inline images become nothing (they're attachments by another name); the attachment list still records them.
FAQ
How do I get a .eml? Drag the message from your mail client to the desktop (most clients), or use "Save as" / "Show original → Download".
Why truncate quotes at all? Because the chain is the same text repeated — and because we say so in the output, the model knows history was elided rather than missing. That's the difference between lean and lossy.
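As an illustration of that idea, the sketch below trims each run of quoted lines and leaves an explicit marker in its place. The function name `truncate_quotes`, the `keep` parameter, and the rule of leaving short runs alone are assumptions made for this example; the converter's real thresholds are not stated here.

```python
def truncate_quotes(body: str, keep: int = 1) -> str:
    """Keep the first `keep` lines of each quoted run; mark the rest as elided."""
    out: list[str] = []
    run: list[str] = []   # the current run of consecutive quoted lines

    def flush() -> None:
        dropped = len(run) - keep
        if dropped < 2:
            # Swapping a single line for a marker saves nothing, so keep it.
            out.extend(run)
        else:
            out.extend(run[:keep])
            out.append(f"> … ({dropped} more quoted lines truncated)")
        run.clear()

    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            run.append(line)
        else:
            flush()           # a non-quoted line ends the current run
            out.append(line)  # new content is passed through untouched
    flush()
    return "\n".join(out)
```

Because the marker states how many lines were removed, a model reading the output can tell that history was cut on purpose rather than absent from the original message.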
Private correspondence? Local conversion, nothing uploaded — verifiably.
Try the sample thread — decoded subject, truncated quote pyramid, attachment listed.