
Feeding emails to an LLM

Drag a message out of most mail clients and you get a .eml — the raw wire format. It's text, technically. It's also transfer encodings, MIME boundaries, an HTML copy of the plain copy, and — the real token sink — the entire thread quoted again inside every reply. A ten-message thread pastes as the first message repeated ten times.

What's in the raw file
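You can see the transfer encodings, MIME boundaries and duplicate HTML copy described above by walking the MIME tree with Python's standard-library `email` package. This is a minimal sketch. The sample message, its addresses and its boundary string are made up for illustration, and `describe_parts` is a hypothetical helper, not part of any converter:

```python
from email import message_from_bytes, policy

# Hypothetical sample: one message carrying the same sentence twice,
# as text/plain and as text/html, each quoted-printable encoded and
# separated by a MIME boundary.
RAW = b"""\
From: Sam <sam@example.com>
To: Jordan <jordan@example.com>
Subject: Vendor next steps
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Agreed =E2=80=94 let's lock it Friday.
--BOUNDARY
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<p>Agreed =E2=80=94 let's lock it Friday.</p>
--BOUNDARY--
"""


def describe_parts(raw: bytes) -> list[tuple[str, str]]:
    """List (content type, transfer encoding) for every node of the MIME tree."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [
        (part.get_content_type(), str(part.get("Content-Transfer-Encoding", "-")))
        for part in msg.walk()
    ]


for content_type, encoding in describe_parts(RAW):
    print(f"{content_type:<24} {encoding}")
```

The sketch passes `policy.default` instead of relying on the legacy `compat32` policy that `message_from_bytes` uses by default. Under `policy.default`, headers come back as decoded strings, which any fuller conversion will also want.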

The element mapping

| In the .eml | In the Markdown |
|---|---|
| From / To / Cc / Date / Subject | A compact Message section, with encoded-words decoded |
| Plain + HTML alternatives | The plain part, once (HTML is converted only when it's all there is) |
| base64 / quoted-printable bodies | Decoded to readable text (em-dashes stop being =E2=80=94) |
| Long quoted chains | The first lines are kept and the rest are cut with an explicit > … (N more quoted lines truncated) marker; the newest message is never touched |
| Attachments | Listed by name and size and marked as not extracted; convert the attachment file itself separately |
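Most of this mapping can be approximated with Python's standard-library `email` package. This is a minimal sketch under stated assumptions, not the converter's actual code. `to_markdown` is a hypothetical helper, and the sample message and `quote.pdf` attachment are made up. When only HTML exists, the HTML is passed through unconverted. Quote truncation is also left out here.

```python
from email import message_from_bytes, policy

# Hypothetical sample: encoded-word Subject, plain + HTML alternatives,
# and one base64 attachment.
RAW = b"""\
From: Sam <sam@example.com>
To: Jordan <jordan@example.com>
Date: Tue, 04 Jun 2024 10:00:00 +0000
Subject: =?UTF-8?B?UmU6IFZlbmRvciDigJQgbmV4dCBzdGVwcw==?=
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Agreed =E2=80=94 let's lock it Friday.
--INNER
Content-Type: text/html; charset="utf-8"

<p>Agreed, let's lock it Friday.</p>
--INNER--
--OUTER
Content-Type: application/pdf; name="quote.pdf"
Content-Disposition: attachment; filename="quote.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--OUTER--
"""


def to_markdown(raw: bytes) -> str:
    """Hypothetical helper: header section, one body, attachment list."""
    msg = message_from_bytes(raw, policy=policy.default)
    lines = ["## Message"]
    for name in ("From", "To", "Cc", "Date", "Subject"):
        if msg[name] is not None:
            # policy.default already decodes RFC 2047 encoded-words.
            lines.append(f"- **{name}:** {msg[name]}")
    # Prefer text/plain; fall back to text/html only if no plain part exists
    # (this sketch does not convert HTML).
    body = msg.get_body(preferencelist=("plain", "html"))
    lines += ["", "## Body"]
    if body is not None:
        # get_content() undoes base64 / quoted-printable and applies the charset.
        lines.append(body.get_content().rstrip())
    attachments = list(msg.iter_attachments())
    if attachments:
        lines += ["", "## Attachments"]
        for part in attachments:
            size = len(part.get_payload(decode=True) or b"")
            lines.append(f"- {part.get_filename()} ({size} bytes, not extracted)")
    return "\n".join(lines)


print(to_markdown(RAW))
```

`get_body` walks the `multipart/alternative` and returns the plain part. `iter_attachments` skips the body alternatives and yields only the PDF, so the HTML copy never reaches the output.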

Before → after

In the file

Subject: =?UTF-8?B?UmU6IFZlbmRvciDigJQgbmV4dCBzdGVwcw==?=
Content-Transfer-Encoding: quoted-printable

Agreed =E2=80=94 let's lock it Friday.
> On Tue, Jordan wrote:
> > On Mon, Sam wrote:
> > > (…the whole thread again…)

In the Markdown

## Message
- **Subject:** Re: Vendor — next steps

## Body
Agreed — let's lock it Friday.
> On Tue, Jordan wrote:
> … (23 more quoted lines truncated)
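Both decodings in this example are ordinary standard-library calls. The minimal Python sketch below reproduces them from the Subject and body lines of the file shown above. It uses `email.header.decode_header` and `make_header` for the RFC 2047 encoded-word and `quopri` for the quoted-printable body. It illustrates the two steps and is not the converter's own code:

```python
import quopri
from email.header import decode_header, make_header

RAW_SUBJECT = "=?UTF-8?B?UmU6IFZlbmRvciDigJQgbmV4dCBzdGVwcw==?="
RAW_BODY = b"Agreed =E2=80=94 let's lock it Friday."

# decode_header splits the encoded-word into (bytes, charset) chunks;
# make_header reassembles them into a Header whose str() is Unicode.
subject = str(make_header(decode_header(RAW_SUBJECT)))

# Quoted-printable: =E2=80=94 are the three UTF-8 bytes of an em-dash.
body = quopri.decodestring(RAW_BODY).decode("utf-8")

print(subject)
print(body)
```

In a full parse, `email.message_from_bytes(..., policy=policy.default)` together with `get_content()` performs both steps automatically. The explicit calls here just make each step visible.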

Honest limits

FAQ

How do I get a .eml? Drag the message from your mail client to the desktop (most clients), or use "Save as" / "Show original → Download".

Why truncate quotes at all? Because the chain is the same text repeated. And because the output says so, the model knows the history was deliberately elided, not missing. That's the difference between lean and lossy.
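Here is a minimal sketch of that kind of truncation. It assumes a plain-text body in which quoted lines start with `>`. `truncate_quotes` and its `keep` parameter are hypothetical. Real mail often quotes less tidily (Outlook-style reply headers, no `>` prefixes), and this sketch ignores those cases. Unquoted lines, which hold the newest message, are always kept. Each run of quoted lines is cut to its first `keep` lines, followed by an explicit count:

```python
def truncate_quotes(body: str, keep: int = 1) -> str:
    """Keep unquoted lines; shorten each run of '>' lines to `keep` lines plus a marker."""
    out: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # Emit the first `keep` quoted lines, then say how many were dropped.
        out.extend(run[:keep])
        dropped = len(run) - keep
        if dropped > 0:
            out.append(f"> … ({dropped} more quoted lines truncated)")
        run.clear()

    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)


# Hypothetical three-message pyramid.
thread = "\n".join([
    "Agreed, let's lock it Friday.",
    "> On Tue, Jordan wrote:",
    "> > On Mon, Sam wrote:",
    "> > > Can we pick a date?",
    "> > Friday works for me.",
])
print(truncate_quotes(thread))
```

Because the marker states how many lines were dropped, a reader (or model) can tell that the history was elided rather than never present.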

Private correspondence? Local conversion, nothing uploaded — verifiably.
