Feeding webpages to an LLM

MakeItMarkdown · July 2026

HTML is the best-structured source most people ever feed a model — real headings, real tables, real links — wrapped in the worst noise: navigation, ads, consent banners, recommendation sidebars. The conversion problem isn't recovering structure (it's there); it's finding the content.

How article extraction works

We run Mozilla's Readability — the algorithm family behind Reader View — in your browser. It scores the page's element tree for the densest coherent block of text and discards the rest. The surviving article then converts to Markdown with its structure intact.
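The scoring idea can be sketched in a few lines. This is a toy text-density scorer loosely modeled on the paragraph-scoring approach of Readability-style extractors. It is not Readability's code, and the names (`Node`, `TreeBuilder`, `pick_main_block`), the thresholds, and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "source", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children = []  # mix of str and Node, in document order
        self.score = 0.0

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)

    def link_chars(self):
        # Characters of text that sit inside <a> elements.
        total = 0
        for c in self.children:
            if isinstance(c, Node):
                total += len(c.text()) if c.tag == "a" else c.link_chars()
        return total

    def walk(self):
        yield self
        for c in self.children:
            if isinstance(c, Node):
                yield from c.walk()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def pick_main_block(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    for node in builder.root.walk():
        text = node.text().strip()
        if node.tag != "p" or len(text) < 25:  # assumed cutoff for tiny paragraphs
            continue
        # Illustrative weights: base point, commas, and length (capped).
        points = 1 + text.count(",") + min(len(text) // 100, 3)
        node.parent.score += points
        if node.parent.parent is not None:
            node.parent.parent.score += points / 2
    best, best_score = None, 0.0
    for node in builder.root.walk():
        if node is builder.root or node.score == 0:
            continue
        total = len(node.text())
        density = node.link_chars() / total if total else 0.0
        final = node.score * (1 - density)  # link-heavy blocks are penalized
        if final > best_score:
            best, best_score = node, final
    return best


SAMPLE = """
<nav><p><a href="/">Home</a>, <a href="/pricing">Pricing</a>, <a href="/blog">Blog</a></p></nav>
<div class="cookie-banner"><p>We value your privacy.</p></div>
<article><h2>The measurement</h2>
<p>We logged every request for a week, grouped them by route, and compared the totals.</p>
<p>The slowest route, by a wide margin, was the search endpoint.</p></article>
<aside><p><a href="/a">Related story one, with commas</a>, <a href="/b">related story two, also long</a></p></aside>
"""
```

Calling `pick_main_block(SAMPLE)` returns the `<article>` node: its paragraphs are long and contain no links, while the sidebar's text is almost entirely link text and the nav and banner paragraphs fall under the length cutoff.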

The element mapping

In the page	In the Markdown
Article headings	The `#/##` outline (levels normalized)
Article tables	GFM pipe tables
Images + captions	`[Figure: …]` placeholders with captions; original URLs kept in the figure list
Links	Inline Markdown links; relative links are flagged in the fidelity report (a saved file can't resolve `/pricing`)
Code blocks	Fenced blocks
Nav, ads, banners, sidebars, footers	Discarded by the extraction pass
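
To illustrate the relative-link row, here is a minimal sketch of how such links could be detected in converted Markdown. The function name `relative_links` and the choice to leave in-page `#fragment` links unflagged are assumptions for this example, not a description of how the fidelity report is built.

```python
import re
from urllib.parse import urlsplit

# Inline Markdown links, skipping images (which start with "!").
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)")


def relative_links(markdown):
    """Return link targets that a saved file cannot resolve on its own."""
    flagged = []
    for match in LINK_RE.finditer(markdown):
        target = match.group(2)
        parts = urlsplit(target)
        # A scheme (https:, mailto:) or a host (//example.com) means the link is absolute.
        if parts.scheme or parts.netloc:
            continue
        # Assumption: fragment-only links still work inside the same file.
        if target.startswith("#"):
            continue
        flagged.append(target)
    return flagged
```

For example, `relative_links("See [pricing](/pricing) and [docs](https://example.com/docs).")` returns `["/pricing"]`.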

Before → after

In the file

<nav>Home · Pricing · Blog</nav>
<div class="cookie-banner">We value your privacy…</div>
<article><h2>The measurement</h2><p>We logged…</p></article>
<aside>Related: 14 stories</aside>

In the Markdown

## The measurement

We logged…

(nav, cookie banner and sidebar discarded by extraction)

Honest limits

JavaScript-rendered pages: what you saved is what converts. Save the page after it has rendered (Ctrl/Cmd-S in the browser), not via a raw source download.
Extraction can pick the wrong block on unusual layouts — comment threads have out-scored short articles before. The fidelity report's detected counts (title? sections? the table you expected?) are how you catch it in five seconds.
No URL fetching. You give us the file, not the address — the site makes no network requests with your content, which is the whole privacy model. Saving the page first is the one extra step.
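
The kind of check the detected counts make possible can be sketched as follows. This is a hypothetical counter over the converted Markdown, not the actual fidelity report: the function name `detected_counts`, the returned keys, and the rule that a level-1 heading counts as the title are all assumptions.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable
HEADING = re.compile(r"(#{1,6})\s+\S")
# A GFM delimiter row such as "| --- | :-: |".
DELIMITER_ROW = re.compile(r"\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?")


def detected_counts(markdown):
    counts = {"has_title": False, "sections": 0, "tables": 0}
    in_fence = False
    previous = ""
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            in_fence = not in_fence
            previous = ""
            continue
        if in_fence:
            continue  # "# comment" inside code is not a heading
        heading = HEADING.match(line)
        if heading:
            if len(heading.group(1)) == 1:
                counts["has_title"] = True
            else:
                counts["sections"] += 1
        elif "|" in line and "|" in previous and DELIMITER_ROW.fullmatch(line):
            # A delimiter row directly under a header row marks one table.
            counts["tables"] += 1
        previous = line
    return counts
```

If the page you saved had three sections and a table, and a counter like this reports zero of each, extraction most likely kept the wrong block.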

FAQ

Can I paste HTML instead of saving a file? Yes: paste the page source onto the landing page with Ctrl/Cmd-V.

Documentation pages with heavy chrome? That's the sweet spot — see the junk-flood walkthrough in the fix article.

Whole sites? One page per file today; drop several saved pages at once and the output .zip bundles the converted files.

Try the sample article — nav and ads in, clean outline out.