Copying from a webpage floods the chat with junk
Select-all on an article, paste into a chat, and the model receives: a navigation menu, a cookie notice, the newsletter box, fourteen "related stories", the share bar — and somewhere inside, your article. The junk isn't just ugly. It costs tokens (often more than the article itself), and it actively misleads: models quote teaser headlines from the sidebar as if they were part of the text.
1 · Why paste picks up the chrome
The clipboard copies what the page renders, not what the article is. Modern pages are 80–95% scaffolding by markup weight; the article body is one branch of a very noisy tree. Your eye filters the chrome instantly. A paste doesn't.
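To get a feel for that ratio on a page of your own, here is a rough sketch that measures how much of the markup is not article text. It assumes the page wraps its body in an `<article>` element, which many pages do not; `ArticleTextCounter`, `scaffolding_share` and `SAMPLE_PAGE` are illustrative names, and the sample page is made up.

```python
from html.parser import HTMLParser


class ArticleTextCounter(HTMLParser):
    """Counts characters of text that sit inside an <article> element."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0   # nesting depth of open <article> tags
        self.article_chars = 0   # text characters found inside <article>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1

    def handle_data(self, data):
        if self.article_depth > 0:
            self.article_chars += len(data.strip())


def scaffolding_share(html: str) -> float:
    """Fraction of the markup (by character count) that is NOT article text."""
    if not html:
        return 0.0
    counter = ArticleTextCounter()
    counter.feed(html)
    counter.close()
    return 1 - counter.article_chars / len(html)


# Made-up sample page: a menu, a cookie banner, one short article, a teaser.
SAMPLE_PAGE = (
    "<html><body>"
    '<nav><a href="/">Home</a><a href="/pricing">Pricing</a></nav>'
    '<div class="cookie-banner">Accept all cookies?</div>'
    "<article><h1>The measured results</h1>"
    "<p>The measured results held across every configuration we tried.</p></article>"
    "<aside>RELATED: 10 stories like this</aside>"
    "</body></html>"
)

print(f"{scaffolding_share(SAMPLE_PAGE):.0%} of this page is scaffolding")
```

Character count is only a stand-in for "markup weight", but it is enough to show how small the article branch is, even on a toy page.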
2 · What article extraction does
The fix is a readability pass — the same family of algorithm behind your browser's Reader View. It scores the DOM for the densest coherent text block, keeps the article with its headings, links, images and tables, and discards navigation, ads and boilerplate. MakeItMarkdown runs exactly that (Mozilla's Readability, in your browser), then converts the surviving article to clean Markdown:
Raw paste
Home Products Pricing Blog
Accept all cookies?
Subscribe to our newsletter →
The measured results held across
every configuration we tried…
RELATED: 10 stories like this

Article extraction
# The measured results
The measured results held across
every configuration we tried…

- headings arrive as a real ## outline;
- the article's tables survive as GFM (they're usually the first casualty of raw pastes);
- images become [Figure: …] placeholders with their captions;
- relative links are flagged in the fidelity report (a saved page can't resolve /prices, and you'd want to know).
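The scoring idea can be sketched in a few lines. This is a toy stand-in, not Mozilla's Readability and not MakeItMarkdown's code: it assumes that plain text should raise a container's score and link text should lower it, and then keeps the highest-scoring container. `BlockScorer`, `best_block` and the sample page are hypothetical names and data for illustration.

```python
from html.parser import HTMLParser

CONTAINERS = {"article", "main", "section", "div", "nav", "aside", "header", "footer"}
SKIPPED = {"script", "style"}  # never count code or CSS as text


class BlockScorer(HTMLParser):
    """Toy density scorer: plain text adds to a block's score, link text subtracts."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # every container seen, in document order
        self.open_blocks = []   # indexes into self.blocks for containers still open
        self.link_depth = 0     # > 0 while inside <a>
        self.skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in SKIPPED:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            css_class = dict(attrs).get("class")
            label = f"{tag}.{css_class}" if css_class else tag
            self.blocks.append({"tag": tag, "label": label, "score": 0, "text": []})
            self.open_blocks.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif self.open_blocks and self.blocks[self.open_blocks[-1]]["tag"] == tag:
            self.open_blocks.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.open_blocks:
            return
        block = self.blocks[self.open_blocks[-1]]  # nearest enclosing container
        block["text"].append(text)
        block["score"] += -len(text) if self.link_depth else len(text)


def best_block(html: str):
    """Return (label, text) for the highest-scoring container, or None."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.blocks:
        return None
    winner = max(scorer.blocks, key=lambda block: block["score"])
    return winner["label"], " ".join(winner["text"])


# Made-up sample page mirroring the raw paste above.
SAMPLE_PAGE = (
    '<nav><a href="/">Home</a><a href="/pricing">Pricing</a></nav>'
    '<div class="cookie-banner">Accept all cookies?</div>'
    "<article><h1>The measured results</h1>"
    "<p>The measured results held across every configuration we tried.</p></article>"
    "<aside>RELATED: 10 stories like this</aside>"
)

print(best_block(SAMPLE_PAGE))
```

A real readability pass weighs far more signals (class names, paragraph counts, sibling blocks), which is why a mature library is the better choice in practice; the sketch only shows why a menu full of links loses to a block of prose.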
[Figure: side-by-side view of a raw select-all paste of a news article (menus and banners visible) next to the converted Markdown pane.]
3 · The fix, two ways
- Save the page (Ctrl/Cmd-S → "HTML only"), then drop the .html file into MakeItMarkdown.
- Or paste the page's HTML source directly onto the landing page (⌘V works there); it follows the same extraction path.
Compare the Markdown pane with what a raw paste would have carried. The fidelity report shows what was detected (title, sections, tables, figures), so you can spot when extraction picked the wrong main block. That happens on unusual layouts, and the report makes the miss visible instead of silent.
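If you want a similar sanity check in your own pipeline, a minimal sketch might look like the following. It is not MakeItMarkdown's report format, which this article does not specify: `fidelity_report`, its keys and the sample Markdown are all hypothetical, and the regular expressions cover only simple GFM tables and inline links.

```python
import re

# A GFM table is recognised here by its delimiter row, e.g. "| --- | :---: |".
DELIMITER_ROW = re.compile(r"\s*\|?(\s*:?-+:?\s*\|)+(\s*:?-+:?\s*)?")
# Inline link targets: the part in parentheses of [text](target).
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")
# A target with a scheme such as https: or mailto: is treated as absolute.
HAS_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def fidelity_report(markdown: str) -> dict:
    """Summarise what a converted Markdown document contains."""
    lines = markdown.splitlines()
    targets = LINK_TARGET.findall(markdown)
    return {
        "title": next((l[2:].strip() for l in lines if l.startswith("# ")), None),
        "sections": sum(1 for l in lines if re.match(r"#{2,6} ", l)),
        "tables": sum(1 for l in lines if DELIMITER_ROW.fullmatch(l)),
        "figures": markdown.count("[Figure:"),
        # Relative targets cannot be resolved from a saved page, so list them.
        "relative_links": [
            t for t in targets
            if not HAS_SCHEME.match(t) and not t.startswith(("#", "//"))
        ],
    }


# Made-up sample output of a conversion.
SAMPLE_MARKDOWN = "\n".join([
    "# The measured results",
    "",
    "## Setup",
    "",
    "See [pricing](/prices) and [the spec](https://example.com/spec).",
    "",
    "| Config | Result |",
    "| --- | --- |",
    "| A | held |",
    "",
    "[Figure: results by configuration]",
])

print(fidelity_report(SAMPLE_MARKDOWN))
```

A missing title or a section count of zero on a long article is a reasonable hint that the wrong main block was picked.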
4 · When raw paste is fine
Short, plain pages — documentation without heavy chrome, a gist, a plain-text mail archive — paste fine. The extraction pass earns its keep on anything with a menu bar and a business model.
Try it on the sample article — nav and ads in, clean Markdown out.