Tuesday, June 16, 2026

Google Says Markdown For AI SEO Strips Away The Parts That Matter


On a recent episode of the Search Off the Record podcast, hosts John Mueller and Martin Splitt pushed back on the idea, promoted by AI SEOs, that stripped-down, content-only markdown versions of web pages are a better way to optimize for AI search. They made the case that the things AI SEOs want to remove are actually useful for ranking.

Non-Content Parts Of Web Pages Matter

The TL;DR of this part is that HTML is for browsers to render into a visible page for humans, as well as for screen readers to read.

Martin Splitt begins the discussion by explaining why plain HTML appears not to be the ideal way to provide content to AI agents and LLMs. The idea is that, in addition to content, there’s a lot of other code in the HTML that is irrelevant for an LLM or AI agent that may be visiting a site for the content.

The appeal of markdown, then, is that it can provide the content in a manner that breaks free of all the HTML that’s meant to make a web page visible for humans or readable by a screen reader.

Splitt explains:

“And I think that’s also why people think it’s good for LLMs, because you have less stuff, less tokens. And if you look at an HTML file without a browser rendering it, if you just look at the plain HTML in a text editor, basically, then it’s hard to read the content, because there’s so much cruft, so much stuff in it. There’s all these HTML tags and all this maybe even inline styles and all that kind of stuff.”

He also praises markdown for staying structured and readable even when it is not rendered:

“But if a Markdown render fails and you look at the Markdown file in a text editor, it still is structured and readable. Like a link is the word of the link text, like the anchor text, and then in square brackets and then in normal brackets. It’s probably what I would do if text was all I had available.

If I was writing an email without the possibility to actually link things, I would probably mark up some sort of link text and then put some sort of way to say, like, and this is where you need to go to actually see that.

And I think this minimalism is probably what makes people think, yeah, this is great for a machine that needs to understand this content, unlike HTML.”

Converting HTML To Text Is Trivial

Mueller and Splitt noted that, despite how complex HTML looks, crawling it and making sense of it is trivial. The selling point of markdown for LLMs, that it simplifies crawling and indexing content, breaks down at this point.

John Mueller explains:

“I think the big thing is that the web with HTML and everything has been around for really long time, longer than Markdown. And all of the crawlers out there, have practiced with HTML. And converting HTML into text is trivial. There are lots of libraries out there that can do that for you. So if you think about what an average web crawler might look for or might need to find on a page to be able to understand it, then probably that’s just HTML.”
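
To make Mueller's point concrete, here is a minimal sketch of HTML-to-text conversion using only Python's standard library. The class name `TextExtractor`, the helper `html_to_text`, and the sample page are illustrative assumptions, not anything described in the podcast, and real crawlers are certainly more sophisticated than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page with inline styles, a style block, and a script.
sample = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><nav><a href="/about">About</a></nav>
<p style="margin:0">Hello, <b>world</b>!</p>
<script>console.log("ignored");</script></body></html>
"""

print(html_to_text(sample))
```

The tags, inline styles, and script contents are dropped, while the text a reader would see is kept. That is the "cruft" removal markdown is supposed to provide, done in a few dozen lines on the crawler's side.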

Markdown Fails For Content Discovery

Discovery is the process by which a crawler visits a web page and finds links to other pages, both within the same website and on other websites.

Splitt said that markdown focuses on just one part of a web page: the content itself. He explained that this makes it harder for search engines to see a web page in the context of how it connects to the rest of a website’s content through links, which aid discovery.

He explained:

“Yeah, and I mean, the other thing is, yes, it’s nice that Markdown is usually then focusing on a piece of content, but HTML with all the links and navigation and the headers and all that kind of stuff that kind of gets stripped out in the Markdown files that make the website are important to understand the structure and how this connects to the rest of the site.

So I guess that’s also a bad thing. If we were to lose this, that’s probably not so good for crawling in Discovery, huh?”
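
Splitt's discovery point can be illustrated with a small sketch that compares the links a crawler could find in a full HTML page with those left in a content-only markdown version of the same page. The URLs, the sample pages, and the helper names `links_in_html` and `links_in_markdown` are hypothetical, and the regular expression only handles simple inline markdown links.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def links_in_html(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Simplified pattern for inline links of the form [text](target).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def links_in_markdown(markdown, base_url):
    return {urljoin(base_url, target) for target in MARKDOWN_LINK.findall(markdown)}


BASE = "https://example.com/blog/post"  # hypothetical page URL

# Full page: navigation, article body, and footer.
html_page = """
<nav><a href="/">Home</a> <a href="/blog/">Blog</a> <a href="/contact">Contact</a></nav>
<article><h1>Post</h1><p>See the <a href="/guide">guide</a>.</p></article>
<footer><a href="/sitemap">Sitemap</a></footer>
"""

# Content-only markdown version of the same article.
markdown_page = "# Post\n\nSee the [guide](/guide).\n"

html_links = links_in_html(html_page, BASE)
md_links = links_in_markdown(markdown_page, BASE)

# Links a crawler would only discover from the HTML version.
print(sorted(html_links - md_links))
```

In this toy example the markdown version keeps only the in-article link, while the navigation and footer links that tie the page to the rest of the site exist only in the HTML.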

Takeaway

Patents and research papers make it clear that search engines see a website as a collection of individual web pages, as groups of web pages that belong to sections and categories, and as the entire website as a whole. Zoom out, and the website is but one point among thousands and thousands of other websites in a neighborhood of websites, self-organized by links into categories and quality levels.

For SEO, we have to understand a site from both the zoomed-out and zoomed-in views to conceptualize how all the pieces fit together, because that’s what search engines do.

AI-based SEO seems to be hung up on making it easy for LLMs and AI agents to crawl and index content. Crawling and indexing are valid concerns. But by insisting on markdown files, its proponents overlook the fundamentals of discovery and how trivial it is to extract content from an HTML web page, which makes markdown files redundant.

Aside from the above issues, there is also the matter of trustworthiness. There used to be a keywords meta tag that some search engines used to get a hint about what a web page was about. Naturally, site owners and SEOs used it to dump all the keywords they wanted to rank for, regardless of the content.

I’m not saying that SEOs and website owners are untrustworthy, but search traffic is money, and people are going to do what they’re going to do. So the last consideration is that search engines will never trust markdown content and treat it as the canonical version when it’s trivial to crawl the HTML and extract the original content from it.

Circling back to what Mueller and Splitt discussed, Google’s position is that the AI SEO push for markdown strips away a significant amount of context that matters.

The discussion is from Search Off The Record Episode 111.
