I Made My Website Readable for AI. Here Is How.
A website built for humans is not the same as one built to be read by AI. Search engines crawl structure and keywords. AI systems rely more on structured metadata to understand who you are, what your content is about, and how to cite you correctly.
If you ask an AI assistant about a personal site and the answer comes back vague or off, that gap is usually why. The HTML is fine. The signals AI looks for just aren’t there.
This post is a working list of what I changed across the sites I run to close that gap. Each section covers one piece (a meta tag, a schema block, a robots rule), why it matters for AI visibility, and the code that goes with it.
What is LLM SEO?
LLM SEO, also called Generative Engine Optimization (GEO), is the practice of structuring a website so AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews can read, understand, and cite it correctly. It overlaps with classic SEO but emphasizes structured metadata, quotable passages, and clear entity signals over pure keyword targeting.
The goal is no longer “rank on page one”. The goal is “get quoted inside the answer”.
Why does it matter right now?
A few numbers from late 2025 and early 2026 that pushed me to take this seriously:
The last point is the one most people miss. Backlinks still matter for classic search. For AI citations, getting talked about on YouTube, Reddit, Wikipedia, and LinkedIn matters more than a Domain Rating score. AI systems pick sources by entity confidence, not by who has the strongest link profile.
Quick summary: the six things that move the needle
If you only have an afternoon for this, do these six things in order:
- Add JSON-LD BlogPosting schema to every post with author, datePublished, and sameAs links to GitHub and LinkedIn.
- Add Open Graph and Twitter Card meta tags with explicit titles, descriptions, and an image on every page.
- Allow GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, and Google-Extended in robots.txt.
- Use question-based H2 and H3 headings (“How do I add structured data?” rather than “Structured data”).
- Write self-contained answer blocks of 40 to 75 words, each leading with the answer in the first sentence or two.
- Use your full name and consistent biographical claims across every page so AI can resolve you as a single entity.
Everything below this section is the detail on why each step matters and how I implemented it.
What I added
Open Graph and Twitter Cards
These tags create rich previews when you share a link. They’re also one of the clearest signals you can give an AI about what a page contains. Both Open Graph and Twitter Cards are widely read by AI systems when they crawl pages.
<meta property="og:title" content="I Made My Website Readable for AI. Here Is How.">
<meta property="og:description" content="How I made my portfolio readable for AI systems.">
<meta property="og:type" content="article">
<meta property="og:url" content="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<meta property="og:image" content="https://milos.oroz.space/images/blog/home-page.png">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="I Made My Website Readable for AI. Here Is How.">
<meta name="twitter:description" content="How I made my portfolio readable for AI systems.">
Instead of leaving AI to guess from the page content, this gives it a definitive description upfront.
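If pages are generated from templates, a small helper keeps these tags in sync with the page data. This is a minimal Python sketch, not how any particular framework does it: `render_social_meta` is a hypothetical name, and it assumes the title, description, URL, and image are already known for the page. It HTML-escapes every value so quotes in a title cannot break the attribute.

```python
from html import escape


def render_social_meta(title: str, description: str, url: str, image: str) -> str:
    """Render Open Graph and Twitter Card tags for one article page.

    Hypothetical helper for illustration; not part of Astro or any library.
    """
    open_graph = {
        "og:title": title,
        "og:description": description,
        "og:type": "article",
        "og:url": url,
        "og:image": image,
    }
    twitter = {
        "twitter:card": "summary_large_image",
        "twitter:title": title,
        "twitter:description": description,
    }
    # Open Graph tags use the `property` attribute; Twitter Cards use `name`.
    # escape() also encodes quotes, so a title containing " stays inside the attribute.
    tags = [f'<meta property="{key}" content="{escape(value)}">' for key, value in open_graph.items()]
    tags += [f'<meta name="{key}" content="{escape(value)}">' for key, value in twitter.items()]
    return "\n".join(tags)


if __name__ == "__main__":
    print(render_social_meta(
        "I Made My Website Readable for AI. Here Is How.",
        "How I made my portfolio readable for AI systems.",
        "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
        "https://milos.oroz.space/images/blog/home-page.png",
    ))
```

Called with this post's values, it produces the same eight tags, in the same order, as the HTML above.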
JSON-LD Structured Data
This is the most impactful change. JSON-LD lets you tell a machine exactly what type of content this is, who wrote it, and when. Structured data measurably improves how accurately AI systems describe a site.
The sameAs field is the part most people skip. It links your name to your verified profiles on GitHub, LinkedIn, and elsewhere. That cross-referencing is how AI systems build confidence about who you actually are.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "I Made My Website Readable for AI. Here Is How.",
  "datePublished": "2026-01-15",
  "dateModified": "2026-01-15",
  "url": "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
  "image": "https://milos.oroz.space/images/blog/home-page.png",
  "keywords": ["LLM optimization", "AI SEO", "GEO", "structured data", "JSON-LD"],
  "author": {
    "@type": "Person",
    "@id": "https://milos.oroz.space/#milos-oroz",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space",
    "sameAs": [
      "https://github.com/hz47",
      "https://www.linkedin.com/in/milosoroz/"
    ]
  },
  "publisher": {
    "@type": "Person",
    "@id": "https://milos.oroz.space/#milos-oroz",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space"
  }
}
</script>
I add this to every blog post. On homepages I wrap a Person inside a ProfilePage, and reuse the same @id (https://milos.oroz.space/#milos-oroz) across every page so AI systems treat all the references as one entity instead of several lookalikes. The @id reuse is the small detail that does the heavy lifting.
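The @id pattern is easier to keep consistent if the Person node is defined once and reused. A minimal Python sketch, assuming pages are rendered by a build step that can call it: `profile_page`, `blog_posting`, and `as_script_tag` are hypothetical helper names, and the property set is trimmed to the essentials from the example above. The homepage carries the full Person; posts point back to it with the same @id.

```python
import json

# The shared identifier from the paragraph above; it only needs to be a stable URL.
PERSON_ID = "https://milos.oroz.space/#milos-oroz"

PERSON = {
    "@type": "Person",
    "@id": PERSON_ID,
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space",
    "sameAs": [
        "https://github.com/hz47",
        "https://www.linkedin.com/in/milosoroz/",
    ],
}


def profile_page() -> dict:
    """Homepage markup: a ProfilePage whose mainEntity is the full Person."""
    return {"@context": "https://schema.org", "@type": "ProfilePage", "mainEntity": PERSON}


def blog_posting(headline: str, url: str, published: str) -> dict:
    """Post markup: author and publisher reuse the Person's @id instead of a new entity."""
    person_ref = {"@type": "Person", "@id": PERSON_ID, "name": PERSON["name"], "url": PERSON["url"]}
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "url": url,
        "datePublished": published,
        "author": person_ref,
        "publisher": person_ref,
    }


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD into the <script> element that goes in the page <head>."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # A literal "</" inside a string could end the script element early;
    # "<\/" decodes to the same characters in JSON.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Because every page emits the same PERSON_ID, a parser that merges JSON-LD across pages ends up with one Person node rather than one per page.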
Which AI crawlers should I allow in robots.txt?
AI companies run their own crawlers with specific user-agent names. Most default robots.txt files say nothing about them, which means they crawl anyway. Better to be explicit.
The ones worth knowing about, grouped by owner, as of early 2026:
- GPTBot · OpenAI model training
- OAI-SearchBot · ChatGPT search results
- ChatGPT-User · ChatGPT browsing on user request
- ClaudeBot · Anthropic model training
- Claude-SearchBot · Claude search results
- PerplexityBot · Perplexity AI search
- Google-Extended · Gemini training and grounding (a robots.txt control token rather than a separate crawler; it does not affect Google Search or AI Overviews)
- Meta-ExternalAgent · Meta AI
- CCBot · Common Crawl training data (often blocked)
- Bytespider · ByteDance (TikTok / Douyin) AI

# Explicit allow-list for AI training and AI search crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: Meta-ExternalAgent
Allow: /
Disallow: /private/
# Everyone else (Googlebot, Bingbot, and the long tail)
User-agent: *
Allow: /
Disallow: /private/
One gotcha worth knowing. When a bot has its own User-agent section, it only reads that section. It does not also read the * block. So any Disallow rule you want applied to AI crawlers has to be repeated inside their block. Easy to forget, and the kind of mistake that quietly leaks private paths.
If you want your content discoverable, allow them. If you don’t, put Disallow: / in their block instead. Keep in mind that robots.txt is a signal, not a hard wall. Compliant crawlers like GPTBot and ClaudeBot respect it, but others have been documented rotating IPs to bypass it.
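To see why the group gotcha leaks, it helps to model how a crawler picks rules. The sketch below is a simplified reading of RFC 9309, the robots.txt standard: a crawler obeys only the group naming its own user agent (falling back to *), and within that group the longest matching path wins, with Allow winning ties. It ignores path wildcards (* and $) and uses exact, case-insensitive user-agent matching; `parse_groups` and `is_allowed` are hypothetical names. It avoids Python's built-in urllib.robotparser because that module applies rules in file order rather than by longest match.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each user agent (lowercased) to its (is_allow, path) rules."""
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    seen_rule = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules opens a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty value means the rule restricts nothing
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def is_allowed(groups: dict[str, list[tuple[bool, str]]], user_agent: str, path: str) -> bool:
    """Apply only the bot's own group (else *), then the longest matching rule."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_length, allowed = -1, True  # no matching rule means the path is allowed
    for is_allow, rule_path in rules:
        if path.startswith(rule_path):
            length = len(rule_path)
            # Longer (more specific) path wins; on a tie, Allow wins.
            if length > best_length or (length == best_length and is_allow):
                best_length, allowed = length, is_allow
    return allowed


LEAKY = """\
User-agent: GPTBot
Allow: /

User-agent: *
Allow: /
Disallow: /private/
"""

FIXED = """\
User-agent: GPTBot
Allow: /
Disallow: /private/

User-agent: *
Allow: /
Disallow: /private/
"""

if __name__ == "__main__":
    for name, text in (("leaky", LEAKY), ("fixed", FIXED)):
        print(name, is_allowed(parse_groups(text), "GPTBot", "/private/notes.html"))
```

With LEAKY, GPTBot may fetch /private/ even though the * group forbids it, because GPTBot never reads that group. FIXED repeats the Disallow inside the bot's own block, which is the fix described above.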
Sitemap
A sitemap tells crawlers exactly which pages exist. Without one, they have to follow links and hope they find everything. I use @astrojs/sitemap to generate it automatically at build time, with a small post-processing script that adds <lastmod> from each post’s frontmatter date.
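The post-processing step can be small. Here is a minimal Python sketch using only the standard library, not the actual script: it assumes the sitemap is a single urlset file (for example the sitemap-0.xml that @astrojs/sitemap typically writes next to its index) and that a mapping from page URL to frontmatter date has already been collected. `add_lastmod` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Keep the sitemap namespace as the default so output has no "ns0:" prefixes,
# and keep a readable prefix for hreflang alternates if the sitemap has them.
ET.register_namespace("", NS)
ET.register_namespace("xhtml", "http://www.w3.org/1999/xhtml")


def add_lastmod(sitemap_xml: str, dates: dict[str, str]) -> str:
    """Set <lastmod> on every <url> whose <loc> appears in `dates`.

    `dates` maps a page URL to a W3C date string such as "2026-01-15".
    """
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.find(f"{{{NS}}}loc")
        if loc is None:
            continue
        page = (loc.text or "").strip()
        if page not in dates:
            continue
        lastmod = url.find(f"{{{NS}}}lastmod")
        if lastmod is None:
            lastmod = ET.Element(f"{{{NS}}}lastmod")
            # The sitemap schema orders <lastmod> directly after <loc>.
            url.insert(list(url).index(loc) + 1, lastmod)
        lastmod.text = dates[page]
    return ET.tostring(root, encoding="unicode")
```

Pages missing from the mapping are left untouched, so a homepage without a frontmatter date simply keeps no <lastmod>.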
Hreflang for bilingual content
Some of my sites are bilingual (English and German). Without hreflang tags, AI might treat the two language versions as duplicate content rather than translations.
<link rel="alternate" hreflang="en" href="https://milos.oroz.space/en/blog/">
<link rel="alternate" hreflang="de" href="https://milos.oroz.space/de/blog/">
<link rel="alternate" hreflang="x-default" href="https://milos.oroz.space/en/blog/">
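The detail that trips people up is that every language version must carry the full set, including a link to itself, and the pages must point at each other. A minimal Python sketch that generates the same tags from one locale-independent path, assuming the /{locale}{path} URL layout used here and English as x-default; `hreflang_links` and the constants are hypothetical:

```python
from html import escape

SITE = "https://milos.oroz.space"
LOCALES = ("en", "de")
DEFAULT_LOCALE = "en"  # served as x-default, as in the example above


def hreflang_links(path: str) -> str:
    """Alternate links for a locale-independent path such as "/blog/".

    Emit the same output on /en/... and /de/... so the pair is reciprocal
    and each page also references itself.
    """
    def href(locale: str) -> str:
        return escape(f"{SITE}/{locale}{path}")

    tags = [f'<link rel="alternate" hreflang="{loc}" href="{href(loc)}">' for loc in LOCALES]
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{href(DEFAULT_LOCALE)}">')
    return "\n".join(tags)
```

For "/blog/" this returns exactly the three tags above, and the same call works for every other page.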
Canonical URLs
These tell crawlers which version of a page is the authoritative one. I also add article timestamps for blog posts.
<link rel="canonical" href="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<meta property="article:published_time" content="2026-01-15T00:00:00.000Z">
<meta property="article:author" content="Milos Oroz">
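Frontmatter usually stores a plain date, while article:published_time expects a full ISO 8601 timestamp. A minimal Python sketch of that conversion together with the canonical link, assuming the date means midnight UTC (which matches the T00:00:00.000Z value above); `article_head_tags` is a hypothetical name:

```python
from datetime import date, datetime, time, timezone
from html import escape


def article_head_tags(url: str, published: date, author: str) -> str:
    """Render the canonical link and article timestamp tags for one post."""
    # Frontmatter dates carry no time of day; assume midnight UTC.
    stamp = datetime.combine(published, time(0, 0), tzinfo=timezone.utc)
    # isoformat gives "+00:00"; swap it for the shorter "Z" form.
    iso = stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return "\n".join([
        f'<link rel="canonical" href="{escape(url)}">',
        f'<meta property="article:published_time" content="{iso}">',
        f'<meta property="article:author" content="{escape(author)}">',
    ])
```

With this post's URL, date(2026, 1, 15), and "Milos Oroz", it reproduces the three lines above.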
llms.txt
I add an llms.txt file at the root of each site. It’s a plain text file explaining what the site is about and who I am. The idea is solid: give AI crawlers a quick summary without making them parse all the HTML. But in practice, as of early 2026, no major AI crawler actually requests it. Adoption is tiny and Google has explicitly said they have no plans to support it. I kept mine since it costs nothing, but I wouldn’t prioritize it.
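If you keep one anyway, the format is simple enough to generate at build time. A minimal Python sketch following the structure proposed at llmstxt.org (an H1 with the site name, a blockquote summary, then H2 sections of Markdown link lists); `build_llms_txt` is a hypothetical name and the sample content is a placeholder:

```python
def build_llms_txt(name: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text: H1 name, blockquote summary, H2 link sections.

    Each link is (title, url, note); an empty note drops the ": note" suffix.
    """
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}" if note else f"- [{title}]({url})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder content for illustration only.
    print(build_llms_txt(
        "Milos Oroz",
        "Personal site and blog of Milos Oroz.",
        {
            "Blog": [(
                "I Made My Website Readable for AI. Here Is How.",
                "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
                "Metadata, robots.txt, and writing changes for AI visibility.",
            )],
            "Elsewhere": [
                ("GitHub", "https://github.com/hz47", ""),
                ("LinkedIn", "https://www.linkedin.com/in/milosoroz/", ""),
            ],
        },
    ), end="")
```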
How do I write so AI actually quotes me?
The metadata helps AI understand the structure of the site. The content matters more, and how you write it changes how often AI cites you.
A few patterns that work:
- Write self-contained passages of 40 to 75 words. A 2025 study of 10,000 AI citations found this range gets pulled into AI summaries roughly 3x more often than longer passages. Each passage should answer one question completely, without relying on the paragraph before it.
- Use question-based headings. “How do I add structured data?” works better than “Structured Data” alone, because real user queries are phrased as questions. Headings that look like queries get matched and pulled into AI answers.
- Start each section with the answer. Direct claims in the first 40 to 60 words of a section are more likely to be quoted. Stories and setup can follow.
- Use your full name consistently across every page. “Milos Oroz” everywhere, not “I” or “the author” or a shortened version. AI systems build entity graphs, and consistent naming helps them recognize you across pages and sites.
- State things directly. “I build cloud infrastructure and automation tools” is clearer than “I work in tech.” Specific claims are easier to parse and more likely to be cited accurately.
- Link to your work. A blog post that references a live project, a GitHub repo, or a published tool gives AI something verifiable to anchor your claims to. It also creates signal that you are a real person who ships real things.
- Add tables for comparative data. AI systems pull tabular data into answers more readily than the same information buried in prose. A five-row comparison table can be cited verbatim where five paragraphs cannot.
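The 40- to 75-word target is easy to check mechanically before publishing. A rough Python sketch, assuming posts are written in Markdown with ## or ### headings and blank lines between paragraphs; it treats the first paragraph under each question-style heading as the answer block, which is a simplification, and `check_answer_blocks` is a hypothetical name:

```python
import re

MIN_WORDS, MAX_WORDS = 40, 75  # target range from the list above


def check_answer_blocks(markdown: str) -> list[tuple[str, int, bool]]:
    """Return (heading, word_count, in_range) for each question-style heading.

    The "answer block" is assumed to be the first paragraph under the heading.
    """
    results = []
    # With one capture group, re.split returns [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^#{2,3}[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    for heading, body in zip(parts[1::2], parts[2::2]):
        heading = heading.strip()
        if not heading.endswith("?"):
            continue  # only headings phrased as questions are checked
        paragraphs = [p for p in re.split(r"\n\s*\n", body.strip()) if p.strip()]
        words = len(paragraphs[0].split()) if paragraphs else 0
        results.append((heading, words, MIN_WORDS <= words <= MAX_WORDS))
    return results
```

Link syntax and inline code count as words here, so treat the numbers as approximate rather than exact.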
How is LLM SEO different from classic SEO?
Classic SEO optimizes for rankings on a results page. LLM SEO optimizes for being quoted inside an AI-generated answer. The two overlap, but the tactics diverge:
There’s also a structural difference. Google AI Mode (launched May 2025) shows zero blue links in its conversational tab. If the AI doesn’t quote you, you don’t exist on that surface. That changes the stakes.
Frequently asked questions
Do AI crawlers execute JavaScript?
Most don’t. GPTBot, ClaudeBot, and PerplexityBot read static HTML. If your important content only renders after a JavaScript bundle hydrates, an AI crawler may not see it at all. Server-side rendering or static generation (which is what Astro does by default) avoids the problem.
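A quick way to approximate what a non-rendering crawler sees is to fetch the raw HTML, run no JavaScript, and look for your key text. A minimal Python sketch using only the standard library; it assumes the phrase you care about (your name, a key claim) appears as plain text in the page, and `visible_in_raw_html`, `fetch_raw`, and the User-Agent string are hypothetical:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleText(HTMLParser):
    """Collect text from raw HTML, skipping elements that never render as text."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def visible_in_raw_html(html: str, phrase: str) -> bool:
    """True if `phrase` is present in the server-sent HTML as readable text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return phrase in text


def fetch_raw(url: str) -> str:
    """Plain HTTP GET: no JavaScript runs, similar to a non-rendering crawler."""
    request = Request(url, headers={"User-Agent": "raw-html-check/0.1"})  # hypothetical UA
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, run visible_in_raw_html(fetch_raw("https://milos.oroz.space/en/"), "Milos Oroz"). If it returns False for text you can see in the browser, that text is probably being rendered client-side.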
Does llms.txt actually do anything yet?
Not much, as of early 2026. Major AI crawlers don’t request it. Google has said it has no plans to support it. I keep mine because it costs nothing and may matter later, but I wouldn’t build a strategy around it.
Do I still need backlinks for AI citations?
Less than you’d think. Ahrefs’ December 2025 study of 75,000 brands found brand mentions correlate roughly three times more strongly with AI citations than backlinks do. Time spent getting talked about on YouTube, Reddit, and Wikipedia tends to outperform time spent building links.
Does the same content work for ChatGPT and Google AI Overviews?
Only about 11% of domains are cited across major AI platforms (ChatGPT, Perplexity, Google AI Overviews) for the same query. The platforms select sources differently. ChatGPT leans heavily on Wikipedia (around 48% of citations) and Reddit (about 11%). Google AI Overviews used to cite mostly top-10 ranking pages (76% in mid-2025), but that overlap has been falling sharply, down to roughly 38% by early 2026 as Gemini 3 pulls from wider source pools. Optimize for each separately if both matter to you.
How long should a blog post be for LLM SEO?
Length isn’t the lever. What matters is that the post contains several self-contained 40- to 75-word answer blocks, each addressing a distinct question. A 1,500-word post with five clean answer blocks usually beats a 4,000-word essay with none.
Should I write FAQ schema?
For AI platforms like ChatGPT and Perplexity, yes. They parse FAQPage as a hint that a passage is a discrete question and answer. For Google rich results, no. Google restricted FAQPage to government and healthcare sites in August 2023, so it has no SERP benefit for anyone else. The schema is cheap to add and has no downside, so it’s worth doing if you have legitimate FAQ content.
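FAQPage markup is a list of Question items, each with an acceptedAnswer. A minimal Python sketch that builds it from the same question and answer pairs shown on the page; `faq_page` is a hypothetical name. Keeping the schema text identical to the visible text is the safe default, since Google's structured data guidelines say not to mark up content that readers cannot see.

```python
import json


def faq_page(pairs: list[tuple[str, str]]) -> dict:
    """schema.org FAQPage JSON-LD built from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    faq = faq_page([
        ("Does llms.txt actually do anything yet?",
         "Not much, as of early 2026. Major AI crawlers don't request it."),
    ])
    print(json.dumps(faq, indent=2))
```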
What I’m watching
The changes are in. Now I’m watching whether AI assistants summarize my sites more accurately, pick up the right expertise areas, and get less confused by translated content.
LLM behavior changes constantly, so this is an ongoing experiment. But the basics here are sound regardless of how AI evolves. Structured metadata has always mattered. It probably matters more now.
Sources
- Google · AI Overviews expansion to 200+ countries (May 2025)
- Search Engine Land · ChatGPT reaches 900M weekly active users (Feb 2026)
- Backlinko · Perplexity AI user and query statistics
- Ahrefs · Top Brand Visibility Factors in ChatGPT, AI Mode, and AI Overviews (Dec 2025, 75K brands)
- Search Engine Land · AI traffic is up 527% (2025 Previsible / SparkToro data)
- The Digital Bloom · 2025 AI Visibility Report (passage-length citation analysis)
- Ahrefs · Update: 38% of AI Overview citations pull from the Top 10 (2026)
- Profound · AI Platform Citation Patterns (Wikipedia 47.9%, Reddit 11.3% in ChatGPT)
- Leapd · How ChatGPT, Google AI Overviews, and Perplexity Source Information (~11% domain overlap)