I Made My Website Readable for AI. Here Is How.

A website built for humans is not the same as one built to be read by AI. Search engines crawl structure and keywords. AI systems rely more on structured metadata to understand who you are, what your content is about, and how to cite you correctly.

If you ask an AI assistant about a personal site and the answer comes back vague or off, that gap is usually why. The HTML is fine. The signals AI looks for just aren’t there.

This post is a working list of what I changed across the sites I run to close that gap. Each section covers one piece (a meta tag, a schema block, a robots rule), why it matters for AI visibility, and the code that goes with it.

What is LLM SEO?

LLM SEO, also called Generative Engine Optimization (GEO), is the practice of structuring a website so AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews can read, understand, and cite it correctly. It overlaps with classic SEO but emphasizes structured metadata, quotable passages, and clear entity signals over pure keyword targeting.

The goal is no longer “rank on page one”. The goal is “get quoted inside the answer”.

Why does it matter right now?

A few numbers from 2025 that pushed me to take this seriously:

  • Google AI Overviews reach: 1.5 billion users per month, across 200+ countries (Source: Google)
  • AI Overviews query coverage: 50%+ of all searches (Source: industry data)
  • ChatGPT weekly active users: 800 million, October 2025 (Source: OpenAI)
  • Perplexity monthly queries: 780 million, May 2025 (Source: Perplexity, Bloomberg Tech Summit)
  • AI-referred sessions growth: +527%, January to May 2025 (Source: Previsible)
  • Brand mentions vs backlinks for AI citations: mentions correlate roughly 3x more strongly with AI citations than backlinks do (Source: Ahrefs, December 2025 study of 75,000 brands)

The last point is the one most people miss. Backlinks still matter for classic search. For AI citations, though, getting talked about on YouTube, Reddit, Wikipedia, and LinkedIn appears to matter more than a Domain Rating score. The data points to AI systems picking sources by entity confidence rather than by who has the strongest link profile.

Quick summary: the six things that move the needle

If you only have an afternoon for this, do these six things in order:

  1. Add JSON-LD BlogPosting schema to every post with author, datePublished, and sameAs links to GitHub and LinkedIn.
  2. Add Open Graph and Twitter Card meta tags with explicit titles, descriptions, and an image on every page.
  3. Allow GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, and Google-Extended in robots.txt.
  4. Use question-based H2 and H3 headings (“How do I add structured data?” rather than “Structured data”).
  5. Write self-contained answer blocks of 40 to 75 words, each leading with the answer in the first sentence or two.
  6. Use your full name and consistent biographical claims across every page so AI can resolve you as a single entity.

The rest of this post covers why each step matters and how I implemented it.

What I added

Open Graph and Twitter Cards

These tags create rich previews when you share a link. They’re also one of the clearest signals you can give an AI about what a page contains: any crawler that parses the page’s <head>, AI crawlers included, gets the title, summary, type, and image without having to interpret the body text.

<meta property="og:title" content="I Made My Website Readable for AI. Here Is How.">
<meta property="og:description" content="How I made my portfolio readable for AI systems.">
<meta property="og:type" content="article">
<meta property="og:url" content="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<meta property="og:image" content="https://milos.oroz.space/images/blog/home-page.png">

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="I Made My Website Readable for AI. Here Is How.">
<meta name="twitter:description" content="How I made my portfolio readable for AI systems.">

Instead of leaving AI to guess from the page content, this gives it a definitive description upfront.
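A quick way to confirm that every built page actually ships these tags is to parse the HTML and list what’s missing. Here is a minimal sketch using Python’s standard-library html.parser. The REQUIRED checklist and the missing_social_tags helper are illustrative names of my own, not part of any tool, and the checklist simply mirrors the tags above.

```python
from html.parser import HTMLParser

# Checklist mirroring the tags above (an assumption; extend it as needed).
REQUIRED = [
    "og:title", "og:description", "og:type", "og:url", "og:image",
    "twitter:card", "twitter:title", "twitter:description",
]


class MetaCollector(HTMLParser):
    """Collects <meta> tags keyed by their property= or name= attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property=, Twitter Cards use name=.
        key = attr.get("property") or attr.get("name")
        if key and attr.get("content"):  # ignore tags with empty content
            self.meta[key] = attr["content"]


def missing_social_tags(html: str) -> list[str]:
    """Return the required keys that are absent or have empty content."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return [key for key in REQUIRED if key not in collector.meta]


page = """<head>
<meta property="og:title" content="I Made My Website Readable for AI. Here Is How.">
<meta name="twitter:card" content="summary_large_image">
</head>"""
print(missing_social_tags(page))
# ['og:description', 'og:type', 'og:url', 'og:image', 'twitter:title', 'twitter:description']
```

Running it over every file in the build output after each deploy turns a one-time setup into a regression check.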

JSON-LD Structured Data

This is the most impactful change. JSON-LD lets you tell a machine exactly what type of content a page is, who wrote it, and when. Structured data removes that guesswork, which should make AI descriptions of a site more accurate.

The sameAs field is the part most people skip. It links your name to your verified profiles on GitHub, LinkedIn, and elsewhere. That cross-referencing is how AI systems build confidence about who you actually are.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "I Made My Website Readable for AI. Here Is How.",
  "datePublished": "2026-01-15",
  "dateModified": "2026-01-15",
  "url": "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
  "image": "https://milos.oroz.space/images/blog/home-page.png",
  "keywords": ["LLM optimization", "AI SEO", "GEO", "structured data", "JSON-LD"],
  "author": {
    "@type": "Person",
    "@id": "https://milos.oroz.space/#milos-oroz",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space",
    "sameAs": [
      "https://github.com/hz47",
      "https://www.linkedin.com/in/milosoroz/"
    ]
  },
  "publisher": {
    "@type": "Person",
    "@id": "https://milos.oroz.space/#milos-oroz",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space"
  }
}
</script>

I add this to every blog post. On homepages I wrap a Person inside a ProfilePage, and reuse the same @id (https://milos.oroz.space/#milos-oroz) across every page so AI systems treat all the references as one entity instead of several lookalikes. The @id reuse is the small detail that does the heavy lifting.
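If pages are generated from templates, building this markup from one shared Person object keeps the @id from drifting between pages. A minimal Python sketch under that assumption: blog_posting, profile_page and script_tag are hypothetical helpers, the /en/blog/ URL pattern is assumed, and the publisher is written as a bare {"@id": ...} node reference, which JSON-LD allows when the same node is fully described elsewhere in the document (some validators may still prefer a name there).

```python
import json

SITE = "https://milos.oroz.space"
PERSON_ID = f"{SITE}/#milos-oroz"  # the one @id reused on every page

PERSON = {
    "@type": "Person",
    "@id": PERSON_ID,
    "name": "Milos Oroz",
    "url": SITE,
    "sameAs": [
        "https://github.com/hz47",
        "https://www.linkedin.com/in/milosoroz/",
    ],
}


def blog_posting(headline: str, slug: str, published: str, modified: str = "") -> dict:
    """BlogPosting whose author is the shared Person; publisher only references its @id."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": published,
        "dateModified": modified or published,
        "url": f"{SITE}/en/blog/{slug}/",  # English URL pattern assumed
        "author": PERSON,
        "publisher": {"@id": PERSON_ID},
    }


def profile_page() -> dict:
    """Homepage markup: a ProfilePage whose mainEntity is the same Person."""
    return {
        "@context": "https://schema.org",
        "@type": "ProfilePage",
        "url": f"{SITE}/",
        "mainEntity": PERSON,
    }


def script_tag(data: dict) -> str:
    """Wrap JSON-LD for <head>; '</' is escaped so the payload can't close the tag early."""
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(script_tag(blog_posting(
    "I Made My Website Readable for AI. Here Is How.",
    "llm-crawling-optimization",
    "2026-01-15",
)))
```

Because both helpers embed the same PERSON dict, changing a profile link in one place updates every page that carries the entity.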

Which AI crawlers should I allow in robots.txt?

AI companies run their own crawlers with specific user-agent names. Most default robots.txt files say nothing about them, which means they crawl anyway. Better to be explicit.

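The ones worth knowing about, grouped by owner, as of early 2026:

  • GPTBot (OpenAI) · collects training data for OpenAI’s models
  • OAI-SearchBot (OpenAI) · indexes pages for ChatGPT search features
  • ChatGPT-User (OpenAI) · fetches pages when a ChatGPT user asks it to
  • ClaudeBot (Anthropic) · collects training data for Claude models
  • Claude-SearchBot (Anthropic) · indexes pages for Claude’s search
  • PerplexityBot (Perplexity) · indexes pages for Perplexity’s AI search
  • Google-Extended (Google) · a robots.txt token rather than a separate crawler; it controls whether content Googlebot crawls is used for Gemini training and grounding, while AI Overviews follow the regular Googlebot rules
  • Meta-ExternalAgent (Meta) · Meta AI
  • CCBot (Common Crawl) · training data (often blocked)
  • Bytespider (ByteDance) · TikTok / Douyin AI

# Explicit allow-list for AI training and AI search crawlers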
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: Meta-ExternalAgent
Allow: /
Disallow: /private/

# Everyone else (Googlebot, Bingbot, and the long tail)
User-agent: *
Allow: /
Disallow: /private/

One gotcha worth knowing. When a bot has its own User-agent section, it only reads that section. It does not also read the * block. So any Disallow rule you want applied to AI crawlers has to be repeated inside their block. Easy to forget, and the kind of mistake that quietly leaks private paths.

If you want your content to be discoverable, allow them. If you don’t, use Disallow: / in their group. Either way, robots.txt is a signal, not a hard wall: compliant crawlers like GPTBot and ClaudeBot respect it, but others have been documented rotating IPs to bypass it.

Sitemap

A sitemap tells crawlers exactly which pages exist. Without one, they have to follow links and hope they find everything. I use @astrojs/sitemap to generate it automatically at build time, with a small post-processing script that adds <lastmod> from each post’s frontmatter date.
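The post-processing step can be sketched like this. It assumes you feed it the sitemap file @astrojs/sitemap writes (typically sitemap-0.xml) and a dict mapping each URL to its frontmatter date, collected elsewhere. add_lastmod is a hypothetical helper; it places each <lastmod> directly after <loc>, the order the sitemaps.org protocol lists them in, and skips URLs that are already dated.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # serialize without an ns0: prefix


def add_lastmod(sitemap_xml: str, dates: dict[str, str]) -> bytes:
    """Insert <lastmod> after <loc> for every URL that has a known date.

    `dates` maps absolute URLs to W3C dates such as "2026-01-15".
    """
    # Parse bytes so an encoding="UTF-8" declaration in the input is accepted.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    for url in root.findall(f"{{{NS}}}url"):
        loc = url.find(f"{{{NS}}}loc")
        if loc is None or url.find(f"{{{NS}}}lastmod") is not None:
            continue  # nothing to key on, or already dated
        date = dates.get((loc.text or "").strip())
        if date:
            lastmod = ET.Element(f"{{{NS}}}lastmod")
            lastmod.text = date
            url.insert(list(url).index(loc) + 1, lastmod)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<urlset xmlns="{NS}">'
    "<url><loc>https://milos.oroz.space/en/blog/llm-crawling-optimization/</loc></url>"
    "<url><loc>https://milos.oroz.space/en/</loc></url>"
    "</urlset>"
)
print(add_lastmod(sample, {
    "https://milos.oroz.space/en/blog/llm-crawling-optimization/": "2026-01-15",
}).decode("utf-8"))
```

Pages without a frontmatter date (like the homepage here) are left alone rather than given a made-up timestamp.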

Hreflang for bilingual content

Some of my sites are bilingual (English and German). Without hreflang tags, AI might treat the two language versions as duplicate content rather than translations.

<link rel="alternate" hreflang="en" href="https://milos.oroz.space/en/blog/">
<link rel="alternate" hreflang="de" href="https://milos.oroz.space/de/blog/">
<link rel="alternate" hreflang="x-default" href="https://milos.oroz.space/en/blog/">
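One rule that’s easy to break by hand: every language version must carry the full set of alternates, including a link to itself, or the annotations may be ignored. Generating the block from a single mapping keeps both pages in sync. A minimal sketch; hreflang_links and the blog_index mapping are assumptions, not part of any framework.

```python
from html import escape


def hreflang_links(versions: dict[str, str], default_lang: str = "en") -> str:
    """Build the complete alternate set that every language version should carry.

    Each version lists itself and all of its siblings (the links must be
    reciprocal), plus x-default pointing at the fallback language.
    """
    links = [
        f'<link rel="alternate" hreflang="{lang}" href="{escape(url)}">'
        for lang, url in versions.items()
    ]
    links.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(versions[default_lang])}">'
    )
    return "\n".join(links)


blog_index = {
    "en": "https://milos.oroz.space/en/blog/",
    "de": "https://milos.oroz.space/de/blog/",
}
# Emit the same block in the <head> of both /en/blog/ and /de/blog/.
print(hreflang_links(blog_index))
```

For this mapping the output is exactly the three tags shown above.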

Canonical URLs

These tell crawlers which version of a page is the authoritative one. I also add article timestamps for blog posts.

<link rel="canonical" href="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<meta property="article:published_time" content="2026-01-15T00:00:00.000Z">
<meta property="article:author" content="Milos Oroz">
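Both tags can be derived from the post URL and its frontmatter date. A minimal sketch, assuming trailing-slash page URLs (as on this site), https, and publication at midnight UTC like the example above; canonical_url and article_head are hypothetical helper names. The normalization matters because a canonical that still carries query strings or a mixed-case host just creates another variant.

```python
from datetime import date, datetime, time, timezone
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """One URL per page: https, lowercase host, no query or fragment, trailing slash.

    The trailing-slash rule assumes page URLs shaped like this site's.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", parts.netloc.lower(), path, "", ""))


def article_head(url: str, published: date, author: str) -> str:
    """Canonical link plus article:* tags; the time is assumed to be midnight UTC."""
    stamp = datetime.combine(published, time(0, 0), tzinfo=timezone.utc)
    iso = stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return "\n".join([
        f'<link rel="canonical" href="{canonical_url(url)}">',
        f'<meta property="article:published_time" content="{iso}">',
        f'<meta property="article:author" content="{author}">',
    ])


print(article_head(
    "http://Milos.Oroz.Space/en/blog/llm-crawling-optimization?utm_source=x",
    date(2026, 1, 15),
    "Milos Oroz",
))
```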

llms.txt

I add an llms.txt file at the root of each site. It’s a plain text file explaining what the site is about and who I am. The idea is solid: give AI crawlers a quick summary without making them parse all the HTML. But in practice, as of early 2026, no major AI crawler actually requests it. Adoption is tiny and Google has explicitly said they have no plans to support it. I kept mine since it costs nothing, but I wouldn’t prioritize it.
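For reference, the llms.txt proposal (llmstxt.org) describes a Markdown file with an H1 title, a blockquote summary, and H2 sections of link lists. A minimal generator under that reading; build_llms_txt is a hypothetical helper, and the summary and link note in the usage example are placeholders.

```python
def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt: H1 title, blockquote summary, H2 sections of link lists.

    Each link is (name, url, note); an empty note drops the ': note' suffix.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            lines.append(f"- [{name}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


llms = build_llms_txt(
    "Milos Oroz",
    # Placeholder summary; write your own one-paragraph description here.
    "Personal site of Milos Oroz, who builds cloud infrastructure and automation tools.",
    {
        "Blog": [(
            "I Made My Website Readable for AI. Here Is How.",
            "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
            "How the site is structured for AI crawlers",
        )],
        "Profiles": [
            ("GitHub", "https://github.com/hz47", ""),
            ("LinkedIn", "https://www.linkedin.com/in/milosoroz/", ""),
        ],
    },
)
print(llms)
```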

How do I write so AI actually quotes me?

The metadata helps AI understand the structure of the site. The content matters more, and how you write it changes how often AI cites you.

A few patterns that work:

  • Write self-contained passages of 40 to 75 words. A 2025 study of 10,000 AI citations found this range gets pulled into AI summaries roughly 3x more often than longer passages. Each passage should answer one question completely, without relying on the paragraph before it.
  • Use question-based headings. “How do I add structured data?” works better than “Structured Data” alone, because real user queries are phrased as questions. Headings that look like queries get matched and pulled into AI answers.
  • Start each section with the answer. Direct claims in the first 40 to 60 words of a section are more likely to be quoted. Stories and setup can follow.
  • Use your full name consistently across every page. “Milos Oroz” everywhere, not “I” or “the author” or a shortened version. AI systems build entity graphs, and consistent naming helps them recognize you across pages and sites.
  • State things directly. “I build cloud infrastructure and automation tools” is clearer than “I work in tech.” Specific claims are easier to parse and more likely to be cited accurately.
  • Link to your work. A blog post that references a live project, a GitHub repo, or a published tool gives AI something verifiable to anchor your claims to. It also creates signal that you are a real person who ships real things.
  • Add tables for comparative data. AI systems pull tabular data into answers more readily than the same information buried in prose. A five-row comparison table can be cited verbatim where five paragraphs cannot.
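The 40-to-75-word guideline is easy to audit before publishing. This minimal sketch splits a draft on blank lines, treats a single line ending in ? as a question heading, and flags every other block outside the range. The audit function, its default thresholds, and the heading heuristic are my own assumptions, not a standard tool.

```python
import re


def audit(text: str, low: int = 40, high: int = 75) -> list[tuple[str, int, bool]]:
    """Classify each blank-line-separated block and check passage length.

    Heuristic (an assumption): a single line ending in '?' is a question
    heading; everything else is a passage that should be low..high words.
    """
    results = []
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if not block:
            continue
        words = len(block.split())
        if "\n" not in block and block.endswith("?"):
            results.append(("question heading", words, True))
        else:
            results.append(("passage", words, low <= words <= high))
    return results


draft = "How do I add structured data?\n\n" + "word " * 50 + "\n\n" + "word " * 120
for kind, words, ok in audit(draft):
    print(f"{kind:<16} {words:>4} words  {'ok' if ok else 'check'}")
```

A flagged passage isn’t automatically wrong; it’s a prompt to check whether it answers one question on its own.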

How is LLM SEO different from classic SEO?

Classic SEO optimizes for rankings on a results page. LLM SEO optimizes for being quoted inside an AI-generated answer. The two overlap, but the tactics diverge:

  • Win metric: classic SEO aims for a top-10 ranking; LLM SEO / GEO aims for a citation inside the answer.
  • Primary signals: backlinks, keywords, and on-page factors for classic SEO; brand mentions, entity clarity, and structure for LLM SEO / GEO.
  • Best content shape: comprehensive long-form for classic SEO; self-contained passages of 40 to 75 words for LLM SEO / GEO.
  • Heading style: keyword-led for classic SEO; question-based for LLM SEO / GEO.
  • Metadata weight: title tag and meta description for classic SEO; JSON-LD, sameAs, and llms.txt for LLM SEO / GEO.
  • Where the reader sees you: the search results page for classic SEO; inside the AI answer itself for LLM SEO / GEO.

There’s also a structural difference. Google AI Mode (launched May 2025) shows zero blue links in its conversational tab. If the AI doesn’t quote you, you don’t exist on that surface. That changes the stakes.

Frequently asked questions

Do AI crawlers execute JavaScript?

Most don’t. GPTBot, ClaudeBot, and PerplexityBot read static HTML. If your important content only renders after a JavaScript bundle hydrates, an AI crawler may not see it at all. Server-side rendering or static generation (which is what Astro does by default) avoids the problem.
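A rough way to test this is to look at the HTML exactly as served and check whether a key sentence is in it, excluding script and style contents since a non-rendering crawler never executes them. A minimal sketch with Python’s html.parser; visible_without_js is a hypothetical helper, and it approximates a non-rendering crawler rather than reproducing any specific one. In practice you would pass it the body of urllib.request.urlopen(url).read().decode().

```python
from html.parser import HTMLParser


class ServedText(HTMLParser):
    """Collect the text a non-rendering crawler sees: everything outside <script>/<style>."""

    SKIP = ("script", "style")

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def visible_without_js(html: str, phrase: str) -> bool:
    """True if `phrase` is in the served HTML text, i.e. without executing JavaScript."""
    parser = ServedText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return phrase in text


static_page = "<main><h1>Milos Oroz</h1><p>I build cloud infrastructure and automation tools.</p></main>"
spa_shell = ('<div id="app"></div><script>document.getElementById("app").textContent = '
             '"I build cloud infrastructure and automation tools.";</script>')
print(visible_without_js(static_page, "I build cloud infrastructure"))  # True
print(visible_without_js(spa_shell, "I build cloud infrastructure"))    # False
```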

Does llms.txt actually do anything yet?

Not much, as of early 2026. Major AI crawlers don’t request it. Google has said it has no plans to support it. I keep mine because it costs nothing and may matter later, but I wouldn’t build a strategy around it.

How much do backlinks matter for LLM SEO?

Less than you’d think. Ahrefs’ December 2025 study of 75,000 brands found brand mentions correlate roughly three times more strongly with AI citations than backlinks do. Time spent getting talked about on YouTube, Reddit, and Wikipedia tends to outperform time spent building links.

Does the same content work for ChatGPT and Google AI Overviews?

Only about 11% of domains are cited across major AI platforms (ChatGPT, Perplexity, Google AI Overviews) for the same query. The platforms select sources differently. ChatGPT leans heavily on Wikipedia (around 48% of citations) and Reddit (about 11%). Google AI Overviews used to cite mostly top-10 ranking pages (76% in mid-2025), but that overlap has been falling sharply, down to roughly 38% by early 2026 as Gemini 3 pulls from wider source pools. Optimize for each separately if both matter to you.

How long should a blog post be for LLM SEO?

Length isn’t the lever. What matters is that the post contains several self-contained 40-to-75-word answer blocks, each addressing a distinct question. A 1,500-word post with five clean answer blocks usually beats a 4,000-word essay with none.

Should I write FAQ schema?

For AI platforms like ChatGPT and Perplexity, probably yes: FAQPage markup labels each passage as a discrete question and answer, which is exactly the shape they quote. For Google rich results, no. Since August 2023, Google shows FAQ rich results only for well-known, authoritative government and health websites, so the markup has no SERP benefit for anyone else. It’s cheap to add and has no downside, so it’s worth doing if you have legitimate FAQ content.
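FAQPage markup is a list of Question nodes, each with an acceptedAnswer. A minimal Python sketch that builds it from question/answer pairs; faq_page is a hypothetical helper. The answers are shortened here for space, but on a real page the answer text should match what’s visible word for word.

```python
import json


def faq_page(pairs: list[tuple[str, str]]) -> dict:
    """schema.org FAQPage: one Question per pair, each with an acceptedAnswer."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page([
    ("Do AI crawlers execute JavaScript?",
     "Most don't. GPTBot, ClaudeBot, and PerplexityBot read static HTML."),
    ("Does llms.txt actually do anything yet?",
     "Not much, as of early 2026. Major AI crawlers don't request it."),
])
print(json.dumps(faq, indent=2))
```

The result goes into the page inside a <script type="application/ld+json"> tag, like the BlogPosting markup.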

What I’m watching

The changes are in. Now I’m watching whether AI assistants summarize my sites more accurately, pick up the right expertise areas, and get less confused by translated content.

LLM behavior changes constantly, so this is an ongoing experiment. But the basics here are sound regardless of how AI evolves. Structured metadata has always mattered. It probably matters more now.

Sources