Teaching AI to Read My Website: A Practical Guide to LLM Optimization
Why I’m Thinking About LLM Optimization
I’ve been paying attention to how AI systems consume web content lately. It’s becoming clear that just having a clean, fast website with semantic HTML isn’t enough anymore. The way LLMs parse and understand websites is different from traditional search engines, and honestly, I’m still figuring out what works best.
When I tested AI assistants on my site, the results were inconsistent. Sometimes they got my background right; sometimes they missed key details; sometimes they seemed confused about what I actually do. It's not that my site is broken; it's that I haven't optimized it for this new way of content discovery.
What Might Be Missing
The way I see it, AI systems crawling websites need context just like humans do. They want to know who I am, what I do, and what my site contains. Without clear signals, they have to guess from the page content alone.
My site covers the basics:
- Clean semantic HTML
- Proper heading hierarchy
- Good accessibility
- Fast loading times
But I’m probably missing metadata that helps AI understand the bigger picture. These are the areas I’m looking to improve.
Possible Improvements I’m Trying
1. Open Graph & Twitter Cards (The “Business Card”)
Open Graph tags are the standard way to tell anyone (human or machine) what a page contains. They’re what create rich previews when you share links. I’m implementing these to give AI systems clearer context about each page.
<!-- Open Graph -->
<meta property="og:title" content="Teaching AI to Read My Website: A Practical Guide to LLM Optimization">
<meta property="og:description" content="How I made my portfolio readable for AI systems using Open Graph tags, JSON-LD structured data, and sitemaps. Simple techniques that help machines understand your content.">
<meta property="og:image" content="https://milos.oroz.space/images/blog/home-page.png">
<meta property="og:url" content="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<meta property="og:type" content="article">
<meta property="og:site_name" content="milos oroz">
<meta property="og:locale" content="en_US">
<!-- Twitter Cards -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Teaching AI to Read My Website: A Practical Guide to LLM Optimization">
<meta name="twitter:description" content="How I made my portfolio readable for AI systems using Open Graph tags, JSON-LD structured data, and sitemaps.">
<meta name="twitter:image" content="https://milos.oroz.space/images/blog/home-page.png">
Why I’m trying this: These tags provide structured, definitive descriptions rather than leaving AI to guess from page content.
2. JSON-LD Structured Data (The “Nutrition Label”)
JSON-LD lets you explicitly tell a machine “This is a Person” or “This is a Blog Post.” It formats content into structured data that machines can parse more easily than HTML alone.
For the blog index:
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Blog - milos oroz",
  "description": "Blog of milos oroz - writing on Cloud, DevOps, and Automation",
  "url": "https://milos.oroz.space/en/blog/",
  "author": {
    "@type": "Person",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space/contact"
  }
}
For this blog post:
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Teaching AI to Read My Website: A Practical Guide to LLM Optimization",
  "description": "How I made my portfolio readable for AI systems using Open Graph tags, JSON-LD structured data, and sitemaps. Simple techniques that help machines understand your content.",
  "datePublished": "2026-01-15",
  "dateModified": "2026-01-15",
  "url": "https://milos.oroz.space/en/blog/llm-crawling-optimization/",
  "image": "https://milos.oroz.space/images/blog/home-page.png",
  "author": {
    "@type": "Person",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space/contact"
  },
  "publisher": {
    "@type": "Person",
    "name": "Milos Oroz",
    "url": "https://milos.oroz.space"
  }
}
Why I’m trying this: This should give AI systems clearer understanding of content type, authorship, and entity relationships. Schema.org markup provides structured context that helps machines parse content more accurately.
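Since every post needs the same BlogPosting shape, it's easy to generate the JSON-LD from frontmatter instead of hand-writing it. Here's a minimal sketch; the function and parameter names are my own, not part of any library, so adapt them to your own setup:

```javascript
// Sketch: build a Schema.org BlogPosting object from post frontmatter.
// All names here (buildBlogPostingLd, its parameters) are illustrative.
function buildBlogPostingLd({ title, description, url, image, published, modified, authorName, authorUrl, siteUrl }) {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: title,
    description,
    datePublished: published,
    // Fall back to the publish date when a post was never edited.
    dateModified: modified ?? published,
    url,
    image,
    author: { "@type": "Person", name: authorName, url: authorUrl },
    publisher: { "@type": "Person", name: authorName, url: siteUrl },
  };
}

// Serialize it for a <script type="application/ld+json"> tag in the <head>.
function ldJsonScript(data) {
  return `<script type="application/ld+json">${JSON.stringify(data)}</script>`;
}
```

Because it's plain data in, plain data out, this is also trivial to unit-test, which matters when the markup is invisible in the rendered page.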
3. Automatic Sitemaps (The “Map”)
A sitemap essentially says, “Here is a list of everything on my site.” Without one, crawlers have to follow links to discover content. I’m using @astrojs/sitemap to generate this automatically.
<url>
  <loc>https://milos.oroz.space/en/blog/llm-crawling-optimization/</loc>
  <lastmod>2026-01-15</lastmod>
</url>
Why I’m trying this: Crawlers shouldn’t have to guess which pages exist. This gives them a complete directory. It’s a basic SEO practice that might be even more important as LLMs start crawling more aggressively.
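For reference, the setup itself is small. With @astrojs/sitemap, the integration reads the `site` value from the Astro config and emits absolute URLs for every built page; roughly:

```javascript
// astro.config.mjs — minimal sketch of the sitemap setup.
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // `site` is required: the integration uses it to build absolute <loc> URLs.
  site: 'https://milos.oroz.space',
  integrations: [sitemap()],
});
```

The generated sitemap then just needs to be referenced from robots.txt so crawlers can find it.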
4. Hreflang for Bilingual Content (The “Translator”)
My site has both English and German versions, so I’m adding hreflang tags. Without them, AI might see the same content in different languages and get confused about which to use.
<link rel="alternate" hreflang="en" href="https://milos.oroz.space/en/blog/">
<link rel="alternate" hreflang="de" href="https://milos.oroz.space/de/blog/">
<link rel="alternate" hreflang="x-default" href="https://milos.oroz.space/en/blog/">
Why I’m trying this: This should help AI understand these are translations, not duplicate content. Important for a bilingual site.
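Since the same three tags repeat on every localized page, I find it cleaner to generate them from a locale map. A minimal sketch (the function and argument names are mine, not from any framework):

```javascript
// Sketch: emit hreflang <link> tags for each locale version of a page.
// `locales` maps hreflang codes to that locale's URL; names are illustrative.
function hreflangLinks(locales, defaultLang) {
  const links = Object.entries(locales).map(
    ([lang, href]) => `<link rel="alternate" hreflang="${lang}" href="${href}">`
  );
  // x-default tells crawlers which version to use when no locale matches.
  links.push(`<link rel="alternate" hreflang="x-default" href="${locales[defaultLang]}">`);
  return links.join('\n');
}
```

One detail worth keeping: every language version should list all alternates, including itself, or the annotations are ignored.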
5. Canonical URLs & Article Meta (The “Source of Truth”)
Canonical URLs tell crawlers which version of a page is the authoritative one. Article meta tags provide timestamps for blog posts.
<!-- Canonical URL -->
<link rel="canonical" href="https://milos.oroz.space/en/blog/llm-crawling-optimization/">
<!-- Article-specific meta (for this blog post) -->
<meta property="article:published_time" content="2026-01-15T00:00:00.000Z">
<meta property="article:modified_time" content="2026-01-15T00:00:00.000Z">
<meta property="article:author" content="Milos Oroz">
Why I’m trying this: These are standard SEO practices that might help LLMs understand content freshness and authority.
6. llms.txt (The “AI README”)
I’m experimenting with an llms.txt file at the root of my site. It’s basically an “AI README” — a simple text file explaining what the site is about, who I am, and how to cite my content.
The idea is that AI crawlers could get quick context without parsing all the HTML. It's a low-effort addition; whether it actually helps, time will tell.
Why I’m trying this: It’s a new pattern some developers are experimenting with. Worth testing to see if it makes a difference.
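For the curious, the llms.txt proposal suggests a small markdown file: an H1 with the site name, a blockquote summary, then sections of annotated links. Something along these lines (the entries are illustrative, not my actual file):

```markdown
# milos oroz

> Personal site and blog of Milos Oroz, covering Cloud, DevOps, and Automation.
> Available in English and German.

## Blog

- [Teaching AI to Read My Website](https://milos.oroz.space/en/blog/llm-crawling-optimization/): notes on LLM-friendly metadata

## Contact

- [Contact](https://milos.oroz.space/contact): how to reach me and how to cite this site
```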
Technical Implementation Details
While implementing this in Astro, I ran into a few specific hiccups:
Dynamic Meta Generation: Since this is a static site, all meta tags must be generated at build time. I updated my Astro layout to accept props for title, description, images, dates, and author. The layout then constructs both Open Graph tags and Twitter Cards from these props, ensuring consistency across all platforms.
Conditional Article Meta:
Blog posts need article-specific meta tags (article:published_time, etc.) while regular pages don’t. I conditionally render these only when the type prop is set to “article”.
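The conditional logic boils down to a small helper. This is a sketch of the idea rather than my actual layout code; the function and prop names are invented for illustration:

```javascript
// Sketch: only pages with type "article" get article:* meta tags.
// buildArticleMeta and its props are illustrative names.
function buildArticleMeta({ type, published, modified, author }) {
  const tags = [];
  if (type === 'article') {
    tags.push(`<meta property="article:published_time" content="${published}">`);
    if (modified) tags.push(`<meta property="article:modified_time" content="${modified}">`);
    if (author) tags.push(`<meta property="article:author" content="${author}">`);
  }
  // Regular pages return an empty list: no article tags rendered.
  return tags;
}
```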
Absolute vs. Relative Paths:
Open Graph and Twitter Card images require absolute URLs (e.g., https://...), not relative paths (/images/...). Relative paths work for browsers, but they break link previews and confuse external crawlers. I prepend the site domain to all image paths.
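The fix is a tiny normalizer run over every image path before it lands in a meta tag. A sketch, assuming the site origin is known at build time:

```javascript
// Sketch: resolve a relative image path against the site origin,
// since OG/Twitter images must be absolute URLs. SITE is an assumed constant.
const SITE = 'https://milos.oroz.space';

function absoluteUrl(path) {
  // Leave URLs that are already absolute untouched.
  if (/^https?:\/\//.test(path)) return path;
  return new URL(path, SITE).href;
}
```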
Tailwind CSS Opacity Conflict:
I accidentally had both opacity-50 and opacity-100 classes applied simultaneously within my dynamic components. The fix was using a ternary operator to ensure only one class exists at render time.
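In other words, derive exactly one class from state instead of toggling two. A one-line sketch (`isVisible` is an illustrative name, not my actual prop):

```javascript
// Sketch: a ternary guarantees exactly one opacity class at render time.
function opacityClass(isVisible) {
  return isVisible ? 'opacity-100' : 'opacity-50';
}
```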
JSON-LD Publisher Field:
The Schema.org BlogPosting type benefits from a publisher field alongside author. Since this is a personal site, both point to me, but having both provides better entity recognition for LLMs.
What I’m Watching For
I’ve implemented these changes, but I’m not claiming victory yet. Here’s what I’ll be watching:
- Do AI assistants summarize my site more accurately?
- Do they pick up on the right expertise areas?
- Does the bilingual structure confuse them less?
I’ll need to test over time and see what actually works. LLM behavior changes constantly, so what’s effective today might not be tomorrow.
Why This Matters
AI systems are becoming a primary way people discover content. I want my site to be understood correctly by these systems, not just by humans. That means paying attention to the metadata and structured data standards we’ve had for years—they might matter even more now.
This isn’t about gaming some algorithm. It’s about making sure when AI systems do crawl my site, they understand what I’m actually saying.