AI-Ready CMO
I — Infrastructure (6 minutes)

Making Your Site Readable by AI

You can have the most authoritative brand and the best content in the world, and none of it matters if AI crawlers can't access it or AI agents can't parse it. Infrastructure is the prerequisite layer — get it wrong and every other optimization fails silently. Most marketing teams skip this because it feels “technical.” You don't need to implement this yourself — but you need to know what to ask your dev team for.

AI Crawler Access: Robots.txt Configuration

Every major AI platform crawls the web with its own named bot. Many sites accidentally block these bots in robots.txt — the most common silent failure in AI SEO.

robots.txt Configuration

# AI Search Crawlers — Allow All
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bytespider
Allow: /

# Standard Search Engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Quick audit you can do right now: Visit yourdomain.com/robots.txt. If you see Disallow: / under any of these user agents, your content is invisible to that AI platform. Fix it today.
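This audit is easy to script. The sketch below uses Python's standard-library robots.txt parser to report which of the AI crawlers named above a given robots.txt blocks; `yourdomain.com` is a placeholder, and the sample file is a deliberately misconfigured example.

```python
# Minimal sketch: check a robots.txt for the AI crawlers listed above.
# "yourdomain.com" is a placeholder URL; the bot list mirrors this lesson.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_bots(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the AI bots that this robots.txt blocks from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

# Example: a robots.txt that silently blocks two AI crawlers.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_bots(sample))  # ['GPTBot', 'ClaudeBot']
```

In practice you would fetch yourdomain.com/robots.txt first (with curl or any HTTP client) and feed the response text into `blocked_bots`.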

Schema Markup: The Structured Data AI Systems Consume

Microsoft's Fabrice Canel confirmed at SMX Munich in March 2025 that schema markup helps LLMs understand content.

Schema Impact

| Data Point | Impact |
|---|---|
| LLM accuracy with knowledge graph grounding vs. unstructured data | 300% higher |
| Perplexity citation likelihood with comprehensive schema | 28% more likely |
| AI recommendation frequency for products with structured schema | 3–5x higher |

Priority Schema Types

| Schema Type | Use Case | Priority |
|---|---|---|
| Organization | Company entity — name, logo, social profiles, founding date | Critical |
| FAQPage | FAQ sections — each Q&A is a discrete extractable unit | Critical |
| Article / BlogPosting | Blog content — author, date published, date modified | Critical |
| Product | Product pages — name, description, price, rating | High |
| Person | Author/expert pages — name, credentials, organization | High |
| Review / AggregateRating | Review data — rating value, review count | High |
| HowTo | Tutorial/guide — step-by-step structured data | Medium |
| BreadcrumbList | Site navigation structure | Medium |
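Since Organization schema is also marked Critical, here is a sketch of a minimal Organization block to place on your homepage. All values — name, URL, logo path, founding date, social profiles — are placeholders to swap for your own; consult Schema.org's Organization type for the full property list.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "YourBrand",
  "url": "https://yourdomain.com",
  "logo": "https://yourdomain.com/logo.png",
  "foundingDate": "2015",
  "sameAs": [
    "https://www.linkedin.com/company/yourbrand",
    "https://x.com/yourbrand"
  ]
}
</script>
```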

JSON-LD Example: FAQPage Schema

JSON-LD Schema Markup

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is AI SEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "AI SEO (also called GEO or AEO) is the practice of optimizing content to be cited and recommended by AI search engines like ChatGPT, Perplexity, and Google AI Overviews."
    }
  }]
}
</script>

Validation step: After implementing any schema, validate with Google's Rich Results Test and Schema.org Validator. Invalid schema is worse than no schema.
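Before reaching for the official validators, you can run a quick local sanity check: extract every JSON-LD block from a page and confirm it at least parses and declares `@context` and `@type`. This sketch uses a simple regex (fine for well-formed pages, not a full HTML parser) and is a pre-check, not a substitute for Google's tools.

```python
# Quick local sanity check before running Google's Rich Results Test:
# pull JSON-LD blocks out of a page and make sure each one parses and
# declares @context / @type. Raises on broken JSON.
import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def check_jsonld(html: str) -> list[str]:
    """Return the @type of every JSON-LD block found in the HTML."""
    types = []
    for block in JSONLD_RE.findall(html):
        data = json.loads(block)  # raises ValueError on invalid JSON
        assert "@context" in data and "@type" in data
        types.append(data["@type"])
    return types

page = '''<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
</script>'''
print(check_jsonld(page))  # ['FAQPage']
```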

Server-Side Rendering: The Agentic Imperative

Most AI agents do NOT render JavaScript. They need the full content delivered as plain, machine-readable HTML directly from the server.

BrightEdge reports that AI agents now account for ~33% of organic search activity. If your site is a JavaScript-heavy SPA that requires client-side rendering, it's invisible to these agents.

The fix: server-side rendering (SSR) or static site generation (SSG). These ensure that when any bot or agent requests your page, they receive the full content as plain HTML.

Quick Test

“If I curl our homepage, do I get the full content, or an empty div?” — If the answer is an empty div, your content is invisible to AI agents.
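The curl test can be automated with a rough heuristic: strip scripts, styles, and tags from the raw HTML your server returns, and see whether any meaningful visible text remains. This is a sketch — fetch the HTML however you like (curl, an HTTP client), and treat the 200-character threshold as an assumption to tune.

```python
# Rough heuristic for the curl test above: given the raw HTML your server
# returns, check whether there is real text content or just an empty
# app-mount div waiting for client-side JavaScript.
import re

def has_server_rendered_content(html: str, min_chars: int = 200) -> bool:
    """True if the HTML body contains a meaningful amount of visible text."""
    body = re.search(r"<body.*?>(.*)</body>", html, re.DOTALL | re.IGNORECASE)
    if not body:
        return False
    # Drop scripts/styles, then all remaining tags, and measure the text left.
    text = re.sub(r"<(script|style).*?</\1>", "", body.group(1),
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    return len(text.strip()) >= min_chars

spa = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
ssr = "<html><body><article>" + "Real paragraph text. " * 20 + "</article></body></html>"
print(has_server_rendered_content(spa), has_server_rendered_content(ssr))  # False True
```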

Page Speed and Retrieval Timeouts

AI agents operate within 1–5 second retrieval timeouts. If your page doesn't load within that window, the agent moves on and your content never gets processed.

| Benchmark | Target | Why |
|---|---|---|
| Time to First Byte (TTFB) | < 200ms | Server responsiveness — AI agents won't wait |
| First Contentful Paint | < 1.5 seconds | When crawlers see actual content |
| Full page load | < 3 seconds | Within the 1–5 second agent timeout window |
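These benchmarks are easy to turn into a pass/fail report your team can run after each deploy. The timing inputs below are assumed to come from your own measurement tooling (Lighthouse, WebPageTest, `curl -w '%{time_starttransfer}'`, etc.); units are seconds.

```python
# Turn the page-speed benchmarks above into a pass/fail report.
# Timing values are assumed to come from your measurement tool; units: seconds.
BENCHMARKS = {
    "ttfb": 0.200,                    # Time to First Byte
    "first_contentful_paint": 1.5,
    "full_page_load": 3.0,
}

def speed_report(measured: dict[str, float]) -> dict[str, bool]:
    """Map each metric to True (within target) or False (too slow)."""
    return {name: measured[name] <= limit for name, limit in BENCHMARKS.items()}

print(speed_report({"ttfb": 0.120, "first_contentful_paint": 1.9, "full_page_load": 2.8}))
# {'ttfb': True, 'first_contentful_paint': False, 'full_page_load': True}
```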

llms.txt: The Honest Assessment

Proposed by Jeremy Howard (co-founder of Answer.AI) in September 2024, llms.txt is a plain-text file placed at your domain root that provides LLMs with a structured overview of your site.

What Works

LangChain agents perform better with an optimized llms.txt, Anthropic has requested the file, and AI agents visit llms-full.txt at twice the rate of llms.txt.

What Doesn't

Major providers haven't implemented native support. Search Engine Land's 2+ month test showed zero visits from AI bots.

Verdict

Implement it. It takes 1–4 hours and has zero downside. It's most valuable today for developer docs and AI coding assistants; consumer-facing impact is minimal for now but likely to grow.

llms.txt Basic Structure

# YourBrand

> A brief description of your company and what you do.

## Docs

- [Getting Started](https://yourdomain.com/docs/getting-started): How to get started with our platform
- [API Reference](https://yourdomain.com/docs/api): Complete API documentation
- [Pricing](https://yourdomain.com/pricing): Plans and pricing information

## Blog

- [Latest Release](https://yourdomain.com/blog/latest): What's new in v2.0
- [Best Practices](https://yourdomain.com/blog/best-practices): How to get the most out of our product

## Optional

- [About Us](https://yourdomain.com/about): Company background and team
- [Case Studies](https://yourdomain.com/cases): Customer success stories
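If you maintain the link inventory elsewhere, generating the file is a few lines of code. This is a minimal sketch of emitting the structure shown above from a dict of sections; the section names, URLs, and descriptions are placeholders, and https://llmstxt.org/ remains the authoritative format spec.

```python
# Minimal sketch: generate an llms.txt in the format shown above from a
# dict of {section: [(title, url, description), ...]}. All values here
# are placeholders; see https://llmstxt.org/ for the spec.
def build_llms_txt(
    brand: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    lines = [f"# {brand}", "", f"> {summary}"]
    for section, links in sections.items():
        lines += ["", f"## {section}"]
        for title, url, desc in links:
            lines.append(f"- [{title}]({url}): {desc}")
    return "\n".join(lines) + "\n"

print(build_llms_txt(
    "YourBrand",
    "A brief description of your company and what you do.",
    {"Docs": [("Getting Started", "https://yourdomain.com/docs/getting-started",
               "How to get started with our platform")]},
))
```

Write the result to a file named llms.txt at your domain root and regenerate it whenever the link inventory changes.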

Google Merchant Center for Commerce

Google's Universal Commerce Protocol (UCP), announced January 2026, is designed to make product data universally accessible to AI agents and shopping experiences.

Product feed requirements: title, description, price, availability, images, category, brand, GTIN/MPN, condition, and shipping information.

We'll cover commerce protocols in full detail in Lesson 11: AI Commerce Protocols. For now, ensure your Google Merchant Center feed is complete and accurate.

Technical AI-Readiness Checklist

| What to Verify | Priority |
|---|---|
| Robots.txt allows GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended | Critical |
| HTTPS across entire site | Critical |
| Organization schema in JSON-LD on homepage | Critical |
| FAQPage schema on all FAQ content sections | Critical |
| Article/BlogPosting schema on blog with author + dates | Critical |
| Product schema on all product pages (if applicable) | High |
| Server-side rendering or static site generation enabled | High |
| Full page load under 3 seconds | High |
| Google Merchant Center feed complete (if e-commerce) | High |
| TTFB under 200ms | Medium |
| llms.txt placed at domain root | Medium |
| XML sitemap current and submitted to Google + Bing | Medium |
| All schema validated with Google Rich Results Test | Medium |

For consultants: This checklist is one of the most valuable deliverables you can offer a client. Hand it over, schedule a review in two weeks, and verify completion.

How to Brief Your Dev Team

Copy-paste this template and send it to your development team today. Modify the specifics for your brand.

Dev Team Brief Template

Subject: AI Search Optimization — Technical Requirements

Hi [Dev Team / Name],

We need to ensure our site is optimized for AI search engines
(ChatGPT, Perplexity, Google AI Overviews). Here are five
priority items:

1. ROBOTS.TXT UPDATE
   Confirm that GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot,
   and Google-Extended are all set to Allow: / in our robots.txt.
   Current file: [yourdomain.com/robots.txt]

2. SCHEMA MARKUP (JSON-LD)
   Add the following schema types:
   - Organization schema on homepage
   - FAQPage schema on all FAQ sections
   - Article/BlogPosting schema on blog posts
   - Product schema on product pages (if applicable)

3. SERVER-SIDE RENDERING CHECK
   Run: curl [yourdomain.com] and confirm full page content
   is returned in the HTML response (not loaded via JS).
   If content requires client-side rendering, we need to
   implement SSR or SSG.

4. PAGE SPEED
   Targets:
   - TTFB: < 200ms
   - First Contentful Paint: < 1.5s
   - Full page load: < 3s

5. LLMS.TXT
   Create a plain-text file at [yourdomain.com/llms.txt]
   with a structured overview of our site content.
   Format spec: https://llmstxt.org/

Business context: AI agents now account for ~33% of organic
search activity. If our site isn't readable by these agents,
we're invisible to a growing share of search traffic.

Timeline: Please complete within two weeks. Let me know if
you have questions or need clarification on any item.

Thanks,
[Your Name]

Key Takeaways

  • The most common infrastructure failure is blocked crawlers in robots.txt — takes 5 minutes to fix
  • Schema markup has confirmed endorsement from Microsoft, with 300% accuracy improvement and 28% higher Perplexity citations
  • Server-side rendering is the emerging imperative as AI agents (33% of organic activity) can't process JavaScript
  • Use the technical checklist — hand it to your dev team this week and verify completion within two weeks