Lesson 2: How AI Engines Actually Choose Sources

The Foundation

The “how it works” foundation. The Princeton study, the Ahrefs inversion, and platform-by-platform trust models. The research that separates real understanding from cargo-cult tactics.

Everything in this lesson is backed by peer-reviewed research and large-scale empirical studies. No opinions — just data.

The Princeton Study That Changed Everything

Princeton and Georgia Tech's GEO paper, published at ACM KDD 2024, tested 9 optimization methods across 10,000 queries. The results upend decades of SEO assumptions.

Optimization Method	Visibility Impact	Verdict
Citing credible sources	+115.1%	Highest impact — for sites ranked 5th in SERPs
Adding statistics	+22–41%	Consistent gain across all site ranks
Including expert quotations	+22–37%	Named experts with credentials outperform
Precise technical terminology	+28%	Specificity beats generality
Keyword stuffing	-10%	Worse than doing nothing

Keyword stuffing — the backbone of old-school SEO — performs 10% worse than baseline in AI contexts.

Lower-ranked sites benefit disproportionately from GEO. If you're not already dominating SERPs, this data is especially relevant.

Tactical Implication

Stop optimizing for keyword density. Start optimizing for citation density, statistical evidence, and expert attribution.
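One way to make "citation density" and "statistical evidence" measurable is a rough count per 100 words. The sketch below is a minimal illustration, not the metric used in the Princeton study: the three regular expressions and the function name evidence_profile are assumptions chosen for readability, and they will miss many real-world patterns.

```python
import re

# Illustrative heuristics only (assumptions, not the GEO paper's metrics).
STATISTIC = re.compile(r"\d+(?:\.\d+)?%")            # e.g. "12.5%"
SOURCE_CUE = re.compile(r"https?://\S+|\baccording to\b", re.IGNORECASE)
QUOTATION = re.compile(r'["“][^"”]{15,}["”]')        # quoted span of 15+ characters


def evidence_profile(text: str) -> dict:
    """Count evidence signals in text and normalise them per 100 words."""
    words = len(text.split())
    counts = {
        "statistics": len(STATISTIC.findall(text)),
        "sources": len(SOURCE_CUE.findall(text)),
        "quotations": len(QUOTATION.findall(text)),
    }
    per_100 = {
        name: round(100 * n / words, 2) if words else 0.0
        for name, n in counts.items()
    }
    return {"words": words, "counts": counts, "per_100_words": per_100}


# Hypothetical sample paragraph, written for this example.
sample = (
    "According to https://example.com/report, churn fell 12.5% in 2024. "
    '"We saw retention improve within a single quarter," said the analyst.'
)
profile = evidence_profile(sample)
print(profile["counts"], profile["per_100_words"])
```

Comparing these numbers before and after an edit gives a crude signal of whether a page is moving toward evidence and away from keyword repetition.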

The Ahrefs Study That Inverted SEO Wisdom

December 2025. 75,000 brands analyzed across ChatGPT, Google AI Mode, and AI Overviews. The correlations shatter conventional SEO priorities.

Signal	Correlation	Implication
YouTube mentions	~0.737	Strongest single factor — models trained on transcripts
Branded web mentions	0.66–0.71	Volume of brand references across the web
Branded anchor text	0.511–0.628	How other sites describe you in links
Brand search volume	0.334–0.466	How many people Google your brand name
Domain Rating	0.266	Weak — the metric most SEO teams obsess over
Backlink count	0.10–0.218	Negligible — former king of SEO
Content volume (pages)	0.194	Barely registers — more pages ≠ more AI visibility

The bottom three — Domain Rating, backlink count, and content volume — show weak to negligible correlation with AI visibility. YouTube mentions and brand mentions dominate.

YouTube first. Brand mentions second. Backlinks: near the bottom.
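To see the ordering at a glance, the reported correlations can be sorted. The sketch below copies the values from the table above; collapsing each range to its midpoint is an assumption made here for sorting, not something the Ahrefs study does.

```python
# Correlations as listed in the table above, stored as (low, high).
# Single values are stored with low == high.
SIGNALS = {
    "YouTube mentions": (0.737, 0.737),
    "Branded web mentions": (0.66, 0.71),
    "Branded anchor text": (0.511, 0.628),
    "Brand search volume": (0.334, 0.466),
    "Domain Rating": (0.266, 0.266),
    "Backlink count": (0.10, 0.218),
    "Content volume (pages)": (0.194, 0.194),
}


def midpoint(bounds: tuple[float, float]) -> float:
    """Collapse a reported range to one number (an assumption for sorting)."""
    low, high = bounds
    return (low + high) / 2


# Strongest signal first.
ranked = sorted(SIGNALS.items(), key=lambda item: midpoint(item[1]), reverse=True)

for name, bounds in ranked:
    print(f"{name:<24} {midpoint(bounds):.3f}")
```

Sorted this way, the three brand-related signals sit at the top, while Domain Rating, content volume, and backlink count fall to the bottom.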

Why YouTube?

The most likely explanation is training data: large language models are widely reported to be trained on large volumes of YouTube transcripts. When you publish a video, its transcript can become part of what AI systems know. Your spoken words become potential training data, and that can feed into AI responses.

How Each Platform Trusts Differently

Yext's analysis of 6.8 million AI citations revealed a critical insight: “Gemini trusts what your brand says. ChatGPT trusts what the internet agrees on. Perplexity trusts industry experts and customer reviews.”

ChatGPT

Volume Leader

7.92 citations per question
Wikipedia = 47.9% of citations
87% from Bing's top 10 when browsing
Only 12% match Google's first page
Mentions brands 3.2x more than it links to them
Top cited: Reddit, Wikipedia, Amazon, Forbes, Business Insider
SparkToro: less than 1-in-100 chance of the same brand list twice

Perplexity

Citation Leader

21.87 citations per question, about 2.8x as many as ChatGPT
Reddit leads at 46.7% of top-10 citations
Real-time search against 200+ billion URLs
~50% of citations from 2025 content
Repeats websites only 25.11% vs Google's 58.49%
40% more citations from high-authority domains

Google AI Overviews

The Incumbent

76.1% of cited URLs rank in Google's top 10
169 words avg with 7.2 links from ~4 unique domains
Content changes 70% for the same query
45.5% of citations get replaced upon regeneration
Content scoring 8.5/10 or higher on semantic coverage is 4.2x more likely to appear
Multi-modal content shows 156% higher selection rates

Google AI Mode

The Consensus Engine

Highest correlation with branded authority signals
Only 13.7% citation overlap with AI Overviews
75% of sessions end without an external visit
Brand building is the primary optimization lever
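Figures such as "13.7% citation overlap" or "45.5% of citations get replaced" can be reproduced for your own queries by comparing two lists of cited URLs. The sketch below is one possible way to do that, under stated assumptions: it compares at the domain level and uses Jaccard similarity for overlap, whereas the studies quoted above may have measured overlap differently. The function names and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Return the host of a URL without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")  # requires Python 3.9+


def citation_overlap(run_a: list[str], run_b: list[str]) -> float:
    """Jaccard overlap of cited domains: shared domains / all domains."""
    a = {domain(u) for u in run_a}
    b = {domain(u) for u in run_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replaced_share(before: list[str], after: list[str]) -> float:
    """Share of domains cited in the first run that are missing from the second."""
    a = {domain(u) for u in before}
    b = {domain(u) for u in after}
    return len(a - b) / len(a) if a else 0.0


# Hypothetical citation lists from two answers to the same question.
first = [
    "https://www.reddit.com/r/seo",
    "https://en.wikipedia.org/wiki/SEO",
    "https://ahrefs.com/blog",
]
second = [
    "https://reddit.com/r/marketing",
    "https://www.forbes.com/advisor",
]

print(citation_overlap(first, second), replaced_share(first, second))
```

Running the same prompt several times per platform and averaging these two numbers gives a practical view of how stable a platform's sources are for a given topic.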

The Freshness Imperative

AI systems don't just prefer fresh content; in practice they filter for it. The figures below put 85% of AI Overview citations within the last two years, so content that hasn't been updated in 12 months is at a growing disadvantage with most AI engines.

Freshness Signal	Data Point
AI Overview citations published in last 2 years	85%
Citations from 2025 content alone	44%
ChatGPT freshness bias vs. Google results	393–458 days newer
New Reddit content appearing in Perplexity citations	Within 7–14 days

Every key asset needs a “Last Updated” timestamp and a quarterly refresh schedule. Treat anything untouched for 12 months as overdue.
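The refresh rule can be turned into a simple audit over a content inventory. The sketch below assumes each page has a known last-updated date. The 90-day interval is an approximation of "quarterly" and the 365-day cutoff mirrors the 12-month threshold in the text. The page paths and the function name freshness_status are hypothetical.

```python
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=90)   # "quarterly", approximated as 90 days
STALE_AFTER = timedelta(days=365)       # the 12-month threshold from the text


def freshness_status(last_updated: date, today: date) -> str:
    """Classify a page by the time elapsed since its last update."""
    age = today - last_updated
    if age > STALE_AFTER:
        return "stale"
    if age > REFRESH_INTERVAL:
        return "refresh due"
    return "fresh"


# Hypothetical inventory: page path -> last-updated date.
inventory = {
    "/pricing": date(2025, 11, 1),
    "/guides/geo-basics": date(2025, 6, 1),
    "/blog/2024-trends": date(2024, 6, 1),
}

today = date(2025, 12, 1)  # fixed for a reproducible example; use date.today() in practice
report = {path: freshness_status(updated, today) for path, updated in inventory.items()}

for path, status in report.items():
    print(f"{path:<22} {status}")
```

Anything marked "refresh due" goes into the next quarterly pass; anything marked "stale" is a candidate for a rewrite or retirement.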

Content Formats That Get Cited Most

Not all content is equally extractable. AI systems reward structured, scannable, answer-first, fact-dense content.

Format / Structure	Impact on Citations
Comparative listicles	32.5% of all AI citations
First 30% of page text	44.2% of all LLM citations
Self-contained sections (50–150 words)	2.3x more citations
HTML tables	47% higher AI citation rate
Bullet points and numbered lists	28–40% more likely to be cited
Long-form (2,000+ words)	3x more citations — but architecture matters

Length alone doesn't win; how you structure that length does.
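Two of the structural findings above lend themselves to a quick automated check: section length (50–150 words) and whether the key answer appears in the first 30% of the page. The sketch below is a rough approximation under stated assumptions: it counts whitespace-separated words, which may not match how the underlying studies measured text, and the function names and sample content are hypothetical.

```python
def sections_outside_range(
    sections: dict[str, str], low: int = 50, high: int = 150
) -> list[str]:
    """Return headings whose body falls outside the low..high word range."""
    return [
        heading
        for heading, body in sections.items()
        if not low <= len(body.split()) <= high
    ]


def appears_early(page_text: str, key_phrase: str, fraction: float = 0.30) -> bool:
    """Check whether key_phrase occurs within the first `fraction` of the words."""
    words = page_text.lower().split()
    cutoff = int(len(words) * fraction)
    return key_phrase.lower() in " ".join(words[:cutoff])


# Hypothetical page, with placeholder words standing in for real copy.
sections = {
    "Intro": "word " * 20,          # too short
    "Definition": "word " * 80,     # within range
    "Appendix": "word " * 200,      # too long
}
answer_first = "GEO means generative engine optimization. " + "filler " * 50
answer_last = "filler " * 50 + "GEO means generative engine optimization."

print(sections_outside_range(sections))
print(appears_early(answer_first, "generative engine optimization"))
print(appears_early(answer_last, "generative engine optimization"))
```

Sections flagged by the first check are candidates for splitting or expanding; a failed second check suggests moving the direct answer closer to the top of the page.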

The Mental Model Shift

Everything traces back to this data. Nothing in this course is opinion.

Traditional SEO	AI SEO (GEO)
Optimize for Google's algorithm	Optimize for probabilistic citation across platforms
Backlinks are king	Brand mentions and YouTube are king
Domain Rating matters	Content extractability matters
Rank #1 for keywords	Get cited across multiple AI platforms
Traffic = success	Citation + brand mention = success
One algorithm to master	Multiple AI systems with different trust models
Content length matters	Content structure matters more
Keywords in content	Statistics and citations in content
Keyword stuffing works	Keyword stuffing performs −10% vs. baseline
Freshness is a ranking factor	Freshness acts as a near-filter: 85% of citations come from the last 2 years

What You Just Learned

The Princeton study found that keyword stuffing hurts (-10%) while statistics, citations, and expert quotes drive AI visibility (+22–115%)
The Ahrefs study found that YouTube mentions (0.737) and brand mentions (0.66–0.71) correlate most strongly with AI visibility, while Domain Rating (0.266) and backlinks (0.10–0.218) correlate only weakly
Every platform trusts differently: ChatGPT trusts internet consensus, Perplexity trusts experts and reviews, Gemini trusts your brand, AI Mode trusts consensus
The CITED framework maps directly to this research — the next five lessons show you exactly how to execute it
