Lesson 2: How AI Engines Actually Choose Sources
The Foundation
This lesson is the “how it works” foundation: the Princeton study, the Ahrefs inversion, and platform-by-platform trust models. It covers the research that separates real understanding from cargo-cult tactics.
Everything in this lesson is backed by peer-reviewed research and large-scale empirical studies. No opinions — just data.
The Princeton Study That Changed Everything
Princeton and Georgia Tech's GEO paper, published at ACM KDD 2024, tested 9 optimization methods across 10,000 queries. The results upend decades of SEO assumptions.
| Optimization Method | Visibility Impact | Verdict |
|---|---|---|
| Citing credible sources | +115.1% | Highest impact — for sites ranked 5th in SERPs |
| Adding statistics | +22–41% | Consistent gain across all site ranks |
| Including expert quotations | +22–37% | Named experts with credentials outperform |
| Precise technical terminology | +28% | Specificity beats generality |
| Keyword stuffing | -10% | Worse than doing nothing |
Keyword stuffing — the backbone of old-school SEO — performs 10% worse than baseline in AI contexts.
Lower-ranked sites benefit disproportionately more from GEO optimization. If you're not already dominating SERPs, this data is especially relevant.
Tactical Implication
Stop optimizing for keyword density. Start optimizing for citation density, statistical evidence, and expert attribution.
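One way to make this shift measurable is a rough evidence-density audit of a draft. The sketch below is a minimal heuristic, not a method from the GEO paper: the regular expressions, the `evidence_density` name, and the per-100-words normalization are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; tune them for your own content.
STAT_RE = re.compile(r"\d+(?:\.\d+)?\s?%|\b\d+(?:\.\d+)?x\b")  # e.g. 12.5%, 3x
CITE_RE = re.compile(r"https?://\S+")                          # bare URLs as a proxy for cited sources
QUOTE_RE = re.compile(r'["“][^"“”]{20,}["”]')                  # quoted passages of 20+ characters


def evidence_density(text: str) -> dict:
    """Count rough evidence signals and normalize them per 100 words."""
    words = len(text.split())
    counts = {
        "statistics": len(STAT_RE.findall(text)),
        "citations": len(CITE_RE.findall(text)),
        "quotations": len(QUOTE_RE.findall(text)),
    }
    per_100_words = {
        name: round(count * 100 / words, 2) if words else 0.0
        for name, count in counts.items()
    }
    return {"words": words, "counts": counts, "per_100_words": per_100_words}
```

Running it on a draft before and after an edit gives a crude signal of whether statistics, sources, and quotations were actually added. It cannot judge whether a source is credible or whether a quoted expert has real credentials.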
The Ahrefs Study That Inverted SEO Wisdom
December 2025. 75,000 brands analyzed across ChatGPT, Google AI Mode, and AI Overviews. The correlations shatter conventional SEO priorities.
| Signal | Correlation | Implication |
|---|---|---|
| YouTube mentions | ~0.737 | Strongest single factor — models trained on transcripts |
| Branded web mentions | 0.66–0.71 | Volume of brand references across the web |
| Branded anchor text | 0.511–0.628 | How other sites describe you in links |
| Brand search volume | 0.334–0.466 | How many people Google your brand name |
| Domain Rating | 0.266 | Weak — the metric most SEO teams obsess over |
| Backlink count | 0.10–0.218 | Negligible — former king of SEO |
| Content volume (pages) | 0.194 | Barely registers — more pages ≠ more AI visibility |
The bottom three — Domain Rating, backlink count, and content volume — show weak to negligible correlation with AI visibility. YouTube mentions and brand mentions dominate.
YouTube first. Brand mentions second. Backlinks... distant third.
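To read the table, it helps to see how a correlation of this kind is computed. The sketch below implements a Spearman rank correlation in plain Python. This is an assumption: the lesson does not say which coefficient the Ahrefs study used, and the function names here are illustrative.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Given one list of a signal per brand (say, mention counts) and one list of AI visibility scores in the same brand order, `spearman(signal, visibility)` returns a value between -1 and 1. A value near 0.7 means brands that rank high on the signal tend to rank high on visibility; it does not show that the signal causes the visibility.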
Why YouTube?
Large language models are trained on massive YouTube transcript datasets. When you publish a video, its transcript becomes part of what AI systems know. Your spoken words become training data — and that feeds directly into AI responses.
How Each Platform Trusts Differently
Yext's analysis of 6.8 million AI citations revealed a critical insight: “Gemini trusts what your brand says. ChatGPT trusts what the internet agrees on. Perplexity trusts industry experts and customer reviews.”
ChatGPT
Volume Leader
- 7.92 citations per question
- Wikipedia = 47.9% of citations
- 87% from Bing's top 10 when browsing
- Only 12% match Google's first page
- Mentions brands 3.2x more than it links to them
- Top cited: Reddit, Wikipedia, Amazon, Forbes, Business Insider
- SparkToro: less than 1-in-100 chance of the same brand list twice
Perplexity
Citation Leader
- 21.87 citations per question, 2.8x more than ChatGPT
- Reddit leads at 46.7% of top-10 citations
- Real-time search against 200+ billion URLs
- ~50% of citations from 2025 content
- Repeats websites only 25.11% of the time vs. Google's 58.49%
- 40% more citations from high-authority domains
Google AI Overviews
The Incumbent
- 76.1% of cited URLs rank in Google's top 10
- Averages 169 words with 7.2 links from ~4 unique domains
- Content changes 70% for the same query
- 45.5% of citations get replaced upon regeneration
- Content scoring 8.5/10 or higher on semantic coverage is 4.2x more likely to appear
- Multi-modal content shows 156% higher selection rates
Google AI Mode
The Consensus Engine
- Highest correlation with branded authority signals
- Only 13.7% citation overlap with AI Overviews
- 75% of sessions end without an external visit
- Brand building is the primary optimization lever
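Because answers and citations shift between regenerations (the 45.5% replacement figure above is one example), a single query tells you little about your visibility. The sketch below is a minimal way to summarize repeated runs, assuming you have already collected the cited domains from several runs of the same prompt by whatever means you use. The function names and the sample domains in the usage note are hypothetical.

```python
from collections import Counter


def citation_frequency(runs):
    """Share of runs in which each source appears at least once."""
    if not runs:
        return {}
    counts = Counter()
    for cited in runs:
        counts.update(set(cited))  # count each source once per run
    return {source: n / len(runs) for source, n in counts.items()}


def replacement_rate(before, after):
    """Fraction of sources cited in `before` that are missing from `after`."""
    before_set = set(before)
    if not before_set:
        return 0.0
    return len(before_set - set(after)) / len(before_set)
```

For example, calling `citation_frequency` on four runs where `a.example` is cited every time and `b.example` twice gives those sources frequencies of 1.0 and 0.5. Tracking a frequency over many runs is closer to how these systems behave than checking whether you appeared once.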
The Freshness Imperative
AI systems don't just prefer fresh content — they filter for it. If your content hasn't been updated in 12 months, you're functionally invisible to most AI engines.
| Freshness Signal | Data Point |
|---|---|
| AI Overview citations published in last 2 years | 85% |
| Citations from 2025 content alone | 44% |
| ChatGPT freshness bias vs. Google results | 393–458 days newer |
| New Reddit content appearing in Perplexity citations | Within 7–14 days |
Every key asset needs a “Last Updated” timestamp and a quarterly refresh schedule. If content hasn't been updated in 12 months, it's invisible.
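A refresh schedule like this is easy to check mechanically. The following is a minimal sketch, assuming you keep a last-updated date for each page. The 90-day approximation of "quarterly", the status labels, and the function names are assumptions; the 12-month threshold is the one stated in this lesson.

```python
from datetime import date

REFRESH_DAYS = 90   # "quarterly" approximated as 90 days (assumption)
STALE_DAYS = 365    # the lesson's 12-month threshold


def freshness_status(last_updated: date, today: date) -> str:
    """Classify a page by how long ago it was last updated."""
    age_days = (today - last_updated).days
    if age_days > STALE_DAYS:
        return "stale"
    if age_days > REFRESH_DAYS:
        return "refresh due"
    return "fresh"


def audit(pages: dict[str, date], today: date) -> dict[str, str]:
    """Map each page URL to its freshness status."""
    return {url: freshness_status(updated, today) for url, updated in pages.items()}
```

Passing `today` explicitly keeps the check reproducible; in a scheduled job you would pass `date.today()`.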
Content Formats That Get Cited Most
Not all content is equally extractable. AI systems reward structured, scannable, answer-first, fact-dense content.
| Format / Structure | Impact on Citations |
|---|---|
| Comparative listicles | 32.5% of all AI citations |
| First 30% of page text | 44.2% of all LLM citations |
| Self-contained sections (50–150 words) | 2.3x more citations |
| HTML tables | 47% higher AI citation rate |
| Bullet points and numbered lists | 28–40% more likely to be cited |
| Long-form (2,000+ words) | 3x more citations — but architecture matters |
Length alone doesn't win; how you structure that length does.
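Two of the structural signals in the table can be checked on a draft: section length (50–150 words) and whether the key answer sits in the first 30% of the page. The sketch below is a rough approximation under stated assumptions: sections are supplied as a heading-to-body mapping, "first 30%" is measured in characters, and both function names are hypothetical.

```python
def sections_outside_range(sections: dict[str, str], low: int = 50, high: int = 150) -> list[str]:
    """Return headings whose body word count falls outside [low, high]."""
    return [
        heading
        for heading, body in sections.items()
        if not low <= len(body.split()) <= high
    ]


def appears_in_first_30_percent(page_text: str, key_answer: str) -> bool:
    """True if key_answer first occurs within the first 30% of the page's characters."""
    position = page_text.find(key_answer)
    return position != -1 and position < 0.3 * len(page_text)
```

Sections flagged by the first function are candidates for splitting or expanding. The second is a quick test of whether a page is answer-first or buries its main claim.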
The Mental Model Shift
Everything traces back to this data. Nothing in this course is opinion.
| Traditional SEO | AI SEO (GEO) |
|---|---|
| Optimize for Google's algorithm | Optimize for probabilistic citation across platforms |
| Backlinks are king | Brand mentions and YouTube are king |
| Domain Rating matters | Content extractability matters |
| Rank #1 for keywords | Get cited across multiple AI platforms |
| Traffic = success | Citation + brand mention = success |
| One algorithm to master | Multiple AI systems with different trust models |
| Content length matters | Content structure matters more |
| Keywords in content | Statistics and citations in content |
| Keyword stuffing works | Keyword stuffing performs −10% vs. baseline |
| Freshness is a ranking factor | Freshness is a hard filter — 85% of citations from last 2 years |
What You Just Learned
- The Princeton study showed keyword stuffing hurts (-10%) while statistics, citations, and expert quotes drive AI visibility (+22–115%)
- The Ahrefs study found YouTube mentions (0.737) and brand mentions (0.66–0.71) correlate most strongly with AI visibility, while Domain Rating (0.266) and backlinks (0.10–0.218) correlate only weakly
- Every platform trusts differently: ChatGPT trusts internet consensus, Perplexity trusts experts and reviews, Gemini trusts your brand, AI Mode trusts consensus
- The CITED framework maps directly to this research — the next five lessons show you exactly how to execute it