Assess AI Capabilities
Use the AI Capability Ladder to evaluate what AI does better than humans, well with oversight, inconsistently, poorly, or not at all — so you can spot overselling before signing a contract.
Not all AI capabilities are equal. Some tasks AI handles brilliantly. Others it fumbles. And some it simply cannot do — no matter what a vendor claims. Here's a five-level framework for evaluating any AI claim.
LEVEL 5: AI DOES THIS BETTER THAN HUMANS
Data pattern recognition at scale. Image classification. Transcription and translation. Generating content variations.
Examples that work: Analyzing 50,000 support tickets to find recurring themes. A/B testing 100 ad variations simultaneously. Transcribing and summarizing sales calls.
What it still can't do: Understand WHY patterns exist. Make strategic decisions about what to do with findings.
LEVEL 4: AI DOES THIS WELL (WITH HUMAN OVERSIGHT)
Drafting content from structured prompts. Summarizing long documents. Generating initial creative concepts. Answering customer questions from a knowledge base.
Examples that work: Blog post first drafts where a human edits for brand voice. Email subject line variations where a human picks winners. FAQ chatbots where a human reviews monthly.
What it still can't do: Understand brand voice without extensive training examples. Know what NOT to say. Adapt to entirely new contexts without guidance.
LEVEL 3: AI DOES THIS INCONSISTENTLY
Long-form strategic content. Original creative concepts. Predicting behavior in new markets. Maintaining consistent brand voice across channels.
Examples that are hit-or-miss: Campaign strategy documents. Thought leadership articles (often generic). Trend predictions (sometimes right, sometimes wrong).
Why: Output depends heavily on prompt quality, requires significant editing, and varies wildly from one run to the next.
LEVEL 2: AI DOES THIS POORLY
Understanding company politics and culture. Reading between the lines. Knowing when to break the brand rules. Generating truly novel, original ideas.
Examples that fail: Navigating executive disagreements on messaging. Pitching controversial campaigns internally. Deciding which trends to ignore.
LEVEL 1: AI CANNOT DO THIS AT ALL
Building genuine relationships. Negotiating budgets and timelines. Defending creative decisions in a room. Knowing when to kill a project.
Examples: Convincing a CEO to take a brand risk. Managing a team through a reorganization. Presenting to the board and reading the room.
TESTING VENDOR CLAIMS WITH THE LADDER
Claim: "Our AI fully automates your content calendar — no human input needed." Writing content = Level 4 (needs oversight). Strategy = Level 2–3 (poor to inconsistent). Verdict: Overselling.
Claim: "AI analyzes customer data and tells you which segments to target." Pattern recognition = Level 5 (great). Strategic prioritization = Level 2 (poor). Verdict: Half true. AI finds patterns; you decide what they mean.
Claim: "Our AI writes emails that match your brand voice perfectly." Drafting = Level 4 (needs oversight). Brand voice consistency = Level 3 (inconsistent). Verdict: Requires training data, ongoing refinement, and human editing.
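The claim-testing routine above is mechanical enough to sketch in code. This is a minimal illustration, not a real tool: the task names and level assignments mirror the ladder in this article, and the rule of thumb is the one from the examples — if a vendor claims full automation of any task that realistically sits at Level 3 or below, flag it as overselling.

```python
# Illustrative sketch of the Capability Ladder check.
# Task names and levels below follow the article's ladder;
# they are examples, not an exhaustive taxonomy.
REALISTIC_LEVEL = {
    "pattern recognition at scale": 5,   # Level 5: better than humans
    "drafting content": 4,               # Level 4: well, with oversight
    "brand voice consistency": 3,        # Level 3: inconsistent
    "strategic prioritization": 2,       # Level 2: poor
    "relationship building": 1,          # Level 1: cannot do
}

def verdict(claimed_tasks):
    """Flag a 'fully automated' claim as overselling if any claimed
    task realistically sits at Level 3 or below. Unknown tasks
    default to Level 3 (inconsistent) to stay skeptical."""
    oversold = [t for t in claimed_tasks
                if REALISTIC_LEVEL.get(t, 3) <= 3]
    return ("Overselling" if oversold else "Plausible"), oversold

# Claim: "fully automates your content calendar — no human input"
status, flagged = verdict(["drafting content", "strategic prioritization"])
print(status, flagged)  # flags "strategic prioritization"
```

Running the check on the content-calendar claim returns "Overselling" because strategy work sits at Level 2, even though the drafting half of the claim is defensible.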
KEY TAKEAWAY
When a vendor claims AI does something, plot it on the Capability Ladder. If they're claiming Level 4–5 performance for a task that realistically sits at Level 2–3, they're overselling. Use this framework to push back with confidence.
