STACK 3 of 5
Part 3 · 2 min read

Assess AI Capabilities

Use the AI Capability Ladder to evaluate what AI does well, inconsistently, poorly, or not at all — so you can spot overselling before signing a contract.

Not all AI capabilities are equal. Some tasks AI handles brilliantly. Others it fumbles. And some it simply cannot do — no matter what a vendor claims. Here's a five-level framework for evaluating any AI claim.

LEVEL 5: AI DOES THIS BETTER THAN HUMANS

Data pattern recognition at scale. Image classification. Transcription and translation. Generating content variations.

Examples that work: Analyzing 50,000 support tickets to find recurring themes. A/B testing 100 ad variations simultaneously. Transcribing and summarizing sales calls.

What it still can't do: Understand WHY patterns exist. Make strategic decisions about what to do with findings.

LEVEL 4: AI DOES THIS WELL (WITH HUMAN OVERSIGHT)

Drafting content from structured prompts. Summarizing long documents. Generating initial creative concepts. Answering customer questions from a knowledge base.

Examples that work: Blog post first drafts where a human edits for brand voice. Email subject line variations where a human picks winners. FAQ chatbots where a human reviews monthly.

What it still can't do: Understand brand voice without extensive training examples. Know what NOT to say. Adapt to entirely new contexts without guidance.

LEVEL 3: AI DOES THIS INCONSISTENTLY

Long-form strategic content. Original creative concepts. Predicting behavior in new markets. Maintaining consistent brand voice across channels.

Examples that are hit-or-miss: Campaign strategy documents. Thought leadership articles (often generic). Trend predictions (sometimes right, sometimes wrong).

Why: Output quality depends heavily on prompt quality, varies wildly between runs, and usually requires significant editing.

LEVEL 2: AI DOES THIS POORLY

Understanding company politics and culture. Reading between the lines. Knowing when to break the brand rules. Generating truly novel, original ideas.

Examples that fail: Navigating executive disagreements on messaging. Pitching controversial campaigns internally. Deciding which trends to ignore.

LEVEL 1: AI CANNOT DO THIS AT ALL

Building genuine relationships. Negotiating budgets and timelines. Defending creative decisions in a room. Knowing when to kill a project.

Examples: Convincing a CEO to take a brand risk. Managing a team through a reorganization. Presenting to the board and reading the room.

TESTING VENDOR CLAIMS WITH THE LADDER

Claim: "Our AI fully automates your content calendar — no human input needed." Writing content = Level 4 (needs oversight). Strategy = Level 2–3 (poor to inconsistent). Verdict: Overselling.

Claim: "AI analyzes customer data and tells you which segments to target." Pattern recognition = Level 5 (great). Strategic prioritization = Level 2 (poor). Verdict: Half true. AI finds patterns; you decide what they mean.

Claim: "Our AI writes emails that match your brand voice perfectly." Drafting = Level 4 (needs oversight). Brand voice consistency = Level 3 (inconsistent). Verdict: Requires training data, ongoing refinement, and human editing.

KEY TAKEAWAY

When a vendor claims AI does something, plot it on the Capability Ladder. If they're claiming Level 4–5 performance for a task that realistically sits at Level 2–3, they're overselling. Use this framework to push back with confidence.