Simpson's Paradox

2026-03-23

A hospital has two surgeons. Surgeon A has a better success rate with easy surgeries. Surgeon A also has a better success rate with hard surgeries. So Surgeon A must be the better surgeon, right?

Not necessarily. When you combine the numbers, Surgeon B can have a better overall success rate, even though A is better in every category. This is a real statistical phenomenon called Simpson's Paradox, and it shows up in discrimination cases, medical studies, and policy decisions.

The Setup

Surgeon A is the hospital's best. She takes the hard cases, the ones other surgeons refer away. Surgeon B plays it safe and mostly takes the routine cases. Here are their numbers:

Easy Surgeries

Surgeon A: 9 of 10 successful (90%)

Surgeon B: 80 of 100 successful (80%)

A wins (90% vs 80%). A did 10 easy surgeries; B did 100.

Hard Surgeries

Surgeon A: 30 of 100 successful (30%)

Surgeon B: 2 of 10 successful (20%)

A wins (30% vs 20%). A did 100 hard surgeries; B did 10.

Surgeon A wins both categories. She's better with easy cases AND hard cases. Now combine the numbers:

Paradox! Surgeon A wins both categories but loses overall.

Combined Results

Surgeon A: 39 of 110 successful (35.5%)

Surgeon B: 82 of 110 successful (74.5%)

Surgeon B wins overall, by a huge margin (74.5% vs 35.5%).
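The combined row is nothing more than adding up successes and patients across both categories before dividing. Here is a minimal Python sketch of that arithmetic, using the counts from the tables above. The `results` layout and the helper names `rate` and `combined` are illustrative choices, not from any library.

```python
# Counts from the tables above: (successful, total) per surgeon and category.
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, patients):
    """Success rate as a fraction between 0 and 1."""
    return successes / patients

def combined(categories):
    """Pool a surgeon's categories: total successes and total patients."""
    total_s = sum(s for s, _ in categories.values())
    total_n = sum(n for _, n in categories.values())
    return total_s, total_n

for surgeon, categories in results.items():
    # Per-category rates: A is ahead in both easy and hard.
    for name, (s, n) in categories.items():
        print(f"{surgeon} {name}: {s}/{n} = {rate(s, n):.1%}")
    # Combined rate: pool the counts first, then divide once.
    s, n = combined(categories)
    print(f"{surgeon} combined: {s}/{n} = {rate(s, n):.1%}")
```

Running it reproduces the combined figures: 39/110 (35.5%) for A and 82/110 (74.5%) for B.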

Surgeon B looks more than twice as good as Surgeon A in the combined stats. But Surgeon A is the better surgeon in each category we measured. If you needed surgery, you'd want Surgeon A.

How Is This Possible?

The trick is in the caseloads. Surgeon A did 100 hard surgeries (where even a great surgeon only succeeds 30% of the time) and only 10 easy ones. Surgeon B did 100 easy surgeries (where even a mediocre surgeon succeeds 80% of the time) and only 10 hard ones.

When you combine the numbers, Surgeon A's overall gets dragged down by her mountain of hard cases. Surgeon B's overall gets propped up by her mountain of easy cases. The combined statistic hides the fact that they were playing completely different games.
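In symbols, each combined rate is a weighted average of that surgeon's two category rates. The weights are the surgeon's own shares of easy and hard cases:

```latex
\begin{aligned}
\text{Overall}_A &= \tfrac{10}{110}(0.90) + \tfrac{100}{110}(0.30) = \tfrac{9 + 30}{110} \approx 0.355 \\
\text{Overall}_B &= \tfrac{100}{110}(0.80) + \tfrac{10}{110}(0.20) = \tfrac{80 + 2}{110} \approx 0.745
\end{aligned}
```

Both of A's category rates beat B's. But A's weights put 100/110 of the mass on her 30% term, while B's weights put 100/110 on her 80% term.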

The lurking variable, case difficulty, is doing all the work. Ignore it, and the combined data lies to you.
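One standard way to take case difficulty back into account is direct standardization: apply each surgeon's category success rates to one shared case mix, so both are scored on the same work. The sketch below assumes the shared mix is the two caseloads pooled together (110 easy, 110 hard). That choice, and the names `rates`, `common_mix` and `standardized_rate`, are ours rather than the article's.

```python
# Each surgeon's success rate within each category (from the tables above).
rates = {
    "A": {"easy": 0.90, "hard": 0.30},
    "B": {"easy": 0.80, "hard": 0.20},
}

# Assumed shared case mix: both surgeons' caseloads pooled together.
common_mix = {"easy": 10 + 100, "hard": 100 + 10}

def standardized_rate(category_rates, mix):
    """Expected success rate if this surgeon had operated on `mix`."""
    expected_successes = sum(category_rates[c] * n for c, n in mix.items())
    return expected_successes / sum(mix.values())

for surgeon, r in rates.items():
    print(f"{surgeon}: {standardized_rate(r, common_mix):.1%}")
```

With this mix, A comes out at 60% and B at 50%. Because A has the higher rate in both categories, she wins under any shared mix. Changing the mix moves both numbers, but never their order.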

See It With Every Patient

Each dot below is a patient. The lopsided caseloads make the paradox visually obvious. Look at how many more hard cases (low success rate) Surgeon A takes on.

[Figure: one dot per patient for Surgeon A and Surgeon B, marked as success or fail, in three panels: Easy Surgeries, Hard Surgeries, and Combined Results.]

Build Your Own

To build your own example, choose each surgeon's caseload and success rate in both categories. The paradox appears when one surgeon handles far more of the hard cases: the overall winner can flip even though the sub-group winners stay the same. Start from the numbers above (A: 10 easy at 90%, 100 hard at 30%; B: 100 easy at 80%, 10 hard at 20%). Evening out the caseloads makes B's overall lead disappear.
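The same experiment can be run in a few lines of Python. This hypothetical sketch keeps all four success rates from the example and holds Surgeon B's caseload at 100 easy and 10 hard. It gives Surgeon A a fixed 110 patients and shifts them from easy to hard in steps of 10 to see where the overall winner flips. The function name `overall_rate` is our own.

```python
def overall_rate(n_easy, rate_easy, n_hard, rate_hard):
    """Combined success rate: category rates weighted by caseload."""
    successes = n_easy * rate_easy + n_hard * rate_hard
    return successes / (n_easy + n_hard)

# Success rates from the example; A is better in both categories throughout.
A_EASY, A_HARD = 0.90, 0.30
B_EASY, B_HARD = 0.80, 0.20

# Surgeon B's caseload stays fixed at 100 easy and 10 hard.
b_overall = overall_rate(100, B_EASY, 10, B_HARD)

# Give A 110 patients and move them from easy to hard, 10 at a time.
for a_hard in range(0, 111, 10):
    a_easy = 110 - a_hard
    a_overall = overall_rate(a_easy, A_EASY, a_hard, A_HARD)
    winner = "A" if a_overall > b_overall else "B"
    print(f"A: {a_easy:3d} easy, {a_hard:3d} hard -> "
          f"A {a_overall:.1%} vs B {b_overall:.1%}, overall winner {winner}")
```

With these rates, the overall winner switches from A to B once A takes 29 or more of her 110 cases from the hard category, far fewer than the 100 she takes in the example. The flip is only possible because A's hard-case rate (30%) is below B's easy-case rate (80%). A combined rate always lands between a surgeon's two category rates. If A's worse category beat B's better one, no caseload could reverse the result.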

Why This Matters

What makes this unsettling is how natural it is to miss. Aggregated data can tell the opposite story from disaggregated data. It's worth developing the habit of asking: what subgroups might be hidden inside this number, and are they evenly represented?

Real-World Examples

UC Berkeley graduate admissions (1973): Overall, men were admitted at a noticeably higher rate than women, so it looked like the university discriminated against women. But department by department, women were admitted at similar or higher rates in most departments. Women had simply applied more often to competitive departments with low acceptance rates.
Kidney stone treatments: Treatment A had a higher success rate for small stones AND large stones, but Treatment B had a higher overall rate, because Treatment B was disproportionately used on easy (small stone) cases.
Baseball batting averages: A player can have a higher batting average than another in every individual season, yet a lower career average overall.

Once you see Simpson's Paradox, you start noticing opportunities for it everywhere. It's a reminder to always ask: what subgroups might be hiding underneath this summary statistic?

The Habit Worth Building

Whenever you're comparing two things using a single combined number (success rates, averages, conversion rates, batting averages, test scores), ask: are the groups being compared actually doing the same thing in the same proportions?

If a new marketing campaign has a higher overall conversion rate, check whether it was just tested on easier audiences. If one school district outperforms another on test scores, check whether they serve the same student demographics. If one doctor has better outcomes, check what kinds of cases they take on.
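Checks like these can be written down as a small routine. This sketch is illustrative: the function name `simpson_reversal` and the `{subgroup: (successes, trials)}` layout are our own assumptions. The routine compares two groups within each subgroup and on the pooled totals. It reports a reversal when every subgroup favors one group but the pooled numbers favor the other.

```python
def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials

def pooled_rate(group):
    """Rate after adding up successes and trials across all subgroups."""
    return rate(sum(s for s, _ in group.values()),
                sum(n for _, n in group.values()))

def simpson_reversal(x, y):
    """True if one group wins every subgroup but loses on pooled totals.

    x and y map each subgroup name to (successes, trials) and are assumed
    to cover the same subgroups.
    """
    x_wins_all = all(rate(*x[k]) > rate(*y[k]) for k in x)
    y_wins_all = all(rate(*y[k]) > rate(*x[k]) for k in x)
    px, py = pooled_rate(x), pooled_rate(y)
    return (x_wins_all and py > px) or (y_wins_all and px > py)

# The surgeons from this article: lopsided caseloads.
surgeon_a = {"easy": (9, 10), "hard": (30, 100)}
surgeon_b = {"easy": (80, 100), "hard": (2, 10)}
print(simpson_reversal(surgeon_a, surgeon_b))  # True

# Hypothetical: same per-category rates, but a similar case mix.
balanced_a = {"easy": (45, 50), "hard": (18, 60)}
balanced_b = {"easy": (40, 50), "hard": (12, 60)}
print(simpson_reversal(balanced_a, balanced_b))  # False
```

The second pair has the same 90/30 and 80/20 rates as the surgeons. Because both groups see the same mix of cases, the pooled comparison agrees with every subgroup.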

The transferable insight: a single rate or average summarizing a complex situation is a weighted average of hidden sub-stories. When the weights differ between the groups being compared, the summary can point in the opposite direction from every sub-story. The cure is the habit of asking "what's underneath?", not blanket suspicion of all data.