Simpson's Paradox
A hospital has two surgeons. Surgeon A has a better success rate with easy surgeries. Surgeon A also has a better success rate with hard surgeries. So Surgeon A must be the better surgeon, right?
Not necessarily. When you combine the numbers, Surgeon B can have a better overall success rate, even though A is better in every category. This is a real statistical phenomenon called Simpson's Paradox, and it has changed the outcomes of lawsuits, medical studies, and policy decisions.
The Setup
Surgeon A is the hospital's best. She takes the hard cases, the ones other surgeons refer away. Surgeon B plays it safe and mostly takes the routine cases. Here are their actual numbers:
[Tables: Easy Surgeries, Hard Surgeries, Combined Results]
How Is This Possible?
The trick is in the caseloads. Surgeon A did 100 hard surgeries (where even a great surgeon succeeds only 30% of the time) and only 10 easy ones. Surgeon B did 100 easy surgeries (where even a mediocre surgeon succeeds 80% of the time) and only 10 hard ones.
When you combine the numbers, Surgeon A's overall rate gets dragged down by her mountain of hard cases. Surgeon B's overall rate gets propped up by her mountain of easy cases. The combined statistic hides the fact that they were playing completely different games.
The lurking variable, case difficulty, is doing all the work. Ignore it, and the data lies to you.
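To make the arithmetic concrete, here is a minimal Python sketch. The caseloads (100 and 10) and the 30% and 80% success rates come from the text above. Surgeon A's 90% easy rate and Surgeon B's 20% hard rate are illustrative assumptions, chosen only so that A wins both categories; they are not the surgeons' actual figures.

```python
# (successes, total) per surgeon and difficulty.
# Caseloads and the 30% / 80% rates follow the text;
# A's easy rate (9/10) and B's hard rate (2/10) are ASSUMED for illustration.
cases = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}


def rate(successes, total):
    """Success rate within a single category."""
    return successes / total


def overall(surgeon):
    """Pool a surgeon's categories: (total successes, total cases)."""
    successes = sum(s for s, _ in cases[surgeon].values())
    total = sum(n for _, n in cases[surgeon].values())
    return successes, total


for difficulty in ("easy", "hard"):
    a = rate(*cases["A"][difficulty])
    b = rate(*cases["B"][difficulty])
    print(f"{difficulty}: A {a:.0%} vs B {b:.0%}")

a_s, a_n = overall("A")
b_s, b_n = overall("B")
print(f"overall: A {a_s}/{a_n} = {a_s / a_n:.1%} vs B {b_s}/{b_n} = {b_s / b_n:.1%}")
```

With these numbers, A leads in both categories (90% vs 80% on easy cases, 30% vs 20% on hard ones) yet trails overall: 39/110 = 35.5% against 82/110 = 74.5%.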
See It With Every Patient
Each dot below is a patient. The lopsided caseloads make the paradox visually obvious. Look at how many more hard cases (low success rate) Surgeon A takes on.
[Figure: one dot per patient, in three panels: Easy Surgeries, Hard Surgeries, Combined Results]
Build Your Own
Drag the sliders to set each surgeon's caseload and success rates. The paradox appears when one surgeon handles far more of the hard cases. Watch the overall winner flip even though the sub-group winners stay the same.
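The same experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each surgeon handles 110 cases, Surgeon B's mix is fixed at 100 easy and 10 hard, and the success rates (A: 90% easy, 30% hard; B: 80% easy, 20% hard) are hypothetical values in which A wins both categories. The function names `overall_rate` and `paradox` are made up for this sketch.

```python
def overall_rate(n_easy, n_hard, p_easy, p_hard):
    """Pooled success rate: expected successes divided by total cases."""
    return (n_easy * p_easy + n_hard * p_hard) / (n_easy + n_hard)


def paradox(a, b):
    """True when A wins both categories but B wins overall.

    a and b are tuples: (n_easy, n_hard, p_easy, p_hard).
    """
    a_wins_each = a[2] > b[2] and a[3] > b[3]
    b_wins_overall = overall_rate(*b) > overall_rate(*a)
    return a_wins_each and b_wins_overall


# ASSUMED illustrative settings, not data from the article.
P_A = (0.90, 0.30)          # A's success rates: easy, hard
B = (100, 10, 0.80, 0.20)   # B's caseload and success rates
TOTAL = 110                 # cases handled by A

# Shift A's caseload toward hard cases until the overall winner flips.
flip_at = None
for n_hard in range(TOTAL + 1):
    a = (TOTAL - n_hard, n_hard, *P_A)
    if paradox(a, B):
        flip_at = n_hard
        break

print(f"Overall winner flips once A takes {flip_at} hard cases out of {TOTAL}")
```

Under these assumptions the flip happens at 29 hard cases: below that, A wins overall as well as in each category; from there on, B wins overall even though nothing about either surgeon's skill has changed.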
Why This Matters
What makes this unsettling is how easy it is to miss. Aggregated data can tell the opposite story from disaggregated data. It's worth developing the habit of asking: what subgroups might be hidden inside this number, and are they evenly represented?
Real-World Examples
- UC Berkeley admissions (1973): Overall, it looked like the university discriminated against women. But looked at department by department, most departments admitted women at rates equal to or higher than men's. Women had applied disproportionately to competitive departments with low acceptance rates.
- Kidney stone treatments: Treatment A had a higher success rate for small stones AND large stones, but Treatment B had a higher overall rate, because Treatment B was disproportionately used on easy (small stone) cases.
- Baseball batting averages: A player can have a higher batting average than another in every individual season, yet a lower average once the seasons are combined, because the two players' at-bats are spread unevenly across those seasons.
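A reversal like the ones in this list can be checked mechanically. The sketch below uses the counts commonly reported for the 1986 kidney stone study; they do not appear in this article, so treat them as an outside assumption to verify before relying on them. The helper name `detect_reversal` is hypothetical.

```python
def detect_reversal(strata_a, strata_b):
    """Return True if one arm wins every stratum but loses on pooled counts.

    strata_a, strata_b: dicts mapping stratum name -> (successes, total),
    with the same stratum names in both.
    """
    def pooled(strata):
        successes = sum(s for s, _ in strata.values())
        total = sum(n for _, n in strata.values())
        return successes / total

    # Per-stratum difference in success rate (A minus B).
    diffs = [
        strata_a[k][0] / strata_a[k][1] - strata_b[k][0] / strata_b[k][1]
        for k in strata_a
    ]
    pooled_diff = pooled(strata_a) - pooled(strata_b)

    a_wins_all = all(d > 0 for d in diffs)
    b_wins_all = all(d < 0 for d in diffs)
    return (a_wins_all and pooled_diff < 0) or (b_wins_all and pooled_diff > 0)


# Counts as commonly reported for the study (ASSUMED, not from this article).
treatment_a = {"small": (81, 87), "large": (192, 263)}
treatment_b = {"small": (234, 270), "large": (55, 80)}

print(detect_reversal(treatment_a, treatment_b))
```

With these counts, Treatment A succeeds more often on both small and large stones, but Treatment B was applied mostly to small stones, so its pooled rate comes out higher and the check reports a reversal.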
Once you see Simpson's Paradox, you start noticing opportunities for it everywhere: any summary statistic that pools unevenly sized subgroups is a candidate.
The Habit Worth Building
Whenever you're comparing two things using a single combined number (success rates, averages, conversion rates, batting averages, test scores), ask: are the groups being compared actually doing the same thing in the same proportions?
If a new marketing campaign has a higher overall conversion rate, check whether it was just tested on easier audiences. If one school district outperforms another on test scores, check whether they serve the same student demographics. If one doctor has better outcomes, check what kinds of cases they take on.
The transferable insight: a combined rate is a weighted average of hidden sub-stories, with each subgroup weighted by its share of the cases. When the things being compared weight those subgroups differently, the summary can point in the opposite direction from every sub-story. The cure is the habit of asking "what's underneath?", not blanket suspicion of all data.
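The weighted-average view can be written out explicitly. The notation here is introduced only for this sketch: surgeon s has x_{s,k} successes in n_{s,k} cases of category k.

```latex
r_{s,k} = \frac{x_{s,k}}{n_{s,k}}, \qquad
w_{s,k} = \frac{n_{s,k}}{\sum_j n_{s,j}}, \qquad
r_s = \frac{\sum_k x_{s,k}}{\sum_k n_{s,k}} = \sum_k w_{s,k}\, r_{s,k}

% If both surgeons share the same weights, w_{A,k} = w_{B,k} = w_k, then
r_A - r_B = \sum_k w_k \left( r_{A,k} - r_{B,k} \right) > 0
\quad \text{whenever } r_{A,k} > r_{B,k} \text{ for every } k
```

So with identical case mixes, winning every category guarantees winning overall. A reversal requires the weights to differ: Surgeon A's weight must sit on the categories where everyone's rate is low.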