Regression to the Mean

2026-03-23

A basketball player shoots 80% from the free throw line over her career. In last night's game, she went 2 for 10. Just 20%. The coach benches her. The announcer says she's "lost her touch." Fans wonder if something is wrong.

Then she goes 9 for 10 the next game. Everyone says the benching "worked" or she "bounced back." But what if nothing actually changed? What if she was the same 80% shooter the whole time, and small samples are simply noisy?

The Core Idea

Extreme results tend to be followed by less extreme results. Not because of any force pulling things back to average, but because extreme outcomes usually require unusual luck, and unusual luck usually doesn't repeat.

An 80% shooter who goes 2 for 10 probably didn't suddenly become a 20% shooter. She just hit the unlucky tail of her normal range. Next game, she'll probably be closer to 80%. She didn't "recover." That's just where her true ability lives, and where most samples will land.
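To put numbers on that "normal range," here is a minimal sketch that treats each free throw as an independent shot made with probability 0.8. Independence is a simplifying assumption, and the function names `prob_makes` and `prob_at_most` are made up for this illustration.

```python
from math import comb

def prob_makes(k: int, n: int = 10, p: float = 0.8) -> float:
    """Probability of exactly k makes in n independent shots, each made with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_most(k: int, n: int = 10, p: float = 0.8) -> float:
    """Probability of k or fewer makes in n independent shots."""
    return sum(prob_makes(i, n, p) for i in range(k + 1))

# Print the full distribution of makes for a 10-shot game.
for k in range(11):
    print(f"{k:2d} of 10: {prob_makes(k):.4f}")
```

Under these assumptions, exactly 8 makes comes up only about 30% of the time, so most games land above or below the "true" 80%. A 2-for-10 game sits far out in the lower tail, which is the kind of outcome that needs unusual luck.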

See It: One Shooter, Many Games

Below is a simulated player with a true shooting ability you can set. Click "Shoot a game" to see individual game results, and watch how wildly they bounce around with small sample sizes.

[Interactive simulation: a single shooter with a true shooting percentage of 80% and 10 shots per game, showing game-by-game results (one cell per game) and a running average.]
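If you would rather run the experiment offline, the following is a rough sketch of the same idea, not the code behind the page. It assumes independent shots, and the names `shoot_game` and `simulate_season` are invented for this example.

```python
import random

def shoot_game(true_pct: float, shots: int, rng: random.Random) -> int:
    """Return the number of makes in one simulated game."""
    return sum(rng.random() < true_pct for _ in range(shots))

def simulate_season(true_pct=0.8, shots=10, games=20, seed=0):
    """Return per-game shooting percentages and the running average after each game."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    game_pcts, running = [], []
    made_total = 0
    for game_number in range(1, games + 1):
        made = shoot_game(true_pct, shots, rng)
        made_total += made
        game_pcts.append(made / shots)
        running.append(made_total / (game_number * shots))
    return game_pcts, running

game_pcts, running = simulate_season()
for pct, avg in zip(game_pcts, running):
    print(f"game: {pct:.0%}   running average: {avg:.1%}")
```

The individual games jump around, while the running average drifts toward the true ability as games accumulate. Nothing about the simulated player changes between games.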

The Danger: Mistaking Noise for Signal

We're wired to find causes for everything we see. When a player has an extreme game, we invent explanations: she's tired, she's distracted, the other team's defense is too good. When she "bounces back," we credit the coach's halftime speech or a change in strategy.

But if her true ability hasn't changed, the bounce-back isn't a comeback. It's the numbers settling back to where they usually live. That's a subtle but important distinction.

This is why regression to the mean fools us so badly:

Sports Illustrated jinx: Athletes featured on the cover often "decline" afterward. They made the cover because of an extreme peak. Regression was coming regardless.
Sophomore slumps: Rookies of the Year often have a "worse" second season. They won the award because of an unusually good first year.
Medical treatments: People seek treatment when symptoms are at their worst. They'd often improve anyway. The treatment gets the credit.
Punishment vs. praise: Daniel Kahneman described flight instructors who noticed that cadets criticized after bad landings tended to "improve," while cadets praised after good landings tended to "get worse." The instructors concluded that punishment works better than praise. In reality, both groups of cadets were just regressing to their normal performance.
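All four examples share one mechanism: select on an extreme result, then measure again. A small sketch of the flight-school case follows, under the assumption that each score is a fixed skill plus independent luck and that feedback has no effect at all. The names `simulate_landings` and `average_change`, and the choice of normally distributed skill and luck, are illustrative only.

```python
import random
from statistics import mean

def simulate_landings(n_pilots=10_000, skill_sd=1.0, luck_sd=1.0, seed=0):
    """Score two landings per pilot as fixed skill plus fresh luck each time."""
    rng = random.Random(seed)
    skill = [rng.gauss(0, skill_sd) for _ in range(n_pilots)]
    first = [s + rng.gauss(0, luck_sd) for s in skill]
    second = [s + rng.gauss(0, luck_sd) for s in skill]
    return first, second

def average_change(first, second, fraction=0.1):
    """Mean change from first to second landing for the worst and best first landings."""
    order = sorted(range(len(first)), key=lambda i: first[i])
    k = max(1, int(len(first) * fraction))
    worst, best = order[:k], order[-k:]

    def change(indices):
        return mean(second[i] - first[i] for i in indices)

    return change(worst), change(best)

first, second = simulate_landings()
worst_change, best_change = average_change(first, second)
print(f"worst 10% on first landing, average change: {worst_change:+.2f}")
print(f"best 10% on first landing, average change:  {best_change:+.2f}")
```

No one is praised or punished in this simulation, yet the worst group improves on average and the best group declines. An instructor who shouted at the first group and praised the second would see exactly the pattern that seems to favor shouting.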

Sample Size Is Everything

The amount of noise depends directly on how many observations you have. Use the simulation below to see this in action. Generate thousands of 5-shot samples vs. 50-shot samples and compare how wildly they vary.

[Interactive simulation: with true ability fixed at 80%, shows the lowest, average, and highest shooting percentage across many samples, comparing a small sample size (5 shots) with a large one (50 shots).]
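The same comparison can be sketched in a few lines. This is an approximation of what the simulation does, assuming independent shots at a fixed 80%. The names `sample_percentages` and `summarize` and the default of 5,000 samples are choices made for this example.

```python
import random
from statistics import mean, pstdev

def sample_percentages(shots_per_sample, n_samples=5000, true_pct=0.8, seed=0):
    """Simulate many samples and return the shooting percentage of each."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_pct for _ in range(shots_per_sample)) / shots_per_sample
        for _ in range(n_samples)
    ]

def summarize(pcts):
    """Lowest, average, and highest percentage, plus the standard deviation as a spread measure."""
    return {
        "lowest": min(pcts),
        "average": mean(pcts),
        "highest": max(pcts),
        "spread": pstdev(pcts),
    }

for shots in (5, 50):
    stats = summarize(sample_percentages(shots))
    print(shots, {name: round(value, 3) for name, value in stats.items()})
```

Both sample sizes should average close to 80%, but the 5-shot samples range much more widely around it than the 50-shot samples do.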

The Takeaway

Regression to the mean isn't a force. Nothing "causes" results to move toward average. It's the mathematical reality that extreme outcomes require extreme luck, and luck doesn't persist.

A useful habit: before explaining why something went up or down, first ask how big the sample is. If it's small, the most likely explanation for an extreme result might just be randomness. And the most likely next result is a less extreme one. Not because anything changed, but because that's where probability tends to land.
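One quick way to act on that habit is to estimate how much a percentage can move from chance alone. The sketch below uses the standard error of a proportion, sqrt(p(1 - p) / n), with a rough band of two standard errors. The band relies on a normal approximation that is crude for very small samples, and the helper names `standard_error` and `plausible_range` are hypothetical.

```python
from math import sqrt

def standard_error(true_pct: float, n: int) -> float:
    """Standard error of an observed proportion over n independent trials."""
    return sqrt(true_pct * (1 - true_pct) / n)

def plausible_range(true_pct: float, n: int, z: float = 2.0):
    """Rough band of z standard errors around the true rate, clipped to [0, 1]."""
    se = standard_error(true_pct, n)
    return max(0.0, true_pct - z * se), min(1.0, true_pct + z * se)

for n in (5, 10, 50, 500):
    low, high = plausible_range(0.8, n)
    print(f"n={n:3d}: roughly {low:.0%} to {high:.0%}")
```

Because the standard error shrinks with the square root of the sample size, it takes four times as many shots to halve the band.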

Where This Shows Up

Once you internalize this, you start seeing misattributed regression everywhere:

A company has a record-breaking quarter and hires aggressively. The next quarter is "disappointing." Was it bad strategy, or was the record quarter the outlier?
You try a new productivity system during a great week. It "worked." But were you going to have a great week anyway?
A city installs speed cameras and accidents drop. Effective policy, or did they install cameras at the worst intersections, which were likely to improve regardless?

The transferable skill: before you credit the intervention, ask whether the outcome was likely to moderate on its own. This doesn't mean interventions never work. It means the bar for claiming they do is higher than "things got better afterward." Things that are extreme tend to get less extreme. That's the default, not the exception.
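One way to raise that bar is to compare against similar cases that did not get the intervention. The sketch below is a made-up speed camera study: every number in it (site count, accident rates, the `camera_effect` parameter) is an assumption for illustration, not data. It picks the worst intersections from one year, gives cameras to a random half of them, and compares the change in both halves.

```python
import random
from statistics import mean

def yearly_accidents(rate, rng, days=365):
    """Count accidents in a year, treating each day as an independent chance of rate/days."""
    return sum(rng.random() < rate / days for _ in range(days))

def camera_study(n_sites=2000, n_worst=200, camera_effect=0.0, seed=0):
    """Return the mean change in accidents for camera sites and for comparable control sites."""
    rng = random.Random(seed)
    rates = [rng.uniform(2, 10) for _ in range(n_sites)]  # hypothetical true yearly rates
    before = [yearly_accidents(r, rng) for r in rates]

    # Select the sites with the most accidents, then split them at random.
    worst = sorted(range(n_sites), key=lambda i: before[i], reverse=True)[:n_worst]
    rng.shuffle(worst)
    treated, control = worst[: n_worst // 2], worst[n_worst // 2 :]

    def after(i, has_camera):
        rate = rates[i] * (1 - camera_effect) if has_camera else rates[i]
        return yearly_accidents(rate, rng)

    treated_change = mean(after(i, True) - before[i] for i in treated)
    control_change = mean(after(i, False) - before[i] for i in control)
    return treated_change, control_change

treated_change, control_change = camera_study(camera_effect=0.0)
print(f"camera sites:  {treated_change:+.2f} accidents per year")
print(f"control sites: {control_change:+.2f} accidents per year")
```

With `camera_effect=0.0` the cameras do nothing, yet accidents fall at the camera sites, and they fall by a similar amount at the control sites. The gap between the two groups, not the before-and-after drop, is the part that can be credited to the cameras.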