Survivorship Bias: The Wisdom of the Dead Planes
During World War II, the Statistical Research Group at Columbia University received a request from the US military. Bombers were returning from missions with bullet holes concentrated in certain areas — the fuselage, the wings. The military wanted to add armour. Where should it go? The intuitive answer: where the bullet holes are. The correct answer, delivered by mathematician Abraham Wald, was exactly the opposite. The planes that returned with holes in the fuselage and wings had survived. The planes that had been hit in the engines and fuel systems hadn't returned at all. The data available was data from survivors. The holes that mattered were the ones that weren't there.
The Logic of Missing Data
Survivorship bias occurs whenever we analyse a non-representative sample consisting only of subjects that passed some selection filter — and treat it as if it were representative of the whole population. The "survivors" are visible and measurable; the non-survivors have disappeared from the data. Any conclusions drawn from the survivors alone will be systematically skewed.
Wald's insight was brilliant because it made the invisible data explicit. He reasoned: if the planes that made it back showed bullet damage mostly in areas A, B, and C, then the planes that were shot down must have been hit primarily in other areas — because those areas are conspicuously absent from the damage patterns of the returnees. The absence of holes in the engines was evidence that engine hits were fatal. The data spoke through its own silence.
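Wald's reasoning can be made concrete with a small simulation. The numbers below are entirely made up for illustration: each area has a hypothetical chance of being hit, and hits to the engine or fuel system are assumed far more likely to down the plane. The point is that among the planes that return, engine and fuel damage is scarce precisely because it was fatal:

```python
import random

random.seed(0)

# Hypothetical per-sortie hit probabilities and the chance that a hit
# in each area downs the plane (illustrative numbers only).
areas = {
    "fuselage": {"p_hit": 0.30, "p_fatal_given_hit": 0.05},
    "wings":    {"p_hit": 0.30, "p_fatal_given_hit": 0.05},
    "engine":   {"p_hit": 0.20, "p_fatal_given_hit": 0.60},
    "fuel":     {"p_hit": 0.20, "p_fatal_given_hit": 0.50},
}

hits_all = {a: 0 for a in areas}       # damage across every plane
hits_returned = {a: 0 for a in areas}  # damage we actually get to inspect

for _ in range(100_000):
    damage = []
    shot_down = False
    for area, p in areas.items():
        if random.random() < p["p_hit"]:
            damage.append(area)
            if random.random() < p["p_fatal_given_hit"]:
                shot_down = True
    for area in damage:
        hits_all[area] += 1
        if not shot_down:
            hits_returned[area] += 1

# Among returnees, engine and fuel hits are rare -- not because those
# areas are seldom hit, but because planes hit there rarely come back.
for area in areas:
    share = hits_returned[area] / hits_all[area]
    print(f"{area:8s} fraction of hits that reach the inspectors: {share:.2f}")
```

Inspecting only the returnees, one would see plenty of fuselage and wing damage and almost no engine damage, and could be tempted to armour the wrong places.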
The same logic applies everywhere. When we see "what successful people have in common," we are looking at holes in the wings. We need to look at the missing planes — the people who had those same characteristics and failed.
The College Dropout Billionaire
The most culturally familiar form of survivorship bias is the "dropout billionaire" narrative. Bill Gates dropped out of Harvard. Mark Zuckerberg dropped out of Harvard. Steve Jobs dropped out of Reed College. Clearly dropping out of college is a path to extraordinary success — right?
The error is as stark as it gets. We notice these individuals because they achieved extreme visibility through extreme success. We do not notice — because they are invisible — the millions of people who also dropped out of college and whose outcomes ranged from modest to very poor. The dropout rate for college students is high; the billionaire rate among dropouts is near zero. Gates and Zuckerberg didn't succeed because they dropped out; they were able to drop out because they already had exceptional abilities, resources, and (in Gates' case) early access to computers that almost no other students had. The dropout was incidental to their success, not causal. But it's the memorable hook in the story, so it gets told and repeated.
This narrative actively harms people who use it to justify educational decisions. The availability heuristic makes Gates-style success stories disproportionately salient in memory, while the quiet failures of a thousand other dropouts are forgotten or never known. We mistake salience for frequency and generate badly distorted probability estimates as a result.
Mutual Funds and Investment Returns
Survivorship bias is particularly severe in investment performance data, and particularly consequential given the sums of money involved. When investment databases compile historical mutual fund performance, they typically include only funds that are currently operating — not funds that closed or merged. But funds close for a reason: they generally performed poorly. By including only the survivors in historical performance data, databases systematically overstate average historical returns.
Studies by economists including Mark Carhart and colleagues have estimated that survivorship bias inflates reported mutual fund returns by 1–3 percentage points per year — a substantial distortion when compounded over decades. An investor looking at historical fund performance to inform future investment decisions is looking at a sample that has been filtered for success, from which the failures have silently disappeared. They are seeing bullet holes in the wings.
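A toy simulation makes the mechanism visible. The parameters below are arbitrary, not calibrated to any real fund database: funds earn noisy returns around a modest true mean, and funds whose cumulative value falls below a threshold close and drop out of the sample. Averaging only the survivors overstates the return the full population actually delivered:

```python
import random

random.seed(1)

# Simulate 1,000 funds over 10 years with illustrative parameters.
n_funds, n_years = 1000, 10
true_mean, vol = 0.05, 0.15

all_final, surviving_final = [], []
for _ in range(n_funds):
    wealth, alive = 1.0, True
    for _ in range(n_years):
        wealth *= 1 + random.gauss(true_mean, vol)
        if wealth < 0.7:        # poor performers close and leave the database
            alive = False
            break
    all_final.append(wealth)    # every fund, including the closed ones
    if alive:
        surviving_final.append(wealth)

def annualized(wealths):
    """Annualized return implied by the average final wealth."""
    avg = sum(wealths) / len(wealths)
    return avg ** (1 / n_years) - 1

print(f"annualized return, all funds:      {annualized(all_final):.2%}")
print(f"annualized return, survivors only: {annualized(surviving_final):.2%}")
```

The survivor-only figure is the one a database of currently operating funds would report; the gap between the two numbers is the survivorship bias.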
The same effect applies to quantitative trading strategies. A strategy that "back-tested" well over historical data may have been tested on a sample of surviving assets. Stocks that went bankrupt are often excluded from databases; real-estate markets that crashed are under-represented in long-run data. The strategy looks better in back-test than it will perform in reality, partly because the test data had its failures removed.
Startup Success Stories
The startup ecosystem is a survivorship bias engine. Business books about successful companies are, by definition, books about companies that survived long enough to be interesting. Jim Collins' famous Good to Great (2001) identified characteristics shared by companies that had made sustained transitions to exceptional performance. The book was influential and widely read. It was also methodologically critiqued by Jerker Denrell (2003) and others for exactly this problem: the sample consisted only of the "great" companies. The analysis could not tell whether the identified characteristics (e.g., disciplined people, technology accelerators, hedgehog concepts) were actually predictive of success, because no comparison was made to companies with those same characteristics that had not become "great."
More rigorous analysis by Philip Rosenzweig in The Halo Effect (2007) and by economists studying startup outcomes consistently finds that the factors associated with success in post-hoc narratives are far less predictive in prospective studies. A large fraction of what gets attributed to "smart strategy" or "good leadership" in business success stories is indistinguishable from luck — but the survivors' narratives erase that luck retrospectively, replacing it with skill.
Self-Help and Successful People's Habits
The survivorship bias in self-help literature is almost total. "Successful CEOs wake up at 5 AM and meditate" is a statement about people who are famous — i.e., highly visible survivors of competitive selection. Nobody surveyed the millions of people who wake up at 5 AM and meditate and did not become CEOs. The people who follow "success habits" from books about successful people are running an experiment with no control group, which is to say they are not running an experiment at all.
This doesn't mean that the habits are useless — but it does mean that observational data from survivors cannot tell us whether they are useful. We would need to know the distribution of those habits among people who tried and failed, not just among those who succeeded. That data is, by the nature of failure, largely absent from the record.
Science and Publication Bias
Academic research suffers from a related form of survivorship bias called publication bias. Studies that find a significant effect are far more likely to be published than studies that find no effect. The published literature is therefore a survivor population — the studies that cleared the selection filter of statistical significance. When a meta-analysis aggregates published studies, it is analysing surviving results, not the full distribution of actual evidence.
This is not a marginal distortion. Reviews of the psychology replication crisis have found that for many widely-cited effects, the original studies may have drawn disproportionate attention to lucky positive results in a sea of null results that were quietly filed away. The statistically significant effect exists in the published literature; the non-significant replications are in desk drawers. The published science was the wings; the desk drawers were the engines.
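The filtering effect can be sketched in a few lines. Suppose many labs study the same small true effect with noisy estimates, and only results more than roughly two standard errors from zero clear the significance filter and get published. The numbers here are illustrative, but the pattern is general: the published mean exaggerates the true effect, because only the lucky draws survive:

```python
import random
import statistics

random.seed(2)

# A small true effect, studied many times with noisy estimates.
true_effect, se, n_studies = 0.10, 0.15, 10_000

estimates = [random.gauss(true_effect, se) for _ in range(n_studies)]

# Publication filter: only "significant" estimates (|e| > 1.96 * se)
# make it into the literature; the rest go into desk drawers.
published = [e for e in estimates if abs(e) > 1.96 * se]

print(f"true effect:                 {true_effect:.2f}")
print(f"mean of all estimates:       {statistics.mean(estimates):.2f}")
print(f"mean of published estimates: {statistics.mean(published):.2f}")
```

The full set of estimates averages close to the truth; the published subset, being the survivors of the significance filter, averages several times larger.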
Avoiding the Bias
Correcting for survivorship bias requires the uncomfortable mental discipline of actively seeking out the missing data — the planes that didn't return, the funds that closed, the startups that failed, the studies that showed no effect:
- Ask what's not in your sample: Whenever you're drawing conclusions from a set of examples, ask how that set was constructed. Is it the full population, or only the visible/successful/surviving subset?
- Compare base rates: For any "successful people have X" claim, find the base rate of X among unsuccessful people. This connects survivorship bias to the base rate fallacy: without the denominator, the numerator tells us nothing about probability.
- Look for disconfirming cases: If a management strategy produced great results at Apple, ask about companies that used the same strategy and failed. They won't be in the bestselling book about Apple.
- Seek pre-registered or prospective data: Data collected before outcomes are known, from a defined population, cannot be filtered by survivorship after the fact.
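The base-rate point can be made precise with Bayes' rule. The figures below are deliberately made up: suppose one person in 100,000 becomes a notable CEO, that 20% of such CEOs wake at 5 AM, and that 10% of everyone else does. Even a habit genuinely twice as common among the successful barely moves the absolute probability:

```python
# Toy Bayes' rule calculation with made-up numbers for a
# "successful people have X" claim.
p_ceo = 1 / 100_000          # base rate of the rare outcome
p_early_given_ceo = 0.20     # habit prevalence among the survivors
p_early_given_not = 0.10     # habit prevalence among everyone else

# Total probability of the habit, then Bayes' rule for P(CEO | habit).
p_early = (p_early_given_ceo * p_ceo
           + p_early_given_not * (1 - p_ceo))
p_ceo_given_early = p_early_given_ceo * p_ceo / p_early

print(f"P(CEO)              = {p_ceo:.6f}")
print(f"P(CEO | wakes at 5) = {p_ceo_given_early:.6f}")
# The habit roughly doubles the relative odds but leaves the absolute
# probability minuscule. Survivor-only data shows us the 20% and hides
# the 10% -- the denominator that actually determines the probability.
```

This is why "what successful people do" lists are uninformative on their own: the visible numerator is meaningless without the invisible denominator.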
Survivorship bias is in many ways the bias of a species that learns from stories. Stories are about things that happened — memorable, visible, dramatic outcomes. The null outcome, the invisible failure, the uneventful non-event, is not a story. But the non-stories are often where the truth lives.
Sources & Further Reading
- Wald, A. A Method of Estimating Plane Vulnerability Based on Damage of Survivors. Statistical Research Group, Columbia University (1943). Declassified, later republished.
- Carhart, M. M. "On Persistence in Mutual Fund Performance." Journal of Finance 52, no. 1 (1997): 57–82.
- Denrell, J. "Vicarious Learning, Undersampling of Failure, and the Myths of Management." Organization Science 14, no. 3 (2003): 227–243.
- Rosenzweig, P. The Halo Effect … and the Eight Other Business Delusions That Deceive Managers. Free Press (2007).
- Taleb, N. N. Fooled by Randomness. Texere (2001); 2nd ed. Random House (2004). Particularly chapter 8.
- Wikipedia: Survivorship bias