survivorship_bias_statistical
The statistical error of drawing conclusions from a dataset that has been filtered by a survival or success criterion, without accounting for the filtered-out cases. The surviving sample is systematically different from the full population, and conclusions drawn from it are biased.
Studying successful companies to find the 'secret of success' ignores the many failed companies that had the same characteristics. The WWII aircraft armor analysis is the classic case: it initially focused on where returning planes were hit, ignoring that planes hit elsewhere did not return.
A personal finance blogger interviews 20 people who became millionaires by investing in cryptocurrency and concludes it is a reliable path to wealth. The thousands who lost their savings using the same strategy are never interviewed because they do not make compelling success stories.
A gym surveys its members in January to study the health benefits of regular exercise and finds overwhelmingly positive results. The survey misses all the people who signed up in previous Januaries, exercised briefly, and quit — the very people whose data would complicate the conclusions.
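The pattern in these examples can be reproduced with a small simulation. The sketch below is illustrative only: the funds, the zero-skill return distribution, the 0.7 closure threshold, and the function name `simulate_funds` are all assumptions made for the demonstration, not data from any real study. Every fund has the same expected return, yet the survivors look better simply because the losers were filtered out.

```python
import random
import statistics


def simulate_funds(n_funds=10_000, n_years=10, seed=0):
    """Return (final values of all funds, final values of survivors only).

    Hypothetical setup: every fund draws yearly returns from the same
    distribution (mean 0, std 0.15), so no fund has any real skill.
    A fund is closed, and drops out of the visible data, as soon as
    its value falls below 0.7 of the starting capital.
    """
    rng = random.Random(seed)
    all_values, survivor_values = [], []
    for _ in range(n_funds):
        value, alive = 1.0, True
        for _ in range(n_years):
            value *= 1.0 + rng.gauss(0.0, 0.15)
            if value < 0.7:  # survival filter: the fund is shut down
                alive = False
                break
        all_values.append(value)  # full population, closed funds included
        if alive:
            survivor_values.append(value)  # what a survivors-only dataset sees
    return all_values, survivor_values


all_values, survivor_values = simulate_funds()
mean_all = statistics.mean(all_values)
mean_survivors = statistics.mean(survivor_values)

print(f"funds started: {len(all_values)}, survived: {len(survivor_values)}")
print(f"mean final value, all funds:      {mean_all:.3f}")
print(f"mean final value, survivors only: {mean_survivors:.3f}")
```

Because the filter (closure after losses) depends directly on the outcome being measured (fund value), the survivors-only mean is higher than the full-population mean even though no fund is better than any other.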
Binary (yes/no) questions an LLM must answer to identify this aspect:
Is a sample being analyzed that was selected by surviving some filtering process?
Type: binary
Are the non-survivors (filtered-out cases) missing from the analysis?
Type: binary
Would including the non-survivors change the conclusions?
Type: binary
Is the filtering process related to the outcome being studied?
Type: binary
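The four binary questions can be encoded as a simple checklist. This is a minimal sketch: the names `QUESTIONS`, `flags_survivorship_bias`, and `unmet_questions` are hypothetical, and the rule that all four answers must be "yes" is an assumption, since the page does not state how the answers are combined.

```python
QUESTIONS = (
    "Is a sample being analyzed that was selected by surviving some filtering process?",
    "Are the non-survivors (filtered-out cases) missing from the analysis?",
    "Would including the non-survivors change the conclusions?",
    "Is the filtering process related to the outcome being studied?",
)


def flags_survivorship_bias(answers):
    """Return True when every question is answered 'yes'.

    `answers` is a sequence of four booleans in QUESTIONS order.
    Requiring all four is an assumed decision rule, not a documented one.
    """
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    return all(answers)


def unmet_questions(answers):
    """List the questions answered 'no', to show why a case was not flagged."""
    return [q for q, a in zip(QUESTIONS, answers) if not a]


# Crypto-millionaire example from the text: all four conditions hold.
print(flags_survivorship_bias([True, True, True, True]))
# A filtered sample where adding the missing cases would not change the result.
print(unmet_questions([True, True, False, True]))
```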
Non-survivors are invisible by definition. The available data feels complete because every survivor is visible, so nothing signals that cases are missing.
Actively seek out the non-survivors. Ask: what happened to the cases that did not make it into this dataset? Construct the full denominator.
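Constructing the full denominator is simple arithmetic once the number of cases that entered the process is known. The sketch below reuses the cryptocurrency example. The figure of 5,000 people who tried the strategy is a made-up number for illustration (the text only says "thousands"), and the `Cohort` class and both rate functions are hypothetical names. The corrected rate also assumes that none of the unobserved cases succeeded.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    started: int    # everyone who entered the process (the full denominator)
    observed: int   # cases still visible in the dataset (the survivors)
    successes: int  # observed cases that meet the success criterion

    def __post_init__(self):
        if not (0 <= self.successes <= self.observed <= self.started):
            raise ValueError("need 0 <= successes <= observed <= started")
        if self.observed == 0:
            raise ValueError("at least one observed case is required")


def naive_rate(cohort):
    """Success rate computed over the visible survivors only."""
    return cohort.successes / cohort.observed


def corrected_rate(cohort):
    """Success rate over everyone who started.

    Assumes no filtered-out case was a success.
    """
    return cohort.successes / cohort.started


# 20 interviewed millionaires; 5,000 total investors is an illustrative guess.
crypto = Cohort(started=5_000, observed=20, successes=20)
print(f"naive success rate:     {naive_rate(crypto):.1%}")
print(f"corrected success rate: {corrected_rate(crypto):.1%}")
print(f"cases missing from the data: {crypto.started - crypto.observed}")
```

The naive rate describes only the people who were interviewed. The corrected rate describes the strategy itself.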
Business success analysis, mutual fund performance reports, medical treatment studies, and historical analysis.
Systematic difference between respondents and non-respondents distorting study results.
Occupational studies overestimate worker health because severely ill people exit the workforce.
Prevalence studies miss fatal or short-duration cases, distorting disease-exposure associations.
Systematic exclusion of certain participants from a study distorts results.
Reduced variability in a variable artificially weakens the observed correlation.
Multiple filtering or inclusion steps systematically alter the composition of a sample.
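One of the related effects listed above, that reduced variability in a variable weakens the observed correlation, can be checked numerically. The sketch below uses synthetic data under assumed parameters (standard normal `x`, `y = x + noise`, and a selection cutoff of `x > 1.0`). None of it comes from a real dataset.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]  # y depends linearly on x, plus noise

# Full population: x covers its whole range.
r_full = pearson(xs, ys)

# Filtered sample: only cases with high x "survive" into the dataset.
kept = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
r_restricted = pearson([x for x, _ in kept], [y for _, y in kept])

print(f"correlation, full population:  {r_full:.2f} (n={len(xs)})")
print(f"correlation, x > 1.0 only:     {r_restricted:.2f} (n={len(kept)})")
```

The relationship between `x` and `y` is identical in both samples. Only the spread of `x` differs, and that alone is enough to shrink the measured correlation.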