Sieve Bias

Also Known As: Cascading selection bias, Sequential filtering bias

Statistical Error ID: sieve_bias

Definition

Sieve bias occurs when data passes through multiple filtering or selection steps, each of which may introduce its own subtle bias. While any single filter might have a minor effect, the cumulative result of successive filtering can produce a final sample that is profoundly unrepresentative of the original population. Because each step selects only from what the previous step left behind, the biases can compound, so the total bias is often much larger and harder to predict than any individual step would suggest.
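
The compounding can be illustrated with a small calculation. The sketch below is a minimal illustration, not a model of any real study: it assumes a hypothetical subgroup that starts as half the population, and that every filter independently keeps 70% of that subgroup and 90% of everyone else. The function name share_after_filters and all rates are invented for the example.

```python
def share_after_filters(initial_share, keep_rate_group, keep_rate_others, n_filters):
    """Share of a subgroup left after n_filters identical, independent filters.

    Assumption: every filter keeps the same fraction of the subgroup
    (keep_rate_group) and of everyone else (keep_rate_others).
    """
    group = initial_share * keep_rate_group ** n_filters
    others = (1.0 - initial_share) * keep_rate_others ** n_filters
    return group / (group + others)


# Hypothetical rates: each filter keeps 70% of the subgroup, 90% of the rest.
for k in range(6):
    share = share_after_filters(0.5, 0.70, 0.90, k)
    print(f"after {k} filter(s): subgroup share = {share:.3f}")
```

Under these assumed rates the subgroup falls from 50% of the sample to about 44% after one filter and about 22% after five, even though no single step looks drastic.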

Examples

A clinical study starts with 10,000 patients, then restricts to those who completed intake forms (excluding the sickest), then to those with follow-up data (excluding dropouts who experienced side effects), then to those with complete lab results (excluding the poorest). The final 2,000 patients are healthier, wealthier, and more compliant than the original population.

A tech company surveys employees about workplace satisfaction, but only workers with a company email account are invited, then only those who open the HR newsletter see the survey link, then only those who feel strongly enough bother to respond. Each filter quietly removes a different type of employee — contractors, disengaged staff, and those with mild opinions — leaving a final sample that bears little resemblance to the actual workforce.

An economics study on the returns to education uses administrative records that first exclude anyone without a social security number, then drop records with incomplete wage data, then remove individuals who changed jobs more than twice. Immigrants, gig workers, and the most economically mobile people disappear through successive cuts, and the estimated wage premium for a college degree reflects only a narrow, stable slice of the labor market.

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

1. Has the data been filtered through multiple sequential selection criteria?
   Type: binary

2. Could each filtering step disproportionately remove certain types of observations?
   Type: binary

3. Is the remaining sample systematically different from the original population after all filters are applied?
   Type: binary

4. Has the cumulative effect of all filtering steps on sample composition been assessed?
   Type: binary

Description

Why It Works

Each filtering criterion seems reasonable in isolation, and researchers may not track how the sample composition changes across all steps. The combined effect of many small biases is non-obvious and can radically alter who remains in the study without anyone noticing the cumulative distortion.

How to Counter

Document the sample size and composition at each filtering step. Create flow diagrams showing attrition. Compare characteristics of included and excluded participants at each stage. Use multiple imputation or inverse probability weighting to account for systematic dropouts.
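
The first and last of these steps can be sketched in code. The example below is a minimal sketch under stated assumptions, not a reference implementation: the toy population, the field names (income, outcome, has_followup, has_labs), and the helper names attrition_report and ipw_mean are all hypothetical. The weighting estimates the retention probability as a simple per-group frequency; in practice it is usually estimated with a model fitted to several covariates.

```python
from collections import Counter


def attrition_report(records, filters, group_key):
    """Apply filters in order, logging sample size and group shares after each step."""
    current = list(records)
    report = []
    for name, keep in [("start", None)] + list(filters):
        if keep is not None:
            current = [r for r in current if keep(r)]
        n = len(current)
        counts = Counter(r[group_key] for r in current)
        shares = {g: c / n for g, c in counts.items()} if n else {}
        report.append({"step": name, "n": n, "shares": shares})
    return report, current


def ipw_mean(population, sample, group_key, value_key):
    """Mean of value_key in sample, weighting each record by 1 / P(retained | group).

    Retention probability is estimated per group as kept / original. This only
    corrects bias explained by group membership, and it cannot recover a group
    that was filtered out entirely.
    """
    original = Counter(r[group_key] for r in population)
    kept = Counter(r[group_key] for r in sample)
    weighted_sum = 0.0
    weight_total = 0.0
    for r in sample:
        weight = original[r[group_key]] / kept[r[group_key]]
        weighted_sum += weight * r[value_key]
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy population: 50 low-income and 50 high-income patients.
# Low-income patients are made more likely to fail each filter.
population = [
    {
        "id": i,
        "income": "low" if i < 50 else "high",
        "outcome": 10.0 if i < 50 else 20.0,
        "has_followup": (i % 2 == 0) if i < 50 else (i % 10 != 0),
        "has_labs": (i % 4 == 0) if i < 50 else (i % 5 != 0),
    }
    for i in range(100)
]

filters = [
    ("has follow-up data", lambda r: r["has_followup"]),
    ("has complete labs", lambda r: r["has_labs"]),
]

report, final_sample = attrition_report(population, filters, "income")
for row in report:
    print(row["step"], row["n"], row["shares"])

true_mean = sum(r["outcome"] for r in population) / len(population)
naive_mean = sum(r["outcome"] for r in final_sample) / len(final_sample)
weighted_mean = ipw_mean(population, final_sample, "income", "outcome")
print("true:", true_mean, "naive:", round(naive_mean, 2), "weighted:", round(weighted_mean, 2))
```

In this toy data the low-income share drops from 50% to about 25% across the two filters, the unweighted mean rises from 15.0 to about 17.5, and the weighted mean returns to 15.0. The exact recovery happens only because the outcome is constant within each group here; with real data, weighting on observed characteristics would leave any bias from unobserved ones in place.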

Real-World Context

Common in clinical trials with strict inclusion criteria, data science pipelines with multiple cleaning steps, hiring processes with sequential screening rounds, and systematic reviews with multi-stage study selection.

Related Aspects

Self-Selection Bias, Survivorship Bias (Statistical), Non-Response Bias, Collider Bias

→ correlates with

Survivorship Bias (Statistical)

The statistical error of drawing conclusions from a dataset that has been filtered by a survival or success criterion, without accounting for the filtered-out cases. The surviving sample is systematically different from the full population, and conclusions drawn from it are biased.

→ correlates with

Non-Response Bias

Systematic difference between respondents and non-respondents distorting study results.

→ correlates with

Collider Bias

A statistical error that occurs when conditioning on a variable that is causally affected by two other variables creates a spurious association between those two variables. In a causal diagram, a collider is a variable where two causal arrows converge, and conditioning on it opens a non-causal path.

Hierarchical Context

→ is a Statistical Error
