Freedman's Paradox

Also Known As: Preselection bias, Variable selection bias, Data snooping
Aspect ID: freedmans_paradox

Definition

Freedman's paradox demonstrates that when many predictors are screened relative to the number of observations, purely random variables will appear statistically significant and produce seemingly good-fitting models by chance. If 50 pure-noise predictors are screened against an outcome with n=50 observations, approximately 2-3 predictors (50 × 0.05 = 2.5) will appear significant at p<0.05 purely by chance, and a model refit with only those selected predictors will appear well-specified.
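The screening effect in the definition can be checked with a small simulation. The sketch below is illustrative only: it assumes standard-normal noise for both the outcome and the predictors, tests each predictor with a Fisher z approximation rather than an exact t-test, and uses hypothetical helper names (`pearson_r`, `looks_significant`, `count_false_positives`) that are not part of this platform.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, alpha=0.05):
    """Two-sided test of zero correlation (Fisher z approximation, an assumption)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def count_false_positives(n_obs, n_predictors, rng):
    """Screen pure-noise predictors one at a time against a pure-noise outcome."""
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if looks_significant(pearson_r(predictor, outcome), n_obs):
            hits += 1
    return hits


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 200
counts = [count_false_positives(50, 50, rng) for _ in range(trials)]
average_hits = sum(counts) / trials
print(f"average 'significant' noise predictors per study: {average_hits:.2f}")
```

Every predictor here is noise by construction, so any predictor flagged as significant is a false positive; the average count per simulated study should land near 50 × 0.05 = 2.5.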

Examples

A researcher with 50 patients and 50 candidate biomarkers runs 50 univariate regressions. By chance alone, approximately 2-3 biomarkers will appear significant at p<0.05. The researcher then builds a multivariate model with these 'significant' predictors. The model appears to fit well but, if those associations are chance findings, it will predict no better than chance on new patients.
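The gap between apparent fit and real predictive value in the biomarker example can be sketched as follows. This is a simplified illustration under stated assumptions: all data are standard-normal noise, the three predictors most correlated with the outcome are kept (a stand-in for "significant at p<0.05", chosen so that every simulated study selects something), and the least-squares fit uses a small hand-written solver. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + row for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(coef, rows, y):
    """R^2 of fixed coefficients on the given rows (can be negative on new data)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def one_study(rng, n_obs=50, n_predictors=50, keep=3):
    """Screen noise predictors, refit on the winners, then score on fresh noise."""
    x = [[rng.gauss(0, 1) for _ in range(n_predictors)] for _ in range(n_obs)]
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    strength = [abs(pearson_r([row[j] for row in x], y)) for j in range(n_predictors)]
    chosen = sorted(range(n_predictors), key=lambda j: strength[j], reverse=True)[:keep]
    train_rows = [[row[j] for j in chosen] for row in x]
    coef = fit_ols(train_rows, y)
    # "New patients": fresh draws from the same pure-noise process.
    x_new = [[rng.gauss(0, 1) for _ in range(n_predictors)] for _ in range(n_obs)]
    y_new = [rng.gauss(0, 1) for _ in range(n_obs)]
    new_rows = [[row[j] for j in chosen] for row in x_new]
    return r_squared(coef, train_rows, y), r_squared(coef, new_rows, y_new)


rng = random.Random(1)  # fixed seed so the run is repeatable
results = [one_study(rng) for _ in range(50)]
avg_fit_r2 = sum(r[0] for r in results) / len(results)
avg_new_r2 = sum(r[1] for r in results) / len(results)
print(f"average R^2 on the screening data: {avg_fit_r2:.2f}")
print(f"average R^2 on new data:           {avg_new_r2:.2f}")
```

Averaging over 50 simulated studies keeps the comparison from depending on one lucky or unlucky draw. A negative R^2 on new data means the model predicts worse than simply guessing the mean outcome.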

A marketing analyst has data on 80 customers and tests 100 demographic and behavioral variables to predict purchase likelihood. Purely by chance, about 5 variables appear significant at p<0.05. The analyst proudly presents a 'predictive model' built on these variables, not realizing the model would perform no better than random chance on new customers.

A neuroscience lab scans 20 participants and extracts 200 brain region measurements, then correlates all of them with a personality score. Several regions show p<0.05 correlations, although about 10 such correlations (200 × 0.05) would be expected from noise alone. The team publishes a paper on these 'neural correlates of personality,' unaware that the significant findings are entirely consistent with what random noise would produce given so many simultaneous tests.
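The arithmetic behind the three examples can be written out directly. The sketch below assumes every tested variable is truly unrelated to the outcome; the probability of at least one false positive additionally assumes the tests are independent, which correlated measurements such as neighboring brain regions would violate. The function names are illustrative.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of null tests that pass the threshold by chance."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 50 biomarkers, 100 customer variables, 200 brain region measurements
for n_tests in (50, 100, 200):
    expected = expected_false_positives(n_tests)
    at_least_one = prob_at_least_one(n_tests)
    print(f"{n_tests} tests: expect {expected:.1f} false positives, "
          f"P(at least one) = {at_least_one:.5f}")
```

The expected count does not depend on independence, so it holds even when the tested variables are correlated with one another.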

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

  1. Are many predictors being screened relative to the number of observations? (Type: binary)
  2. Were predictors selected for inclusion in a final model based on their significance in preliminary analysis of the same dataset? (Type: binary)
  3. Does the model use the same data for variable selection and final estimation? (Type: binary)
  4. Is the final model's performance validated on an independent holdout dataset? (Type: binary)
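Questions 3 and 4 above turn on whether selection and evaluation share the same data. One way to separate them is a split-sample check: screen predictors on one half of the observations, then re-test only the selected predictors on the untouched half. The sketch below is a minimal illustration, assuming pure-noise data, a Fisher z approximation for the correlation test, and hypothetical function names; it is not the platform's own procedure.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, alpha=0.05):
    """Two-sided test of zero correlation (Fisher z approximation, an assumption)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def split_sample_check(outcome, predictors, rng):
    """Select on a random half of the rows, then confirm on the other half."""
    idx = list(range(len(outcome)))
    rng.shuffle(idx)
    half = len(idx) // 2
    screen_idx, holdout_idx = idx[:half], idx[half:]

    def passes(column, rows):
        xs = [column[i] for i in rows]
        ys = [outcome[i] for i in rows]
        return looks_significant(pearson_r(xs, ys), len(rows))

    selected = [j for j, col in enumerate(predictors) if passes(col, screen_idx)]
    confirmed = [j for j in selected if passes(predictors[j], holdout_idx)]
    return selected, confirmed


rng = random.Random(2)  # fixed seed so the run is repeatable
total_selected = 0
total_confirmed = 0
for _ in range(100):
    # 100 observations and 50 predictors, all pure noise (illustrative sizes)
    outcome = [rng.gauss(0, 1) for _ in range(100)]
    predictors = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(50)]
    selected, confirmed = split_sample_check(outcome, predictors, rng)
    total_selected += len(selected)
    total_confirmed += len(confirmed)

replication_rate = total_confirmed / total_selected
print(f"selected on the screening half: {total_selected}")
print(f"confirmed on the holdout half:  {total_confirmed}")
print(f"replication rate:               {replication_rate:.3f}")
```

Because the predictors are noise, only about 5% of the selected ones should pass again on the holdout half, which is what a chance finding looks like when it is checked on independent data.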