freedmans_paradox
Freedman's paradox demonstrates that when many predictors are screened relative to the number of observations, purely random variables will appear statistically significant and produce seemingly good-fitting models by chance. If 50 pure-noise predictors are each screened against an outcome with n=50 observations, approximately 2 to 3 of them (50 × 0.05 = 2.5) are expected to appear significant at p<0.05 purely by chance, and a model refit on only those selected predictors will appear well-specified.
A researcher with 50 patients and 50 candidate biomarkers runs 50 univariate regressions. By chance alone, approximately 2-3 biomarkers will appear significant at p<0.05. The researcher then builds a multivariate model with these 'significant' predictors, which appears to fit well but will predict no better than chance on new patients.
A marketing analyst has data on 80 customers and tests 100 demographic and behavioral variables to predict purchase likelihood. Purely by chance, about 5 variables appear significant at p<0.05. The analyst proudly presents a 'predictive model' built on these variables, not realizing the model would perform no better than random chance on new customers.
A neuroscience lab scans 20 participants and extracts 200 brain region measurements, then correlates all of them with a personality score. Several regions show p<0.05 correlations. The team publishes a paper on these 'neural correlates of personality,' unaware that the significant findings are entirely consistent with what random noise would produce given so many simultaneous tests.
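The first scenario above (50 patients, 50 candidate biomarkers) can be checked with a small simulation. The sketch below is illustrative only: it assumes every predictor and the outcome are independent standard-normal noise, and it approximates each p-value with the Fisher z transformation rather than an exact t-test. The function names are hypothetical and chosen for this example.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z approximation (not an exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_false_positives(n_obs, n_predictors, alpha, rng):
    """Screen pure-noise predictors against a pure-noise outcome; count p < alpha."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if approx_p_value(pearson_r(x, y), n_obs) < alpha:
            hits += 1
    return hits


rng = random.Random(0)  # fixed seed so the run is repeatable
# Repeat the "50 patients, 50 biomarkers" study 200 times with no real signal.
counts = [count_false_positives(50, 50, 0.05, rng) for _ in range(200)]
mean_hits = sum(counts) / len(counts)
studies_with_hit = sum(c > 0 for c in counts)
print(f"mean 'significant' noise predictors per study: {mean_hits:.2f}")
print(f"studies with at least one false positive: {studies_with_hit} of 200")
```

Under these assumptions the average count should land near 50 × 0.05 = 2.5, and most simulated studies should contain at least one "significant" predictor even though no real relationship exists.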
Binary (yes/no) questions an LLM must answer to identify this aspect:
Are many predictors being screened relative to the number of observations?
Type: binary
Were predictors selected for inclusion in a final model based on their significance in preliminary analysis of the same dataset?
Type: binary
Does the model use the same data for variable selection and final estimation?
Type: binary
Is the final model's performance validated on an independent holdout dataset?
Type: binary
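The expected number of false positives grows with the number of tests, so when most candidate predictors are unrelated to the outcome, nearly every 'significant' result is a false discovery. When the number of predictors approaches or exceeds the number of observations, a model can also fit noise closely, which inflates apparent goodness of fit. Stepwise selection, univariate screening, and similar procedures compound this by reusing the same data for both selection and estimation, so the p-values and fit statistics reported for the final model ignore the selection step.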
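The effect of reusing data for selection and estimation can be illustrated with a deliberately simple sketch. It assumes pure-noise data and a one-predictor "model" (the single predictor with the largest absolute correlation), which is a simplification of the stepwise or multivariate procedures described above. The helper names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_trial(n_obs, n_predictors, rng):
    """Pick the noise predictor with the largest |r|, then re-test it on fresh data."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r = 0.0
    for _ in range(n_predictors):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(x, y)
        if abs(r) > abs(best_r):
            best_r = r
    # The selected predictor is noise, so on new subjects it is just another
    # independent noise column paired with an independent noise outcome.
    x_new = [rng.gauss(0, 1) for _ in range(n_obs)]
    y_new = [rng.gauss(0, 1) for _ in range(n_obs)]
    return best_r ** 2, pearson_r(x_new, y_new) ** 2


rng = random.Random(1)  # fixed seed so the run is repeatable
results = [one_trial(50, 50, rng) for _ in range(200)]
in_sample_r2 = sum(a for a, _ in results) / len(results)
holdout_r2 = sum(b for _, b in results) / len(results)
print(f"mean R^2 on the data used for selection: {in_sample_r2:.3f}")
print(f"mean R^2 on independent data:            {holdout_r2:.3f}")
```

The in-sample figure is inflated because it is the maximum over 50 noisy estimates, while the independent-data figure reflects a single unselected estimate. The gap between the two is the optimism introduced by selecting and estimating on the same observations.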
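Use pre-specified predictor sets chosen before looking at the outcome. Apply corrections for multiple comparisons. Validate models on independent data that played no part in variable selection. Use regularization methods (LASSO, ridge) rather than significance-based screening. Report the number of predictors examined before final model selection.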
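As one concrete instance of correcting for multiple comparisons, the sketch below implements two standard procedures, Bonferroni and Benjamini-Hochberg, in plain Python. The source does not name a specific correction, so the choice of these two is an assumption made for illustration, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return a reject flag per p-value, controlling the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a reject flag per p-value, controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values from screening 10 candidate predictors.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
uncorrected = [p < 0.05 for p in p_values]
print("uncorrected:", sum(uncorrected))                          # 5
print("Bonferroni:", sum(bonferroni(p_values)))                  # 1
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))  # 2
```

Bonferroni is the stricter of the two and suits cases where any single false positive is costly. Benjamini-Hochberg tolerates a controlled proportion of false discoveries and is usually preferred for exploratory screens with many candidates.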
Genomic association studies with thousands of candidate genes and hundreds of subjects were plagued by Freedman's paradox until the field adopted genome-wide significance thresholds and replication requirements.