p_hacking
P-hacking occurs when researchers repeatedly analyze data using different methods, variable selections, or subgroup divisions until a statistically significant p-value (typically below 0.05) is found. This exploitation of researcher degrees of freedom inflates false-positive rates far beyond the nominal significance level. The practice can be intentional or unconscious, driven by publication incentives that reward significant findings.
A pharmaceutical researcher tests a new supplement against 20 different health outcomes. One outcome (toenail growth rate) yields p < 0.05. The paper is published with the title 'Supplement X significantly improves toenail growth' without mentioning the 19 non-significant tests.
A marketing researcher collects data on a new ad campaign and finds no significant effect on overall sales. They then slice the data by age group, region, time of day, device type, and day of week until they find that women aged 35–44 in the Midwest who saw the ad on a Tuesday showed p = 0.04 — and report this as a key finding.
A nutrition scientist studies whether a diet intervention reduces body weight, cholesterol, and blood pressure. None reach significance. They then test 15 additional biomarkers and find one — fasting insulin — with p = 0.049, which becomes the headline result of the published paper.
Binary (yes/no) questions an LLM must answer to identify this aspect:
Were multiple statistical analyses or variable combinations tested?
Type: binary
Are only the significant results (p < 0.05) reported?
Type: binary
Is the total number of tests conducted disclosed?
Type: binary
Were corrections for multiple comparisons applied?
Type: binary
With a 5% significance threshold, testing 20 independent hypotheses when no real effect exists gives roughly a 64% chance of at least one false positive (1 − 0.95^20 ≈ 0.64). Audiences typically see only the reported result, not the full search process.
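The 64% figure can be checked directly. The sketch below is illustrative only: the function names are made up for this example, and it assumes every null hypothesis is true and the tests are independent, so each p-value is uniform on [0, 1].

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_any_false_positive(num_tests: int, alpha: float = 0.05,
                                trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under a true null hypothesis, a p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(round(familywise_error_rate(20), 3))  # 0.642
    print(simulate_any_false_positive(20))      # close to the analytic value
```

Correlated tests, which are common when outcomes overlap, inflate the error rate less than this independent case suggests, but the rate still exceeds the nominal 5%.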
Demand pre-registration of hypotheses and analysis plans. Apply corrections for multiple comparisons such as Bonferroni or Benjamini-Hochberg, and ask how many tests were conducted in total.
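As a rough sketch of the two corrections named above, the following plain-Python functions apply Bonferroni and Benjamini-Hochberg to a list of p-values. The function names and the example p-values are invented for illustration; in practice a maintained statistics library would normally be used instead.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values: one "hit" at 0.04 among 20 tests, as in the examples above.
    p_values = [0.04] + [0.10 + 0.04 * i for i in range(19)]
    print(any(bonferroni(p_values)))          # False
    print(any(benjamini_hochberg(p_values)))  # False
```

With 20 tests, the Bonferroni threshold is 0.05 / 20 = 0.0025, so an isolated p = 0.04 does not survive either correction. This is why the total number of tests matters when judging a single reported result.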
P-hacking has been widely documented in social psychology and biomedical research and is considered a contributor to the replication crisis. Some journals, PLOS ONE among them, now offer Registered Reports or encourage pre-registration to combat it.
Filtering out contradicting information, only accepting confirming data.
Using information that was not available at the point in time being analyzed.
Presenting post-hoc hypotheses as if they were formulated before seeing the data.
Research funded by parties with financial interests tends to produce favorable results.