P-Hacking (Data Dredging)

Also Known As: data dredging, significance chasing, researcher degrees of freedom exploitation

Statistical Error ID: p_hacking

Definition

P-hacking occurs when researchers repeatedly analyze data using different methods, variable selections, or subgroup divisions until a statistically significant p-value (typically below 0.05) is found. This exploitation of researcher degrees of freedom inflates false-positive rates far beyond the nominal significance level. The practice can be intentional or unconscious, driven by publication incentives that reward significant findings.
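A small simulation shows how quickly the false-positive rate inflates. This is an illustrative sketch, not a model of any particular study. It assumes two groups of 30, normally distributed outcomes with a known standard deviation of 1 (so a simple z-test stands in for the t-test a real analysis would more likely use), and no true effect on any outcome. The function names are hypothetical. Because every null hypothesis is true, every "significant" result is a false positive.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1; used to turn z-scores into p-values


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming both groups
    share a KNOWN standard deviation sigma (a z-test, used here only to
    keep the sketch free of third-party dependencies)."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def hacked_study(n_outcomes, n_per_group, rng, alpha=0.05):
    """One simulated study with NO true effect on any outcome.
    The analyst tests outcomes one by one and declares success
    as soon as any of them reaches p < alpha."""
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_z_pvalue(treated, control) < alpha:
            return True  # a false positive, since the null is true
    return False


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=30, seed=1):
    """Fraction of simulated null studies that report a 'significant' finding."""
    rng = random.Random(seed)
    hits = sum(hacked_study(n_outcomes, n_per_group, rng) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} outcome(s) tested: false-positive rate {false_positive_rate(k):.3f}")
```

With one outcome the estimated rate should land near the nominal 5%; with 20 outcomes it should approach the roughly 64% expected for 20 independent tests. Stopping at the first hit mirrors an analyst who stops looking once something "works".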

Examples

A pharmaceutical researcher tests a new supplement against 20 different health outcomes. One outcome (toenail growth rate) yields p < 0.05. The paper is published with the title 'Supplement X significantly improves toenail growth' without mentioning the 19 non-significant tests.

A marketing researcher collects data on a new ad campaign and finds no significant effect on overall sales. They then slice the data by age group, region, time of day, device type, and day of week until they find that women aged 35–44 in the Midwest who saw the ad on a Tuesday showed p = 0.04 — and report this as a key finding.

A nutrition scientist studies whether a diet intervention reduces body weight, cholesterol, and blood pressure. None reach significance. They then test 15 additional biomarkers and find one — fasting insulin — with p = 0.049, which becomes the headline result of the published paper.

Verification Steps


Binary (yes/no) questions an LLM must answer to identify this aspect:

1. Were multiple statistical analyses or variable combinations tested? (Type: binary)
2. Are only the significant results (p < 0.05) reported? (Type: binary)
3. Is the total number of tests conducted disclosed? (Type: binary)
4. Were corrections for multiple comparisons applied? (Type: binary)

Description

Why It Works

With a 5% significance threshold, testing 20 independent hypotheses that are all truly null gives roughly a 64% chance of at least one false positive. Audiences typically see only the reported result, not the full search that produced it.

How to Counter

Demand pre-registration of hypotheses and analysis plans. Ask how many tests were conducted in total, and apply corrections for multiple comparisons, such as Bonferroni (which controls the family-wise error rate) or Benjamini-Hochberg (which controls the false discovery rate).
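Both corrections are short enough to sketch directly. The functions below are minimal illustrative implementations with hypothetical names; in practice a vetted library routine is preferable. Bonferroni rejects a hypothesis only when its p-value is at most alpha / m for m tests. Benjamini-Hochberg sorts the p-values and rejects the k smallest, where k is the largest rank with p_(k) <= k * q / m. The first usage example applies them to the nutrition case: 18 outcomes in total (the 3 planned ones plus 15 additional biomarkers). Only p = 0.049 is given there, so the placeholder 0.5 stands in for the 17 unreported non-significant p-values; any value above 0.05 gives the same result.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery
    rate at level q for independent or positively dependent tests)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest p-values, reporting results in input order.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Nutrition example: 0.5 is a placeholder for unreported non-significant p-values.
nutrition = [0.049] + [0.5] * 17
print(any(bonferroni(nutrition)))          # → False (0.049 > 0.05 / 18 ≈ 0.0028)
print(any(benjamini_hochberg(nutrition)))  # → False

# Illustrative p-values showing that BH is less conservative than Bonferroni.
generic = [0.001, 0.012, 0.02, 0.035, 0.3]
print(bonferroni(generic))          # → [True, False, False, False, False]
print(benjamini_hochberg(generic))  # → [True, True, True, True, False]
```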


Real-World Context

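P-hacking is widely regarded as a major contributor to the replication crisis, particularly in social psychology and biomedical research. Some journals now require or encourage pre-registration, for example through Registered Reports, in which the hypotheses and analysis plan are peer-reviewed before any data are collected.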