
Data Dredging (Fishing Expedition)

Also Known As: fishing expedition, HARKing (Hypothesizing After Results are Known), post-hoc analysis disguised as a priori
Statistical Error ID: data_dredging

Definition

Data dredging is the practice of exhaustively searching through data for any statistically significant patterns without a prior hypothesis, then presenting discovered patterns as if they were predicted in advance. While exploratory data analysis is legitimate when labeled as such, data dredging crosses the line by disguising exploratory findings as confirmatory results. The sheer number of possible correlations in any dataset virtually guarantees that some will pass significance thresholds by chance alone.
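The claim that chance alone produces significant results can be made concrete with a small calculation. The sketch below is illustrative only: it assumes every null hypothesis is true, that the tests are independent, and that a conventional threshold of 0.05 is used (none of which the definition specifies). The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across `num_tests`
    independent tests when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Assumed threshold of 0.05; the counts below are arbitrary illustrations.
for m in (1, 10, 100, 1000):
    rate = familywise_error_rate(m)
    print(f"{m:>5} tests -> P(at least one false positive) = {rate:.3f}")
```

Under these assumptions the probability is about 40% with 10 tests and exceeds 99% with 100 tests, which is why an undisclosed search makes a single "significant" result weak evidence.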

Examples

A researcher has access to a large health database with 500 variables. After testing all 124,750 possible pairwise correlations, they find that ice cream consumption is significantly correlated with drowning deaths. They publish this as a confirmed finding without mentioning that it was one of 124,750 tests or that both variables are driven by warm weather.
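The arithmetic behind this example can be checked directly. The sketch below assumes a significance threshold of 0.05, which the example does not state, and shows a Bonferroni-adjusted threshold as one common correction (not something the example itself prescribes).

```python
from math import comb

num_variables = 500   # from the example
alpha = 0.05          # assumed threshold; not stated in the example

# Number of distinct variable pairs: 500 choose 2
num_pairs = comb(num_variables, 2)

# Expected number of "significant" pairs if no real relationship exists
expected_false_positives = num_pairs * alpha

# Bonferroni correction: per-test threshold that keeps the overall
# false-positive rate at or below alpha
bonferroni_threshold = alpha / num_pairs

print(f"pairwise tests:           {num_pairs}")
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.2e}")
```

With these assumed numbers, several thousand correlations would be expected to pass the unadjusted threshold even if the database contained no real relationships at all.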

A marketing team records 300 customer attributes and tests all combinations against purchase behavior. They announce a breakthrough finding: customers who prefer blue packaging and own a pet are 40% more likely to buy. The result is almost certainly due to chance, with no theoretical basis and no replication attempt.

A political scientist downloads decades of county-level data with hundreds of economic and social indicators, then runs thousands of regressions until they find that per-capita bowling alley count significantly predicts voter turnout. The finding is published as a novel discovery without acknowledging the exhaustive search that produced it.
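A small simulation shows how an exhaustive search of this kind finds "predictors" in pure noise. This is a sketch under stated assumptions, not a reconstruction of any real analysis: the outcome and all predictors are independent random numbers, each test is a simple correlation rather than a full regression, and the p-value uses the Fisher z normal approximation. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z transform (approximate)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_spurious_hits(num_predictors=1000, num_obs=100, alpha=0.05, seed=0):
    """Count noise predictors that look 'significant' against a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(num_obs)]
    hits = 0
    for _ in range(num_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(num_obs)]
        if approx_p_value(pearson_r(predictor, outcome), num_obs) < alpha:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 1000 unrelated predictors passed p < 0.05")
```

Around 5% of the unrelated predictors should pass the threshold, so a researcher who reports only the most striking one has a publishable-looking result with no underlying effect.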

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

  1. Were the hypotheses formulated after examining the data?
     Type: binary
  2. Were many comparisons or subgroup analyses performed?
     Type: binary
  3. Are exploratory findings being presented as if they were hypothesis-driven?
     Type: binary
  4. Have the findings been replicated in an independent dataset?
     Type: binary

Deep Dive

Hierarchical Context