Mar 29, 2026 · 2 min read

Data Dredging (Fishing Expedition) — When Logic Wears a Disguise

Data dredging is the practice of exhaustively searching through data for any statistically significant patterns without a prior hypothesis, then presenting discovered patterns as if they were predicted in advance. While exploratory data analysis is legitimate when labeled as such, data dredging crosses the line by disguising exploratory findings as confirmatory results. The sheer number of possible correlations in any dataset virtually guarantees that some will pass significance thresholds by chance alone.

Also known as: fishing expedition. Closely related: HARKing (Hypothesizing After the Results are Known), post hoc analysis presented as a priori

How It Works

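A single test run at the conventional 0.05 threshold has a 5% chance of flagging an effect that does not exist. Run enough tests and at least one false positive becomes close to certain. The analyst then reports only the hit and stays silent about the misses. The published result looks identical to a hypothesis-driven finding: clean data, clear statistical test, significant p-value. The reader has no way to know how many tests preceded the reported one.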
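The effect of running many tests can be made concrete with a small calculation. The sketch below assumes that every test is independent, that no real effect exists, and that each test uses a 0.05 threshold; none of these assumptions comes from the article, and the function name is illustrative.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across `num_tests`
    independent tests when every null hypothesis is true."""
    # Probability that all tests stay below the threshold is (1 - alpha)^m,
    # so the complement is the chance that at least one passes by luck.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(f"{m:>5} tests -> {familywise_error_rate(m):.4f}")
```

Under these assumptions the chance of a spurious hit is already about 40% at ten tests and above 99% at one hundred, which is why an undisclosed test count matters so much.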

A Classic Example

A researcher has access to a large health database with 500 variables. After testing all 124,750 possible pairwise correlations, they find that ice cream consumption is significantly correlated with drowning deaths. They publish this as a confirmed finding without mentioning that it was one of 124,750 tests or that both variables are driven by warm weather.
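The pair count in this example can be checked directly, along with a rough estimate of how many correlations would pass by chance. This is a minimal sketch: the 0.05 threshold is an assumption (the example does not state one), and the estimate treats every pair as having no real relationship.

```python
import math

num_variables = 500
alpha = 0.05  # assumed significance threshold, not stated in the example

# Number of distinct unordered pairs among 500 variables: C(500, 2).
num_pairs = math.comb(num_variables, 2)

# If no pair were truly related, this many would still be expected
# to cross the threshold by chance.
expected_false_positives = alpha * num_pairs

print(num_pairs)                  # 124750
print(expected_false_positives)   # roughly 6237.5
```

With thousands of chance hits expected, finding one striking correlation says little unless the full search is disclosed.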

More Examples

A marketing team records 300 customer attributes and tests a vast number of attribute combinations against purchase behavior. They announce a breakthrough finding: customers who prefer blue packaging and own a pet are 40% more likely to buy. The result is almost certainly due to chance, with no theoretical basis and no replication attempt.
A political scientist downloads decades of county-level data with hundreds of economic and social indicators, then runs thousands of regressions until finding that per-capita bowling alley count significantly predicts voter turnout. The finding is published as a novel discovery without acknowledging the exhaustive search that produced it.

Where You See This in the Wild

Data dredging is facilitated by big data and machine learning, where massive datasets make spurious correlations inevitable. The website 'Spurious Correlations' by Tyler Vigen illustrates the absurdity of uncritical data mining.
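A short simulation shows how spurious correlations appear even in pure noise. The sketch below is illustrative only: the dataset sizes, the seed, and the function names are assumptions, and the cutoff of 0.279 is the approximate two-sided 5% critical value of Pearson's r for 50 observations taken from standard tables.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(num_variables=40, num_rows=50, r_critical=0.279, seed=0):
    """Generate unrelated random columns, test every pair, and count how
    many correlations exceed the cutoff. Returns (hits, pairs_tested)."""
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0.0, 1.0) for _ in range(num_rows)]
        for _ in range(num_variables)
    ]
    hits = 0
    for i in range(num_variables):
        for j in range(i + 1, num_variables):
            if abs(pearson_r(columns[i], columns[j])) > r_critical:
                hits += 1
    return hits, math.comb(num_variables, 2)


if __name__ == "__main__":
    hits, pairs = count_spurious_hits()
    print(f"{hits} of {pairs} noise-only pairs look 'significant'")
```

Every column here is independent random noise, so each hit is a false positive by construction. Roughly 5% of the 780 pairs should cross the cutoff, though the exact count depends on the seed.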

How to Spot and Counter It

Distinguish between exploratory and confirmatory analyses. Require replication on independent data for any dredged finding. Apply multiple comparison corrections appropriate to the number of tests actually conducted.
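As one illustration of a multiple comparison correction, here is a minimal sketch of the Bonferroni and Holm-Bonferroni procedures. The article does not name a specific method, so these two are chosen as common examples; the function names and the sample p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m,
    where m is the number of tests actually conducted."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the smallest p-value with
    alpha / m, the next with alpha / (m - 1), and so on, stopping at
    the first p-value that fails its threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


if __name__ == "__main__":
    p_values = [0.001, 0.012, 0.03, 0.04, 0.2]  # hypothetical results
    print([p <= 0.05 for p in p_values])  # uncorrected: four "hits"
    print(bonferroni_reject(p_values))    # one survives
    print(holm_reject(p_values))          # two survive
```

Both corrections only work if `p_values` contains every test that was run, not just the ones that looked promising. Feeding in a hand-picked subset reproduces the original problem.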

The Takeaway

Data dredging is one of those reasoning errors that sounds perfectly logical at first glance. That is what makes it dangerous: it wears the costume of valid reasoning while smuggling in a broken conclusion. The best defense? Slow down and ask: does this conclusion actually follow from these premises, or am I just connecting dots that happen to be near each other?

Next time someone presents you with an argument that "just makes sense," check the structure. The feeling of logic is not the same as logic itself.
