Argument from Correlation to Cause: When "Together" Means "Therefore"
In the 1950s, a striking correlation emerged in the data: countries where more cigarettes were smoked had higher rates of lung cancer. Tobacco companies responded — initially with some support from statisticians — by pointing out that correlation does not prove causation. They were technically correct. Perhaps something else caused both smoking and cancer; perhaps the causal arrow ran the other way. For a decade, this argument provided cover. Eventually, the biological mechanism was established, the confounders were ruled out, and the causal link became one of the most thoroughly demonstrated in medical history. The tobacco industry had exploited a genuine principle of statistical reasoning to defend a dangerous falsehood. This is the double edge of the correlation-causation distinction: it is both a legitimate scientific caution and a tool for motivated scepticism.
What Correlation Actually Tells You
A correlation is a statistical relationship between two variables: when one increases, the other tends to increase (positive correlation) or decrease (negative correlation). Correlations can be strong or weak, consistent or variable, and they can hold across a wide range of data or only in restricted conditions.
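The strength and sign of a linear relationship are usually summarised by the Pearson correlation coefficient, which runs from -1 to +1. The following is a minimal sketch of that calculation using only the Python standard library; the function name pearson_r and the sample values are illustrative assumptions, not data from any study mentioned here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of deviations (proportional to the covariance).
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one series rises with x, the other falls.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_pos = pearson_r(x, rising)   # perfect positive linear relationship
r_neg = pearson_r(x, falling)  # perfect negative linear relationship
print(r_pos, r_neg)
```

A coefficient near zero means there is no linear relationship; it does not rule out a non-linear one.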
Correlation is valuable. It is the primary tool for detecting potential causal relationships in observational data where experiments are impossible or unethical. The correlation between smoking and lung cancer, between dietary fat and cardiovascular disease, between poverty and educational outcomes — all were first established as correlations before causal mechanisms were identified. Dismissing correlational evidence wholesale would leave us unable to learn anything from epidemiology, economics, or social science.
But correlation does not, by itself, establish causation. At least three distinct explanations remain open:
- Direct causation: A causes B. This is what the argument from correlation to cause typically assumes.
- Reverse causation: B causes A. The causal arrow runs opposite to the assumed direction. Countries with high healthcare spending have high rates of chronic disease — not because healthcare causes disease, but because sick populations spend more on healthcare.
- Common cause (confounding): A third variable C causes both A and B, producing a correlation without any direct causal connection between them. Ice cream and drowning are both caused by hot weather. Shoe size and reading ability correlate in children because both increase with age.
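The common-cause case can be illustrated with a small simulation. This is a sketch under invented assumptions: the temperature range, the coefficients, and the noise levels are made up for illustration, and the variable names are hypothetical. Ice cream sales and drownings each depend only on temperature, yet they correlate strongly with each other; once the linear effect of temperature is removed from both, the correlation largely disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data-generating process: temperature drives both outcomes,
# and neither outcome has any effect on the other.
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Raw correlation between the two outcomes.
r_raw = pearson_r(ice_cream, drownings)

# Correlation after removing the linear effect of temperature from both.
r_adjusted = pearson_r(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)
print(round(r_raw, 3), round(r_adjusted, 3))
```

Adjusting works here only because the confounder is known and measured. With an unmeasured confounder, the raw correlation is all the analyst sees.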
The Classic Examples
Ice cream and drowning. In summer months, both ice cream sales and drowning deaths peak. The correlation is real and robust, but it is not causal: heat and outdoor recreation drive both variables. No one seriously believes ice cream causes drowning, which makes this a useful pedagogical example of how a striking correlation can carry no causal weight once the confounder is identified.
Nicolas Cage films and pool drownings. Tyler Vigen has documented dozens of correlations between completely unrelated time series: per capita cheese consumption correlates with deaths by bedsheet tangling; Nicolas Cage film releases correlate with swimming pool drownings. These result from the chance alignment of trends in short datasets, a phenomenon that becomes inevitable when you search enough pairs of variables without correcting for multiple comparisons (see: p-hacking).
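The multiple-comparisons effect is easy to reproduce. The sketch below generates 200 short random walks that are independent by construction, then searches every pair for the strongest correlation. The series count, the length of 10 points, and the seed are arbitrary illustrative choices, and nothing here uses Vigen's actual data.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def random_walk(rng, length):
    """A series whose steps are independent Gaussian noise."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


rng = random.Random(1)  # fixed seed so the run is repeatable

# 200 unrelated series of 10 points each, e.g. ten annual observations.
series = [random_walk(rng, 10) for _ in range(200)]

# Absolute correlation for every one of the 19,900 distinct pairs.
pair_rs = [abs(pearson_r(a, b)) for a, b in itertools.combinations(series, 2)]

strongest = max(pair_rs)
share_above_08 = sum(r > 0.8 for r in pair_rs) / len(pair_rs)
print(f"strongest |r| = {strongest:.3f}")
print(f"share of pairs with |r| > 0.8 = {share_above_08:.1%}")
```

Reporting only the strongest pair, without saying how many pairs were searched, is what turns this harmless exercise into a misleading chart.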
Economic growth and democracy. There is a long-running debate about whether economic development causes democratic institutions, or democratic institutions cause economic development, or whether both are driven by third factors like education levels, rule of law, or colonial history. The correlation is well-established; the causal story remains contested after decades of research.
Vaccines and autism. A small 1998 study led by Andrew Wakefield suggested a link between MMR vaccination and autism. The study was later shown to be fraudulent and was retracted, and Wakefield was struck off the UK medical register. But the claim persists in public discourse, fuelled by the real (and separately caused) increase in autism diagnoses over the same period that vaccination rates were rising: a classic spurious correlation driven by independent trends. The harm done by this argument from correlation to cause is still being counted in preventable deaths from measles.
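Two series that merely trend in the same direction will correlate strongly even when they share nothing else. One common check, sketched here with synthetic data, is to correlate period-to-period changes instead of levels. The slopes, noise levels, and series names below are invented for illustration and do not represent any real dataset.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


def first_differences(values):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


rng = random.Random(2)  # fixed seed so the run is repeatable
periods = range(300)

# Two hypothetical series with independent noise around separate upward trends.
series_a = [1.0 * t + rng.gauss(0, 5) for t in periods]
series_b = [0.5 * t + rng.gauss(0, 5) for t in periods]

# Levels share a trend, so they correlate strongly.
r_levels = pearson_r(series_a, series_b)

# Changes strip out the shared trend; what remains is independent noise.
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))
print(round(r_levels, 3), round(r_changes, 3))
```

Differencing is a blunt tool that can also hide slow-acting genuine effects, so a near-zero correlation in the changes is a warning sign about the levels, not proof that no causal link exists.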
The Argument Form and Its Critical Questions
In argumentation theory, the argument from correlation to cause has the following schematic form:
- Correlation premise: Variables A and B are positively (or negatively) correlated.
- Causal inference: The correlation is evidence of a causal relationship between A and B.
- Direction premise: The causal direction is from A to B (or B to A).
- Conclusion: A causes B.
The critical questions that expose the argument's vulnerabilities:
- Is the correlation robust? Is it statistically significant? Does it replicate across independent samples? Or could it be a chance finding?
- Have confounders been controlled for? What other variables could be driving both A and B? Have researchers tested for them?
- Is reverse causation possible? Could B cause A rather than the other way around?
- Is there a plausible mechanism? Can a causal pathway be identified that would explain how A produces B?
- Has the relationship been tested experimentally? Randomised controlled experiments remain the gold standard for establishing causation — can one be done?
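The last question matters because random assignment cuts the link between the treatment and any hidden confounder. The simulation below is a sketch under invented assumptions: a hypothetical treatment with zero true effect, a hidden "health" variable that raises the outcome, and uptake probabilities of 0.8 and 0.2 chosen purely for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def treated_minus_control(rng, randomised, n=20000):
    """Difference in mean outcome between treated and untreated people."""
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden confounder, never seen by the analyst
        if randomised:
            # A coin flip decides treatment, independent of health.
            takes_treatment = rng.random() < 0.5
        else:
            # Healthier people are more likely to choose the treatment.
            takes_treatment = rng.random() < (0.8 if health > 0 else 0.2)
        # The outcome depends on health only: the treatment has zero true effect.
        outcome = health + rng.gauss(0, 1)
        (treated if takes_treatment else control).append(outcome)
    return mean(treated) - mean(control)


rng = random.Random(7)  # fixed seed so the run is repeatable
observational_gap = treated_minus_control(rng, randomised=False)
randomised_gap = treated_minus_control(rng, randomised=True)
print(round(observational_gap, 3), round(randomised_gap, 3))
```

In the observational arm the useless treatment appears beneficial because healthier people selected it. In the randomised arm the gap shrinks towards zero, which is the true effect in this simulation.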
Causal Criteria: When Correlation Justifies Causal Inference
The epidemiologist Bradford Hill proposed nine criteria in 1965 for evaluating whether an observed correlation supports a causal conclusion. These are not a checklist. Satisfying more criteria increases confidence, but no single criterion is sufficient, and only temporality is strictly necessary:
- Strength: Strong correlations are more likely to reflect causation than weak ones.
- Consistency: The correlation replicates across different populations, settings, and methods.
- Specificity: A causes B and not a dozen other things simultaneously (though this criterion is now less emphasised).
- Temporality: A must precede B in time. (This is the only criterion Hill considered necessary.)
- Biological gradient: A dose-response relationship — more of A produces more of B.
- Plausibility: The causal claim is consistent with known biological or social mechanisms.
- Coherence: The claim fits with what is known from other evidence.
- Experiment: Experimental evidence supports the causal story.
- Analogy: Similar causal relationships are known to exist.
The link between smoking and lung cancer meets virtually every criterion. Most contested correlations in nutrition science, economics, and social policy meet far fewer.
The Statistical Frontier: DAGs and Causal Inference
Modern causal inference has moved beyond simply noting that correlation does not equal causation. The computer scientist Judea Pearl developed a framework based on directed acyclic graphs (DAGs) to make causal assumptions explicit and to identify which statistical adjustments would, and would not, allow causal conclusions from observational data. Pearl's framework shows that whether adjusting for a variable helps or harms causal inference depends on its structural role in the causal graph: controlling for a confounder helps; controlling for a mediator or a "collider" can introduce bias rather than remove it.
This represents a significant advance: rather than dismissing all observational evidence as non-causal, it provides systematic tools for deciding what can be inferred, under what assumptions, from what data. The practical implication: the right response to "correlation ≠ causation" is not to abandon the correlation but to ask what additional evidence or analysis would warrant the causal step.
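The claim that conditioning on a collider introduces bias can be checked with a short simulation. In this sketch, x and y are independent by construction and both feed into a third variable, the collider. Restricting the sample to high collider values manufactures a negative correlation between x and y. The variable names, the noise level, and the threshold of 1.0 are illustrative assumptions, not part of Pearl's framework.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


rng = random.Random(3)  # fixed seed so the run is repeatable

# Causal graph assumed for this sketch: x -> collider <- y, no edge x-y.
rows = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = rng.gauss(0, 1)
    collider = x + y + rng.gauss(0, 0.5)
    rows.append((x, y, collider))

# Whole sample: x and y are unrelated.
r_all = pearson_r([r[0] for r in rows], [r[1] for r in rows])

# "Controlling" for the collider by keeping only high-collider rows.
selected = [r for r in rows if r[2] > 1.0]
r_selected = pearson_r([r[0] for r in selected], [r[1] for r in selected])
print(round(r_all, 3), round(r_selected, 3))
```

The same mechanism underlies selection bias: if inclusion in a dataset depends on two factors, those factors can appear related inside the dataset even when they are unrelated in the population.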
Motivated Use and Motivated Scepticism
The correlation-causation distinction can be deployed in both directions as a motivated argument. The tobacco industry used "it's only a correlation" to suppress inconvenient evidence. But advocates can equally cite correlations as proof of causation when the causal story fits their preferred narrative. Social scientists have documented this pattern in the "hot hand fallacy" debate, in gun control debates, and in nutrition science debates: the same statistical evidence is interpreted causally by one side and as merely correlational by the other, and the interpretation tends to track prior beliefs (see: confirmation bias).
Related fallacies worth studying alongside this one: false cause (the broader category of causal errors), apophenia (finding patterns in random data), base rate fallacy, and p-hacking (generating spurious correlations through selective analysis).
Sources & Further Reading
- Pearl, Judea, and Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
- Hill, Austin Bradford. "The Environment and Disease: Association or Causation?" Proceedings of the Royal Society of Medicine, 58(5), 1965.
- Vigen, Tyler. Spurious Correlations. Hachette Books, 2015. And: tylervigen.com
- Hernán, Miguel A., and James M. Robins. Causal Inference: What If. Chapman & Hall/CRC, 2020. (Open access)
- Walton, Douglas. Fundamentals of Critical Argumentation. Cambridge University Press, 2006.
- Wikipedia: Correlation does not imply causation
- Wikipedia: Bradford Hill criteria