Mar 29, 2026 · 8 min read

Misleading Aggregation: The Average That Hides Everything

A statistician, the old joke goes, drowned in a river with an average depth of three feet. The average was accurate. Most of the river was ankle-deep. One section was fifteen feet deep, and she stepped into it. The average concealed the one thing that mattered. This is the essence of misleading aggregation: the transformation of heterogeneous, structured, varied data into a single summary number that is technically correct and practically useless — or worse, actively deceptive. It is one of the most pervasive problems in data communication, and it operates invisibly, because the average is calculated correctly. The error is not arithmetic; it is conceptual.

What Aggregation Does — and What It Destroys

Aggregation is not inherently wrong. Summarising data is essential: without it, we cannot compare, communicate, or make decisions at scale. The problem arises when aggregation compresses variation that is not merely noise but meaningful signal. When the data contains distinct subgroups with different properties, or when the relationship between variables differs across those subgroups, collapsing everything into a single average does not describe any real entity in the dataset. The average is a ghost — a number that belongs to no actual case.

The standard example is income distribution. If nine people each earn $30,000 per year and one person earns $3,000,000, the average income is $327,000. The median is $30,000. Both are "correct." The average describes a situation that no one in the group actually inhabits; the median describes what a typical person actually experiences. Using the average to characterise this group is not a lie, but it is profoundly misleading — particularly if you are designing social policy or trying to understand whether people are economically secure.
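The divergence is easy to verify with nothing but Python's standard library, using the figures from the example above:

```python
from statistics import mean, median

# Nine incomes of $30,000 plus one of $3,000,000, as in the example above.
incomes = [30_000] * 9 + [3_000_000]

print(f"mean:   ${mean(incomes):,.0f}")    # mean:   $327,000 -- describes no one
print(f"median: ${median(incomes):,.0f}")  # median: $30,000  -- the typical case
```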

Simpson's Paradox: When the Average Reverses Direction

The most counterintuitive form of misleading aggregation is Simpson's Paradox: a statistical phenomenon in which a trend that appears in several different groups reverses direction when those groups are combined. The aggregate relationship is not just uninformative — it actively points the wrong way.

The classic illustration involves a study of kidney stone treatments. Treatment A had a success rate of 78% (273/350 patients). Treatment B had a success rate of 83% (289/350 patients). On this basis, Treatment B appears superior. But when the data is stratified by stone size — a confounding variable — the picture reverses completely:

  • For small stones: Treatment A succeeds 93% of the time vs. Treatment B's 87%.
  • For large stones: Treatment A succeeds 73% of the time vs. Treatment B's 69%.

Treatment A is superior for both subtypes — yet appears inferior in the aggregate. The paradox arises because the two treatments were applied to different compositions of patients: Treatment B was more commonly used on smaller stones, which are easier to treat overall. The aggregate success rate reflects the case mix, not the treatment quality.
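The reversal can be reproduced directly from the study's stratified counts of (successes, patients), which sum to the aggregate figures quoted above:

```python
# Stratified success counts from Charig et al. (1986): (successes, patients).
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

# Within each stratum, Treatment A has the higher success rate...
for stratum, arms in data.items():
    rates = {t: s / n for t, (s, n) in arms.items()}
    print(stratum, {t: f"{r:.1%}" for t, r in rates.items()})

# ...but pooling the strata reverses the comparison (78% vs 83%, as above).
for t in ("A", "B"):
    s = sum(data[g][t][0] for g in data)
    n = sum(data[g][t][1] for g in data)
    print(f"overall {t}: {s}/{n} = {s / n:.1%}")
```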

Simpson's Paradox has appeared in university admissions data (Berkeley's apparent gender bias in admissions, which disappeared when individual departments were examined), in epidemiological studies of vaccination efficacy during the Delta wave of COVID-19, and in comparisons of batting averages in baseball. In each case, the aggregate statistic was technically correct and deeply misleading, because it conflated variation in outcome with variation in group composition.

The Will Rogers Phenomenon

A related aggregation artefact, sometimes called the Will Rogers Phenomenon or stage migration bias, was first identified in oncology. When cancer staging criteria change — reclassifying some patients who were previously "Stage I" as "Stage II" — a remarkable thing can happen: survival rates improve in both groups. Stage I patients who are reclassified were the sickest Stage I patients (by the new criteria); removing them from Stage I improves the average survival of Stage I. The newly reclassified patients join Stage II as its healthiest members; their addition improves Stage II average survival. Both groups look better — and no patient was actually treated differently. The improvement is entirely an artefact of reclassification changing the composition of each group.
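A toy simulation makes the artefact concrete; the survival figures below are invented purely for illustration:

```python
from statistics import mean

# Hypothetical 5-year survival probabilities, for illustration only.
stage_1 = [0.90, 0.85, 0.80, 0.50]  # the 0.50 patient is Stage I only under old criteria
stage_2 = [0.40, 0.35, 0.30]
before = mean(stage_1), mean(stage_2)

# New criteria reclassify the sickest Stage I patient into Stage II.
stage_1_new, stage_2_new = stage_1[:3], [stage_1[3]] + stage_2
after = mean(stage_1_new), mean(stage_2_new)

# Both averages rise, though no individual patient's outcome changed.
assert after[0] > before[0] and after[1] > before[1]
```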

The comedian Will Rogers is quoted (apocryphally) as saying that when the Okies left Oklahoma for California, they raised the average intelligence of both states. The observation makes statistical sense: moving items that sit below one group's average but above the other's raises both averages while changing nothing real.

Ecological Fallacy: The Wrong Level of Analysis

A closely related error is the ecological fallacy: drawing conclusions about individuals from aggregate (group-level) data. If countries with higher chocolate consumption per capita have more Nobel laureates per capita, it would be an ecological fallacy to conclude that eating chocolate makes you smarter. The correlation at the country level may exist (it has been observed, satirically, by Franz Messerli in the New England Journal of Medicine), but it tells us nothing about whether any individual's chocolate consumption affects their likelihood of winning a Nobel Prize. The relationship at the group level may be driven by confounders (wealthy countries consume more chocolate and also fund more research), by aggregation of distinct effects, or by simple coincidence.

Ecological correlations are often much stronger than individual-level correlations, precisely because group averaging suppresses within-group variation and leaves only between-group differences — the correlations that are most influenced by confounding. Researchers who treat ecological correlations as individual-level evidence routinely overestimate effect sizes and misidentify causes.
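The gap between group-level and individual-level correlation is easy to simulate. In the sketch below (all names and numbers are invented), the two individual traits are driven by a shared country-level factor but are independent within each country:

```python
import random
from statistics import mean

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
country_means, within_rs = [], []
for wealth in (1, 2, 3, 4, 5):  # hypothetical country-level confounder
    choc = [wealth + random.gauss(0, 1) for _ in range(300)]
    achv = [wealth + random.gauss(0, 1) for _ in range(300)]  # independent of choc
    country_means.append((mean(choc), mean(achv)))
    within_rs.append(pearson(choc, achv))

eco = pearson([c for c, _ in country_means], [a for _, a in country_means])
# The ecological correlation is near 1; within-country correlations hover near 0.
print(round(eco, 2), [round(r, 2) for r in within_rs])
```

The country-level correlation is almost perfect even though, for any individual, the two traits are unrelated — exactly the pattern behind the chocolate-and-Nobels result.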

Performance Tables and League Rankings

Hospital performance league tables are a canonical applied example of misleading aggregation. If Hospital A treats predominantly elderly, complex patients and Hospital B treats predominantly young, healthy patients, comparing their raw mortality rates says nothing useful about the quality of care at either institution — and potentially inverts the truth. A high raw mortality rate at a specialist cancer centre may reflect an excellent institution that accepts the most difficult cases, not a poor one that kills its patients. Unadjusted league tables measure case mix more than quality.
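The standard remedy is risk adjustment; its simplest form is direct standardisation, where each hospital's stratum-specific rates are re-weighted onto a common reference population. A minimal sketch with invented figures:

```python
# Invented figures: {age band: (deaths, patients)}.
# Hospital A treats mostly elderly patients; Hospital B mostly young ones.
hospital_a = {"under 65": (2, 100), "65 and over": (45, 900)}
hospital_b = {"under 65": (30, 900), "65 and over": (8, 100)}

def raw_rate(h):
    return sum(d for d, _ in h.values()) / sum(n for _, n in h.values())

def standardised_rate(h, reference):
    # Weight each band's own death rate by a shared reference population.
    total = sum(reference.values())
    return sum((d / n) * reference[band] / total for band, (d, n) in h.items())

reference = {"under 65": 1000, "65 and over": 1000}

print(f"raw:          A={raw_rate(hospital_a):.1%}  B={raw_rate(hospital_b):.1%}")
print(f"standardised: A={standardised_rate(hospital_a, reference):.1%}  "
      f"B={standardised_rate(hospital_b, reference):.1%}")
# Raw rates favour B; standardised rates show A is better in every age band.
```

Here the raw comparison inverts the truth: A's band-level mortality is lower in both age groups, but its elderly-heavy case mix makes its raw rate look worse.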

School performance tables suffer identically. Schools serving high-income areas with well-supported students will show high raw scores; schools serving disadvantaged areas may show lower scores despite being more effective — because effectiveness is about value added relative to starting point, not absolute outcome. Publishing raw outcome data as a measure of institutional quality is a category error that punishes mission-driven institutions and rewards advantageous demographics.

Why Averages Seduce Us

The persistence of misleading aggregation in public discourse is not accidental. Averages are compelling because they are simple, authoritative-sounding, and comparable. They reduce multidimensional complexity to a single number that can be ranked, communicated in a sentence, and remembered. The cognitive machinery that produces the availability heuristic and the overconfidence effect is also attracted to simple summary statistics: they feel like knowledge. The uncertainty, variation, and subgroup structure that they erase are invisible.

The McNamara Fallacy describes the related failure of optimising for what can be measured rather than what matters — and average scores are quintessentially measurable. Body counts in Vietnam, GDP growth, average test scores, average waiting times: each is an aggregate that captures something real and obscures something important. They are the decision-maker's preferred currency because they are defensible, auditable, and easily communicated. Their misleading properties are features, not bugs, from the perspective of anyone who wants to communicate selectively.

Seeing Through the Average

The practical antidote to misleading aggregation is disaggregation: always ask what is hidden inside the average.

  • Examine the distribution: Mean and median diverge when the distribution is skewed. When they diverge sharply, the average is describing the tail, not the typical. Always ask for both, and ideally for a histogram or percentile breakdown.
  • Identify relevant subgroups: Ask whether the population being aggregated is actually homogeneous. If it contains groups with structurally different properties — patients with different diagnoses, students at different starting levels, countries at different stages of development — aggregate statistics may be meaningless or misleading.
  • Control for confounders: When comparing aggregate statistics across groups, ask whether the groups have similar compositions. Apparent differences in group-level averages are often differences in composition rather than differences in the underlying process.
  • Look for Simpson's Paradox: When an aggregate result seems surprising, check the stratified data. A reversal at the subgroup level is a strong signal that a confounding variable is driving the aggregate relationship.
  • Ask what the average is for: An average is a tool for a specific purpose — to estimate the expected value for a randomly drawn member of the population. If your decision concerns a specific subpopulation, or if you care about the distribution of outcomes rather than the expected value, the average may be the wrong summary statistic entirely.
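The first two checks are mechanical enough to automate. A small helper — a sketch only, assuming nothing beyond the standard library and an arbitrary 25% divergence threshold:

```python
from statistics import mean, median, quantiles

def audit(values, groups=None):
    """Flag a skew-distorted mean and, optionally, report per-group means."""
    m, md = mean(values), median(values)
    qs = quantiles(values, n=10)  # decile cut points
    report = {
        "mean": m, "median": md, "p10": qs[0], "p90": qs[-1],
        # Arbitrary rule of thumb: flag when mean and median diverge by >25%.
        "skew_flag": md != 0 and abs(m - md) > 0.25 * abs(md),
    }
    if groups:
        report["group_means"] = {g: mean(vs) for g, vs in groups.items()}
    return report

incomes = [30_000] * 9 + [3_000_000]
print(audit(incomes)["skew_flag"])  # True: the mean is describing the tail
```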

The river is three feet deep on average. The statistician drowned anyway. The lesson is not that averages are wrong — it is that the wrong average, applied to the wrong question, can kill you just as surely as no data at all. Possibly more surely, because it feels like knowledge.

Sources & Further Reading

  • Pearl, J., & Mackenzie, D. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018. (Chapter on Simpson's Paradox.)
  • Charig, C. R., et al. "Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy." BMJ 292 (1986): 879–882. (The kidney stone Simpson's Paradox study.)
  • Feinstein, A. R., et al. "The Will Rogers Phenomenon: Stage Migration and New Diagnostic Techniques as a Source of Misleading Statistics for Survival in Cancer." New England Journal of Medicine 312, no. 25 (1985): 1604–1608.
  • Messerli, F. H. "Chocolate Consumption, Cognitive Function, and Nobel Laureates." New England Journal of Medicine 367 (2012): 1562–1564. (Satirical ecological fallacy paper.)
  • Huff, D. How to Lie with Statistics. Norton, 1954. (Still the most readable introduction to statistical deception.)
  • Wikipedia: Simpson's paradox · Ecological fallacy
