misleading_aggregation
Misleading aggregation occurs when data is combined or averaged in ways that obscure important patterns, subgroup differences, or distributional characteristics. By reporting only a mean or total, the analyst can hide bimodal distributions, extreme outliers, or opposing trends within subgroups. The choice of aggregation method (mean vs. median vs. mode) can also be exploited to paint different pictures from the same underlying data.
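The bimodal case can be illustrated with a minimal Python sketch. The ratings below are invented for illustration and do not come from any real survey; the window of 2 points around the mean is likewise an arbitrary assumption.

```python
from statistics import mean

# Hypothetical ratings on a 1-10 scale: one cluster of very low scores
# and one cluster of very high scores (a bimodal distribution).
ratings = [1, 2, 2, 1, 2, 9, 10, 9, 9, 10]

avg = mean(ratings)

# Collect the ratings that fall within 2 points of the mean.
near_avg = [r for r in ratings if abs(r - avg) <= 2]

print(f"mean rating: {avg}")
print(f"ratings within 2 points of the mean: {near_avg}")
```

In this made-up data the mean is 5.5, yet no respondent gave a rating anywhere near it. Reporting the mean alone would suggest a lukewarm consensus that does not exist.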
A company reports that 'average employee compensation increased by 15% this year.' In reality, the CEO received a $10 million raise while the 500 other employees received a 1% raise. The mean was pulled up by the extreme outlier, misrepresenting the typical employee's experience.
A city government announces that 'average income in the downtown district rose 20% over five years.' The rise reflects wealthy newcomers moving in and pricing out lower-income longtime residents, whose incomes barely changed. The aggregate masks displacement rather than broad prosperity.
A university reports that its graduates earn an average starting salary of $95,000. The figure is pulled up by a small cohort of finance and engineering graduates. The median salary for graduates of the largest programs — education, social work, and the humanities — is closer to $38,000.
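The first example above can be worked through in a short Python sketch. The headcount of 500, the 1% raise, and the $10 million raise come from the example; the base salaries ($40,000 per employee and $48 million for the CEO) are assumptions chosen only so that the mean rises by 15%.

```python
from statistics import mean, median

# Assumed base pay, picked so the headline figure matches the example.
STAFF_COUNT = 500
staff_before = [40_000] * STAFF_COUNT
ceo_before = 48_000_000

before = staff_before + [ceo_before]
# Every employee gets a 1% raise; the CEO gets a $10 million raise.
after = [s + s // 100 for s in staff_before] + [ceo_before + 10_000_000]


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


mean_change = pct_change(mean(before), mean(after))
median_change = pct_change(median(before), median(after))

print(f"mean change:   {mean_change:.1f}%")
print(f"median change: {median_change:.1f}%")
```

Under these assumed salaries the mean rises by about 15% while the median rises by 1%. The median tracks what the typical employee experienced; the mean mostly tracks the CEO.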
Binary (yes/no) questions an LLM must answer to identify this aspect:
Are aggregate statistics hiding meaningful variation within subgroups?
Would breaking the data down by relevant categories change the conclusion?
Is the distribution skewed in a way that makes the average misleading?
Are outliers or subgroup effects driving the aggregate result?
Aggregated numbers are simpler and more digestible than distributional data. Audiences assume that averages represent typical cases, and rarely question whether the underlying distribution is skewed or bimodal.
Request the median alongside the mean, and ask about the distribution shape. Demand subgroup breakdowns and look for outliers that might be driving the aggregate statistic.
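These checks can be sketched in Python as follows. The records, the helper names `summarize` and `breakdown`, and the 25% gap threshold are all hypothetical and serve only as an illustration; a real analysis would choose its own grouping and threshold.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (program, starting salary) records, invented for illustration.
records = [
    ("finance", 150_000), ("finance", 170_000),
    ("education", 36_000), ("education", 38_000), ("education", 40_000),
    ("humanities", 37_000), ("humanities", 38_000), ("humanities", 39_000),
    ("social work", 35_000), ("social work", 41_000),
]

GAP_THRESHOLD = 0.25  # arbitrary cutoff for "mean and median disagree"


def summarize(values):
    """Report mean and median together, plus their relative gap."""
    m, med = mean(values), median(values)
    gap = (m - med) / med
    return {"n": len(values), "mean": m, "median": med,
            "gap": gap, "flag": abs(gap) > GAP_THRESHOLD}


def breakdown(rows):
    """Summarize each subgroup separately."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: summarize(vals) for label, vals in groups.items()}


overall = summarize([value for _, value in records])
by_group = breakdown(records)

print("overall:", overall)
for label, stats in by_group.items():
    print(label, stats)
```

In this made-up data the overall mean (62,400) sits well above the overall median (38,500), so the aggregate is flagged. Within each program the mean and median agree, which points to a difference between subgroups rather than noise within them.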
Misleading aggregation appears in income and wealth reporting, school district performance averages, and corporate revenue figures that combine growing and shrinking business units.
Statistical results change depending on how geographic boundaries are drawn or aggregated.
Nearby observations are correlated, violating the independence assumption in standard analyses.
Incorrectly assuming smooth or linear relationships between observed data points.