Multiple Comparisons Problem

Also Known As: Look-Elsewhere Effect, Multiple Testing Problem, Multiplicity
Discourse Mechanics ID: multiple_comparisons_problem

Definition

The statistical error of performing many hypothesis tests without adjusting for the increased probability of false positives. With a per-test significance level of 0.05 and 20 independent tests in which every null hypothesis is true, the chance of at least one false positive is 1 − 0.95^20 ≈ 64%. Failing to correct for this inflates the apparent number of 'significant' findings.
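The 64% figure follows from the complement rule. If each of m independent tests has a false-positive probability of α when its null is true, the chance that at least one of them fires is 1 − (1 − α)^m. The Python sketch below evaluates that formula. For comparison, it also evaluates the Bonferroni-adjusted per-test threshold α/m, which is one common correction and not something this entry prescribes. The function names are illustrative, and the formula assumes the tests are independent.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among num_tests independent
    tests, assuming every null hypothesis is true and each uses threshold alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(f"{m:>3} tests: P(at least one false positive) = "
              f"{family_wise_error_rate(0.05, m):.3f}")
    # Testing each of 20 hypotheses at 0.05 / 20 keeps the overall rate just under 0.05.
    print(f"{family_wise_error_rate(bonferroni_alpha(0.05, 20), 20):.4f}")
```

The rate climbs quickly. Already at 20 tests, a "significant" result somewhere is more likely than not.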

Examples

A brain imaging study tests 100,000 voxels for activation. Even if no voxel is truly active, about 5,000 (100,000 × 0.05) would be expected to reach p < 0.05 by chance alone, potentially producing spurious 'brain activation' maps.
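A quick simulation makes the voxel numbers concrete. This sketch makes three simplifying assumptions:

- Every null hypothesis is true.
- Each test statistic is continuous, so every p-value is uniformly distributed on [0, 1].
- The tests are independent. Real voxels are spatially correlated, so this last assumption does not hold in practice.

Correlation does not change the expected count of m × α, but it does change how much the count varies from study to study. `count_false_positives` is a hypothetical helper. The Bonferroni line is included only for contrast.

```python
import random


def count_false_positives(num_tests: int, alpha: float, seed: int = 0) -> int:
    """Count p-values below alpha when every null hypothesis is true.

    Assumes continuous test statistics (null p-values ~ Uniform(0, 1))
    and independent tests, both simplifications for illustration.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)


if __name__ == "__main__":
    m = 100_000
    print("uncorrected:", count_false_positives(m, 0.05))      # close to 5,000
    print("Bonferroni: ", count_false_positives(m, 0.05 / m))  # usually 0
```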

A nutrition researcher surveys 500 participants on 80 different dietary habits and tests each one for correlation with heart disease risk. Even if no habit affects risk, about four associations (80 × 0.05) would be expected to reach p < 0.05 purely by chance. The researcher publishes the 'finding' that eating soup three times a week reduces risk, without correcting for multiple comparisons.
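The "about four" is an expected value, and the actual number of chance hits varies. The sketch below uses a simple binomial model to compute the full distribution of false-positive counts. Its assumptions are for illustration only: 80 independent tests, every null true, and α = 0.05. Under that model, the probability of at least one spurious association is about 98%. The function name is hypothetical.

```python
from math import comb


def false_positive_count_pmf(num_tests: int, alpha: float) -> list[float]:
    """P(exactly k false positives) for k = 0..num_tests, assuming all nulls
    are true and the tests are independent (a binomial model)."""
    return [comb(num_tests, k) * alpha**k * (1 - alpha) ** (num_tests - k)
            for k in range(num_tests + 1)]


if __name__ == "__main__":
    pmf = false_positive_count_pmf(80, 0.05)
    mean = sum(k * p for k, p in enumerate(pmf))
    print(f"expected false positives: {mean:.2f}")  # 4.00
    print(f"P(at least one): {1 - pmf[0]:.3f}")
    print(f"P(four or more): {sum(pmf[4:]):.3f}")
```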

A social media company's data science team runs A/B tests on 200 minor interface variations in a single month, each evaluated at p < 0.05. Even if none of the changes actually influences user behavior, about 10 of those tests (200 × 0.05) would be expected to show a 'significant' effect, leading the team to confidently roll out ineffective features.
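The A/B scenario can be mimicked with A/A tests, where both arms share the same true conversion rate. In that setup, every "significant" result is a false positive. This sketch assumes a pooled two-proportion z-test, a 10% base rate, and 2,000 users per arm. These parameters and the function names are hypothetical choices, not details from the example above.

```python
import math
import random


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both arms all 0s or all 1s: no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_aa_tests(num_tests: int = 200, users_per_arm: int = 2_000,
                      base_rate: float = 0.10, alpha: float = 0.05,
                      seed: int = 1) -> int:
    """Run num_tests A/A tests (both arms share base_rate); count 'significant' ones."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(num_tests):
        conv_a = sum(rng.random() < base_rate for _ in range(users_per_arm))
        conv_b = sum(rng.random() < base_rate for _ in range(users_per_arm))
        if two_proportion_p_value(conv_a, users_per_arm, conv_b, users_per_arm) < alpha:
            significant += 1
    return significant


if __name__ == "__main__":
    print("spurious 'significant' variations:", simulate_aa_tests())
```

With these settings the count typically lands in the neighborhood of 10, even though no variation differs from its control. The z-test is only approximately calibrated for discrete conversion counts, so the long-run rate may drift slightly from exactly 5%.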

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

  1. Are multiple statistical tests being performed on the same dataset?
     Type: binary
  2. Is the significance threshold (alpha) applied per-test rather than adjusted for the total number of tests?
     Type: binary
  3. Does the number of tests substantially increase the probability of at least one false positive?
     Type: binary
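Question 2 turns on whether α has been adjusted. The entry does not prescribe a method, so the sketch below shows two standard adjustments as illustrations:

- The Holm–Bonferroni step-down procedure, which controls the family-wise error rate.
- The Benjamini–Hochberg step-up procedure, which controls the false discovery rate instead.

The function names and the eight p-values in the usage block are hypothetical.

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm-Bonferroni step-down procedure; controls the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):  # rank is 0-based
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first hypothesis that fails its threshold
    return reject


def benjamini_hochberg_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    # Reject the k smallest p-values.
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values from eight tests on one dataset.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print("uncorrected:", sum(x < 0.05 for x in p))         # 5
    print("Holm:       ", sum(holm_reject(p)))                # 1
    print("BH:         ", sum(benjamini_hochberg_reject(p)))  # 2
```

On these inputs, an uncorrected 0.05 cutoff flags five results, Benjamini–Hochberg keeps two, and Holm keeps one. Family-wise control suits cases where any single false positive is costly. False discovery rate control suits cases where a modest proportion of false discoveries is acceptable in exchange for more power.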

Hierarchical Context