Mar 29, 2026 · 8 min read

The Base Rate Fallacy: Why a 99% Accurate Test Can Still Mostly Be Wrong

Imagine a disease that affects 1% of the population. A test for this disease is 99% accurate: it correctly identifies 99% of people who have the disease, and correctly clears 99% of people who don't. You take the test. It comes back positive. What's the probability you actually have the disease? Most people say 99%. The actual answer is 50%. A positive result on this near-perfect test means you're no more likely to have the disease than to not have it — because the rarity of the disease itself, the base rate, transforms the mathematics of what a positive result means.

The Prior Probability That Gets Left Out

The base rate fallacy occurs when new evidence is evaluated without adequately accounting for the prior probability of what's being tested. The "base rate" is simply the background frequency of something in the relevant population — how common a disease is, how often a person lies, how frequently fraud occurs, how often a particular profession commits a crime. It is the starting probability before any new information arrives.

When we update our beliefs based on evidence, the base rate is supposed to anchor that update. Bayesian probability theory formalises this: the posterior probability (what we believe after evidence) is a function of both the likelihood of the evidence given the hypothesis and the prior probability of the hypothesis. Ignore the prior, and you systematically over-weight striking but rare evidence while under-weighting the mundane reality of how common or uncommon things actually are.
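
In symbols, this is Bayes' theorem. Writing H for the hypothesis (you have the disease) and E for the evidence (a positive test):

    P(H | E) = P(E | H) × P(H) / [ P(E | H) × P(H) + P(E | ¬H) × P(¬H) ]

For the opening example the prior P(H) is 0.01, the likelihood P(E | H) is 0.99, and the false positive rate P(E | ¬H) is 0.01, so the posterior is (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.5.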

The Mathematics Made Concrete

Return to the 1% disease / 99% accurate test example, with 10,000 people:

  • 100 people actually have the disease (1% of 10,000)
  • 9,900 people do not have the disease
  • The test correctly identifies 99 of the 100 sick people (true positives)
  • The test incorrectly flags 1% of the 9,900 healthy people: that's 99 false positives
  • Total positive results: 99 true + 99 false = 198
  • Fraction of positives that are true: 99/198 = exactly 50%

This result feels paradoxical but is mathematically exact. A test that is 99% accurate produces results that are only 50% reliable when the base rate is 1%. The accuracy of the test tells you how well it distinguishes sick from healthy people within each group. But the rarity of the disease means the healthy group is 99 times larger, providing 99 times more opportunities for false positives even at a 1% error rate.

Change the base rate, and the positive predictive value changes dramatically. If the disease affects 10% of the population, the same 99% accurate test produces positive results that are 91.7% reliable. If the disease affects 50%, reliability rises to 99%. The test hasn't changed — only the prior probability has. The lesson: the meaning of a test result cannot be separated from the prior probability of what's being tested.
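
A few lines of code make the relationship explicit. The sketch below is a minimal illustration in Python (function and variable names are my own); it computes the positive predictive value straight from Bayes' theorem and reproduces the 50%, 91.7%, and 99% figures above:

    # Positive predictive value: P(condition | positive test), via Bayes' theorem.
    def positive_predictive_value(prevalence, sensitivity, specificity):
        true_positive_rate = prevalence * sensitivity
        false_positive_rate = (1 - prevalence) * (1 - specificity)
        return true_positive_rate / (true_positive_rate + false_positive_rate)

    # The same 99% accurate test (sensitivity = specificity = 0.99), three base rates.
    for prevalence in (0.01, 0.10, 0.50):
        ppv = positive_predictive_value(prevalence, 0.99, 0.99)
        print(f"prevalence {prevalence:.0%}: a positive result is correct {ppv:.1%} of the time")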

Medical Diagnosis and Screening

The base rate fallacy is not an abstract puzzle — it has direct consequences for how medical tests are interpreted and how screening programmes should be designed.

Mammography screening is the most debated example. Breast cancer has a prevalence of roughly 0.8% among women in their 40s with no specific risk factors. Mammography has a sensitivity of around 80–90% (detecting most actual cancers) but a false positive rate of about 7–10% per screen. In a population where prevalence is low, the positive predictive value of a mammogram for a woman with no symptoms is substantially below 10%: most positive mammograms — perhaps 80–90% in low-risk populations — do not represent actual cancer. A landmark 1998 study in the New England Journal of Medicine (Elmore et al.) estimated that about half of women who undergo annual mammograms for ten years will receive at least one false positive result, many of them leading to follow-up imaging or a biopsy.
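
To put rough numbers on that, take mid-range figures from the ranges above: 0.8% prevalence, 85% sensitivity, and an 8% false positive rate, applied to 10,000 asymptomatic women in their 40s. About 80 women have cancer and roughly 68 of them are detected; of the 9,920 women without cancer, roughly 794 receive a false positive. That is about 862 positive mammograms, of which only around 68, or 8%, reflect actual cancer.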

This is not an argument against mammography — early detection saves lives. It is an argument for honest communication about what a positive result means, and for ensuring that screening is targeted toward populations where the base rate is high enough to make positive results meaningfully informative. A positive test in a 60-year-old woman with a family history of breast cancer has a very different meaning than the same positive test in a healthy 35-year-old — because the base rates differ dramatically.

COVID-19 testing during the pandemic provided millions of people with a practical encounter with base rate reasoning. During periods of very low community prevalence, positive rapid tests — especially in asymptomatic individuals — had substantially lower positive predictive values than during peaks, when prevalence was high. The same test, the same result, a very different interpretation depending on what proportion of the tested population actually had the virus at that moment.

Security, Surveillance, and Profiling

The base rate fallacy is endemic to security and law enforcement, with serious civil liberties implications. Mass surveillance programmes, pre-crime algorithms, and behavioural screening at airports all face the same mathematical problem: if the target behaviour (terrorism, drug trafficking, fraud) is rare in the population being screened, even a highly accurate detection system will produce enormous numbers of false positives relative to true positives.

Consider a hypothetical airport screening system that correctly identifies 99% of terrorists and produces false alarms for only 0.1% of innocent travellers. If one in a million passengers is actually a terrorist, the system will flag roughly a thousand innocent people for every genuine terrorist it catches: on a day processing 100,000 passengers it raises about 100 false alarms, while the expected number of actual terrorists in that crowd is just 0.1. More than 99.9% of flags are wrong. The system's accuracy is real and impressive; its utility in the specific context — where the base rate of terrorism is extraordinarily low — is deeply limited.
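
The same Bayes arithmetic as in the earlier sketch applies here. A minimal Python illustration using the hypothetical figures above (not a model of any real screening system):

    # P(terrorist | flagged) for the hypothetical airport screening system.
    prior = 1e-6          # one passenger in a million is a terrorist
    sensitivity = 0.99    # 99% of terrorists are correctly flagged
    false_alarm = 0.001   # 0.1% of innocent travellers are wrongly flagged
    posterior = (prior * sensitivity) / (prior * sensitivity + (1 - prior) * false_alarm)
    print(f"P(terrorist | flag) = {posterior:.4%}")   # roughly 0.099%, about 1 in 1,000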

This mathematics is at the heart of debates about stop-and-frisk policing, predictive policing algorithms, and automated fraud detection systems. Highly accurate detection systems applied to rare targets generate far more false positives than true positives, imposing real costs on large numbers of innocent people.

The Courtroom and the Prosecutor's Fallacy

A specific and legally consequential form of the base rate fallacy is known as the prosecutor's fallacy: confusing the probability of a match given innocence with the probability of innocence given a match. A DNA match that occurs by chance only 1 in a million times sounds enormously powerful as evidence. But if a database of a million suspects is searched, one false match is expected — and the probability that a specific match in a one-million-person search represents a guilty party is nowhere near the apparent "1 in a million" figure.
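
A rough worked version of that point, with illustrative numbers only: searching a database of 1,000,000 innocent profiles at a random-match probability of 1 in a million yields 1,000,000 × 1/1,000,000 = 1 expected coincidental match. If the true perpetrator also happens to be in the database, a match produced by that search is roughly as likely to be a coincidence as to be genuine, so the probability of guilt given the match, absent other evidence, is closer to 50% than to the 99.9999% the "1 in a million" framing suggests.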

This error has contributed to wrongful convictions. In the 1999 trial of British solicitor Sally Clark, expert witness Sir Roy Meadow testified that the probability of two children in the same family dying of sudden infant death syndrome was 1 in 73 million — a figure that the prosecution and jury treated as the probability she was innocent. The figure wrongly treated the two deaths as independent, ignoring shared genetic and environmental risk factors, and it was never weighed against the prior probability of the alternative explanation, double murder by a mother, which is itself vanishingly rare. Clark was convicted and spent more than three years in prison before her conviction was overturned in 2003. Meadow was later struck off the medical register over his evidence, though he was reinstated on appeal.

Why Our Minds Fail Here

The cognitive mechanism behind the base rate fallacy has been extensively studied by Amos Tversky and Daniel Kahneman, who identified it as a systematic product of the representativeness heuristic: judging probability by how "representative" something seems of a category, rather than by actual frequency. A positive test result represents sickness; a suspicious behaviour represents guilt; a confident person represents knowledge. These representativeness judgements are useful heuristics in many contexts, but they systematically crowd out base rate information.

In their classic "lawyer-engineer" experiments, Kahneman and Tversky told participants that a group of 100 people contained either 30 or 70 engineers (the rest lawyers), then gave them personality descriptions. Participants' probability estimates for whether a described person was an engineer were virtually unaffected by whether the group was 30% or 70% engineers — the representativeness of the description completely dominated the base rate. The base rate effectively disappeared from the judgement.
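
The size of the error is easy to quantify with illustrative numbers. Suppose a particular description is four times as likely to fit an engineer as a lawyer. With a 30% engineer base rate, the Bayesian answer is (4 × 0.3) / (4 × 0.3 + 1 × 0.7) ≈ 63%; with a 70% base rate it is (4 × 0.7) / (4 × 0.7 + 1 × 0.3) ≈ 90%. Giving roughly the same estimate in both conditions means discarding information that should shift the answer by nearly thirty percentage points.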

This connects directly to the availability heuristic: when a test result or a news story makes a particular outcome vivid and memorable, it crowds out the abstract statistical reality of how rare that outcome actually is. Plane crashes feel more probable than car accidents because they are more vivid, not because they are more frequent. A positive medical test feels like a near-certain diagnosis because the outcome is vivid, not because the mathematics support it.

Correcting the Fallacy

The antidote to the base rate fallacy is Bayesian thinking: systematically updating beliefs by combining prior probability with the likelihood of new evidence. In practice, this means:

  • Before interpreting any test result, find out the base rate of the condition being tested in your specific population.
  • Ask about the test's positive predictive value in that population — not just its general "accuracy."
  • Use frequency formats rather than probability formats when communicating: "99 out of 10,000" is processed more accurately than "0.0099."
  • Be especially sceptical of alarming findings in domains where the target event is rare.
  • Remember that confirmatory tests work differently when base rates are low — a second independent positive test dramatically raises reliability because false positives are (usually) independent; a short worked example follows this list.
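
To make that last point concrete: in the opening example, a first positive result leaves you at 50%. Treat that 50% as the new prior and apply a second, independent test of the same quality, and a second positive updates the probability to (0.5 × 0.99) / (0.5 × 0.99 + 0.5 × 0.01) = 99%. Two imperfect tests in sequence can be far more informative than one, provided their errors really are independent.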

The base rate fallacy is also corrected by being alert to its cousins: p-hacking produces false positives that compound the base rate problem by flooding the literature with misleading signals, and Simpson's Paradox shows how aggregating across groups can produce trends that reverse when the groups are separated. Understanding these phenomena together builds the kind of numeracy that resists statistical manipulation.

Sources

  • Kahneman, D., & Tversky, A. (1973). "On the psychology of prediction." Psychological Review, 80(4), 237–251.
  • Gigerenzer, G., & Hoffrage, U. (1995). "How to improve Bayesian reasoning without instruction: Frequency formats." Psychological Review, 102(4), 684–704.
  • Elmore, J. G., et al. (1998). "Ten-year risk of false positive screening mammograms and clinical breast examinations." New England Journal of Medicine, 338(16), 1089–1096.
  • Ioannidis, J. P. A. (2005). "Why most published research findings are false." PLOS Medicine, 2(8), e124.
  • Thompson, W. C., & Schumann, E. L. (1987). "Interpretation of statistical evidence in criminal trials." Law and Human Behavior, 11(3), 167–187.
