Mar 29, 2026 · 9 min read

Goodhart's Law: When the Measure Becomes the Enemy of the Goal

Soviet central planners wanted to increase nail production. They assigned factories a target: produce more nails. The factories complied — by manufacturing millions of tiny, useless nails that no builder would touch, because the target was defined in units and small nails are easy to make quickly. Planners adjusted the target: produce more weight of nails. Factories complied again — by making a small number of enormous, equally useless nails, because a single giant spike weighs as much as a thousand small ones. The measure changed. The gaming changed. The mission — producing useful nails for construction — remained perpetually unachieved.

The Principle

Goodhart's Law originated in the work of British economist Charles Goodhart, initially formulated in the context of monetary policy in a 1975 paper. His observation was that when a central bank targets a statistical measure — say, the money supply — the behaviour of economic actors changes in response to the target, eroding the statistical relationship between that measure and the underlying economic reality it was meant to track. The measure becomes unreliable precisely because it is the object of optimisation.

The version most widely cited today was articulated by anthropologist Marilyn Strathern in 1997, generalising Goodhart's insight: "When a measure becomes a target, it ceases to be a good measure." This formulation captures the essential dynamic: the act of selecting a proxy measure as an optimisation target changes the behaviour being measured, and that changed behaviour typically diverges from the underlying goal the measure was originally tracking.

The mechanism has two components. First, a measure is always a proxy for some underlying value — it captures the value imperfectly, measuring a correlated indicator rather than the thing itself. Second, when rational actors are rewarded for improving the proxy, they discover and exploit the gap between the proxy and the underlying value. The proxy improves; the underlying value may not.
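This two-part mechanism can be put in a toy sketch. Nothing here models any real system; the effort budget and the 3:1 gaming advantage are invented, illustrative assumptions:

```python
# Toy model of the proxy-reality gap. An actor splits a fixed effort
# budget between genuine improvement and gaming. Both raise the proxy,
# but only genuine effort raises the underlying value; gaming is
# assumed (illustratively) to be three times cheaper per unit of proxy.

def outcome(effort_real, effort_gaming):
    true_value = effort_real                  # only real work moves the goal
    proxy = effort_real + 3 * effort_gaming   # the measure rewards both
    return true_value, proxy

BUDGET = 10

honest_value, honest_proxy = outcome(BUDGET, 0)  # all effort on the goal
gamed_value, gamed_proxy = outcome(0, BUDGET)    # all effort on the metric

print(f"honest: value={honest_value}, proxy={honest_proxy}")
print(f"gamed:  value={gamed_value}, proxy={gamed_proxy}")
```

A reward tied to `proxy` makes the second strategy strictly dominant even though it delivers none of the underlying value; that dominance, not any dishonesty of the actors, is the structural core of the law.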

The Soviet Nail Factory and Central Planning

The nail factory example is almost certainly apocryphal — no specific documented case has been traced, and it circulates as folk wisdom rather than verifiable history. But it encapsulates a pattern that was extensively documented in Soviet central planning. Enterprises throughout the Soviet system gamed output targets in predictable ways:

  • Glass factories given weight-based targets produced thick, heavy glass of low quality.
  • Agricultural collectives assigned harvest-weight targets delivered wet grain (heavier but lower in nutritional value) or early-harvested crops (easier to deliver but less developed).
  • Construction enterprises given floor-area targets produced buildings with low ceilings, saving material while maximising the measured output.
  • Medical facilities assigned patient-discharge-rate targets discharged patients prematurely to improve the metric.

These examples reveal the structural problem: central planners could measure output quantity but not output quality or usefulness. Actors who were rewarded for the measurable quantity rationally and predictably sacrificed the unmeasured quality. The Soviet system did not suffer from Goodhart's Law because Soviet citizens were uniquely dishonest; it suffered because it applied target-based incentives to complex outputs that could not be fully specified by any single metric — a universal problem that appears in every system that measures performance.

Education: The Testing Trap

In education, Goodhart's Law manifests as "teaching to the test": the substitution of test-score improvement for genuine learning as the target of educational activity. When school performance is measured, funded, and regulated on the basis of standardised test scores, the predictable consequence is that teaching time, curriculum design, and school resource allocation are optimised for test-score performance rather than the broader educational goals that test scores were meant to proxy.

The No Child Left Behind Act (2001) and the Every Student Succeeds Act (2015) in the United States, and similar accountability frameworks in the UK (Ofsted school inspections, league tables) and elsewhere, generated extensive documentation of this dynamic. Studies found that schools under high-stakes testing accountability:

  • Narrowed curriculum toward tested subjects (mathematics and reading), reducing time for arts, physical education, history, and science.
  • Focused instruction on "bubble students" — those just below the proficiency threshold who were most likely to generate marginal improvements in the school's pass rate, at the expense of both the lowest-performing and highest-performing students.
  • Engaged in various forms of score manipulation: excluding certain student categories from testing, rescheduling test days to exclude likely low-scorers, and in documented cases, outright score alteration.

In the Atlanta Public Schools cheating scandal, a 2011 state investigation found that 178 teachers and principals had systematically altered students' answers on the 2009 state tests. That was the extreme of institutional test-gaming. But the subtler forms of gaming (curriculum narrowing, bubble-student focus) are arguably more consequential, because they operate at scale across all high-accountability systems without any individual actor committing fraud.

Test scores may have genuinely risen in some cases. Whether the educational substance those scores were meant to represent improved proportionally is a separate, harder-to-measure question — and, by Goodhart's Law, the harder-to-measure substance is exactly what tends to be sacrificed.

Finance: Wells Fargo and the Cross-Sell Target

Between 2002 and 2016, Wells Fargo employees opened approximately 3.5 million potentially unauthorised bank and credit card accounts for customers who had not requested them. The accounts were created, and often closed within days, to meet aggressive cross-selling targets. Branch employees who met targets received bonuses; those who did not faced termination. The target, the number of new accounts opened per customer relationship, was a proxy for customer engagement and relationship depth. It became, instead, the direct object of manipulation.

The Wells Fargo scandal resulted in regulatory fines totalling $185 million in 2016, the resignation of CEO John Stumpf, and congressional hearings. It is among the most extensively documented corporate Goodhart's Law failures in recent history. The accounts metric was a reasonable proxy for genuine customer value, except when subjected to high-pressure optimisation under threat of employment consequences — at which point it decoupled entirely from the underlying objective.

This is characteristic of Goodhart's Law in corporate settings: the problem is not that the initial metric was poorly chosen, but that the pressure applied to it exceeds the threshold at which rational actors will exploit the proxy-reality gap rather than improve on the underlying value. Light incentives may produce genuine improvement; intense pressure on a proxy tends to produce gaming.

Healthcare Metrics: Mortality Rates and Wait Times

Healthcare quality measurement provides rich examples of Goodhart's Law at institutional scale. When hospital mortality rates are published and used in performance rankings, hospitals face incentives to improve the metric that are not always aligned with improving patient outcomes:

  • Hospitals may discourage admission of high-risk patients who are likely to die in hospital, improving their published mortality rate without improving care for those patients.
  • Patients may be transferred to hospice or palliative care settings before death, removing those deaths from the hospital's count.
  • Coding practices may be adjusted to classify more deaths as occurring after discharge or to assign comorbidities that move cases to higher risk-adjusted categories where death is statistically more expected.

Emergency department wait time targets in the UK National Health Service (four-hour wait times) have produced analogous gaming: patients are admitted to wards to formally exit the ED wait clock, even when the clinical rationale for admission is marginal; "corridor patients" awaiting actual beds technically clock out of the ED target. The metric improves; the patient's experience of unnecessary movement and waiting does not.

Technology and Content: Algorithmic Goodhart

Recommendation algorithms optimised for engagement metrics (clicks, watch time, likes, shares) are subject to Goodhart's Law at massive scale. Engagement is a proxy for user value — the assumption being that users engage more with content they find valuable. When that engagement metric is directly optimised under strong commercial pressure, content that maximally exploits psychological triggers (outrage, anxiety, novelty, social comparison) achieves high engagement scores regardless of whether it delivers value, accuracy, or genuine satisfaction to the user.

This is a form of Goodhart's Law that operates across billions of daily interactions. The metric (engagement) becomes the target; the target diverges from the underlying goal (user welfare, accurate information, genuine entertainment). The gap between the proxy and the reality is exploited — not by individual fraudsters, but by the optimisation process itself, which systematically surfaces content that hacks the metric.
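A minimal illustration of that optimisation process, using an invented four-item catalogue: the recommender can observe engagement but not value, so ranking by the only available signal surfaces exactly the metric-hacking content. The items, scores, and two-slot feed are all hypothetical:

```python
# Hypothetical catalogue: each item has an engagement score the
# algorithm can observe, and a user value it cannot.
items = [
    {"title": "calm explainer",   "engagement": 0.3, "value": 0.9},
    {"title": "outrage bait",     "engagement": 0.9, "value": 0.1},
    {"title": "anxiety headline", "engagement": 0.8, "value": 0.2},
    {"title": "useful tutorial",  "engagement": 0.4, "value": 0.8},
]

# The recommender optimises the only signal it has.
feed = sorted(items, key=lambda item: item["engagement"], reverse=True)

print([item["title"] for item in feed[:2]])  # the metric-hacking content fills the feed
```

No actor in this sketch is cheating; the divergence comes entirely from ranking on a proxy that is only loosely coupled to the value it stands in for.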

The same dynamic applies to citation metrics in academia, follower counts in social media influence, and star ratings in e-commerce. Each was originally a proxy for quality; each has become a direct target that rational actors optimise, often at the expense of the quality it was meant to measure.

The Cobra Effect: When Solutions Become Problems

A related historical parable is the "cobra effect" — attributed to British colonial India, where an attempt to reduce cobra populations by paying bounties for dead cobras led to enterprising locals breeding cobras to collect the bounty. When the government discovered the scheme and cancelled the programme, breeders released their now-worthless stock into the wild, increasing the cobra population beyond its original level. The metric (dead cobras submitted) diverged from the goal (fewer cobras in the wild) when the metric became the object of incentivised production.

The cobra effect is Goodhart's Law applied to policy design: a proxy for the desired outcome becomes a productive input for actors who profit from generating the proxy without achieving the underlying outcome.

Living with Goodhart's Law

Goodhart's Law is not an argument against measurement. Measurement is essential to management, policy, and learning. The law is, rather, a structural constraint on how metrics should be used and what should be expected of them:

  • Use multiple metrics. A single metric is easy to game; a balanced scorecard of varied indicators is harder to manipulate simultaneously, and manipulation of one metric becomes visible as other metrics fail to improve in concert.
  • Rotate metrics. Regularly changing the measured indicators reduces the value of optimising for any single proxy over time.
  • Separate measurement from reward. Metrics used for understanding and learning are less susceptible to Goodhart's Law than metrics tied to high-stakes rewards or punishments. Light incentives preserve more of the proxy's validity.
  • Measure process, not just outcome. Supplementing outcome metrics with process and qualitative assessment reduces the exploitable gap between the proxy and the underlying goal.
  • Preserve direct observation. No metric replaces direct engagement with the underlying value. Schools should regularly assess teaching quality through observation, not just scores. Managers should talk to customers, not only read NPS dashboards.
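The first of these points, the balanced scorecard, can be sketched in code. The metric names, numbers, and thresholds below are invented for illustration; the idea is simply that a sharp jump in one indicator while the others stand still is itself a signal worth auditing:

```python
# Sketch of gaming detection under a multi-metric scorecard.
# Metric names and thresholds are illustrative, not a real framework.

def review_needed(before, after, jump=0.30, stall=0.05):
    """Flag a metric that jumps sharply while every other metric barely moves."""
    deltas = {name: after[name] - before[name] for name in before}
    for metric, delta in deltas.items():
        others = [d for name, d in deltas.items() if name != metric]
        if delta >= jump and all(abs(d) <= stall for d in others):
            return metric
    return None

before = {"test_scores": 0.60, "attendance": 0.85, "teacher_retention": 0.70}
after  = {"test_scores": 0.95, "attendance": 0.86, "teacher_retention": 0.69}

print(review_needed(before, after))  # the isolated jump gets flagged
```

A genuinely improving school would tend to move several indicators in concert; an isolated spike in the one high-stakes metric is the signature of proxy optimisation.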

Goodhart's Law is also a caution about the McNamara Fallacy — the tendency to treat what is measurable as the totality of what matters, and to discard the unmeasured. The measurable is always a sample of the important. Treating it as the whole is how measures become targets and targets become traps.

Sources & Further Reading

  • Goodhart, C.A.E. (1975). Problems of monetary management: The UK experience. In Papers in Monetary Economics. Reserve Bank of Australia.
  • Strathern, M. (1997). 'Improving ratings': Audit in the British University System. European Review, 5(3), 305–321.
  • Muller, J.Z. (2018). The Tyranny of Metrics. Princeton University Press.
  • Consumer Financial Protection Bureau. (2016). CFPB Fines Wells Fargo $100 Million for Widespread Illegal Practice of Secretly Opening Unauthorized Accounts. cfpb.gov.
  • Koretz, D. (2008). Measuring Up: What Educational Testing Really Tells Us. Harvard University Press.
  • Davies, W. (2017). How statistics lost their power – and why we should fear what comes next. The Guardian, January 19.
