Spatial Autocorrelation

Also Known As: Spatial dependence, Spatial clustering

Statistical Error ID: spatial_autocorrelation

Definition

Spatial autocorrelation occurs when the values of a variable at nearby locations are more similar (positive autocorrelation) or more dissimilar (negative autocorrelation) than would be expected if the values were arranged at random. When it persists in the residuals of a standard regression such as ordinary least squares, it violates the assumption of independent observations. With positive autocorrelation, the usual result is underestimated standard errors, inflated test statistics, and false confidence in the findings. It reflects Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things."

Examples

A study analyzes property values across a city using standard regression and finds a highly significant effect of nearby park access. However, property values are spatially autocorrelated — expensive neighborhoods cluster together regardless of parks. The standard errors are too small, and the park effect is overstated.

A public health study uses standard regression to examine the relationship between fast-food restaurant density and obesity rates across census tracts, finding a strong positive effect. However, obesity rates are spatially clustered — high-obesity neighborhoods tend to be surrounded by other high-obesity neighborhoods — violating the independence assumption and inflating the statistical significance of the result.

An agricultural study models crop yield as a function of fertilizer application across farm plots, reporting highly significant results. Neighboring plots share the same soil type, microclimate, and pest pressure, so their yields are correlated by geography rather than treatment alone, making the standard errors unrealistically small and the findings appear more robust than they are.
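The agricultural example can be made concrete with a small simulation. The sketch below is a hypothetical setup, not data from any study: plots lie along a single transect, both the fertilizer rate and the yield noise are spatially smoothed (each plot averages independent noise shared with its neighbors), and fertilizer has no true effect. It compares the textbook OLS standard error of the slope with the actual spread of the slope estimate across many simulated fields. The helper names, the window width, and the sample sizes are illustrative assumptions.

```python
import math
import random

def smooth_noise(n, window, rng):
    """Spatially autocorrelated series: each plot averages `window` i.i.d.
    normal draws that it shares with its neighbors along the transect."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    return [sum(raw[i:i + window]) / window for i in range(n)]

def ols_slope_and_se(x, y):
    """Simple regression y = a + b*x; returns b and its textbook standard
    error, which assumes independent errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(sse / (n - 2) / sxx)

def simulate(n_plots=200, window=10, n_sims=300, seed=1):
    """Return (actual SD of the slope across simulations, average OLS SE)."""
    rng = random.Random(seed)
    slopes, naive_ses = [], []
    for _ in range(n_sims):
        fertilizer = smooth_noise(n_plots, window, rng)
        # The true fertilizer effect is zero: yield is pure spatially smooth noise.
        crop_yield = smooth_noise(n_plots, window, rng)
        b, se = ols_slope_and_se(fertilizer, crop_yield)
        slopes.append(b)
        naive_ses.append(se)
    mean_b = sum(slopes) / n_sims
    actual_sd = math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / (n_sims - 1))
    return actual_sd, sum(naive_ses) / n_sims

actual_sd, naive_se = simulate()
print(f"actual SD of slope: {actual_sd:.3f}, average OLS SE: {naive_se:.3f}")
```

Setting `window=1` removes the spatial smoothing; the two numbers should then roughly agree, while with smoothing the reported standard error falls well short of the true sampling variability.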

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

1. Are the observations located in geographic space and potentially influenced by proximity?
2. Do nearby observations tend to have more similar values than distant observations?
3. Does the analysis assume that observations are independent of one another?
4. Has the study tested for spatial autocorrelation using Moran's I or a similar diagnostic?

Description

Why It Works

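Standard statistical methods assume that each observation contributes independent information. When nearby observations are positively correlated, part of each observation's information is already contained in its neighbors, so the effective sample size is smaller than the nominal sample size. Standard methods do not account for this and produce artificially precise estimates: standard errors that are too small, confidence intervals that are too narrow, and p-values that are too low.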
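One way to quantify the loss: for estimating a mean from observations with common variance sigma^2 and correlation matrix R, Var(mean) = sigma^2 * sum_ij R_ij / n^2, whereas independent data give sigma^2 / n, so the effective sample size is n_eff = n^2 / sum_ij R_ij. The sketch below assumes, purely for illustration, that correlation decays exponentially with distance, exp(-d / corr_range), on a 10 x 10 grid of sites; the exponential form, the grid, and the `corr_range` values are hypothetical choices, not estimates from real data.

```python
import math

def effective_sample_size(coords, corr_range):
    """n_eff = n^2 / sum_ij R_ij for the sample mean, where the correlation
    between sites i and j is assumed to be exp(-distance / corr_range)."""
    n = len(coords)
    total = 0.0
    for x1, y1 in coords:
        for x2, y2 in coords:
            distance = math.hypot(x1 - x2, y1 - y2)
            total += math.exp(-distance / corr_range)
    return n * n / total

# 100 sites on a regular 10 x 10 grid with unit spacing.
grid = [(i, j) for i in range(10) for j in range(10)]
for corr_range in (0.01, 1.0, 3.0):
    n_eff = effective_sample_size(grid, corr_range)
    print(f"range={corr_range}: 100 observations behave like ~{n_eff:.0f} independent ones")
```

With a negligible range the result is essentially 100; as the assumed correlation range grows, the same 100 observations carry the information of far fewer independent ones.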

How to Counter

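Test for spatial autocorrelation with Moran's I or Geary's C, both on the variables being analyzed and on the residuals of any fitted model. If dependence is present, use spatial regression models (spatial lag or spatial error models) that explicitly account for it. Alternatives include adding spatial fixed effects or using geographically weighted regression, which also lets the estimated relationships vary across locations.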
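A minimal, dependency-free sketch of both diagnostics, assuming observations on a regular grid and binary rook-contiguity weights (cells sharing an edge are neighbors). Moran's I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, with expected value -1/(n - 1) under no autocorrelation. Geary's C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2W * sum_i (x_i - mean)^2), where values below 1 indicate positive autocorrelation. The weighting scheme, helper names, and one-sided permutation test are illustrative choices; established spatial-statistics libraries (for example PySAL) provide tested implementations.

```python
import random

def rook_neighbors(nrows, ncols):
    """Map each cell index to the indices of cells sharing an edge with it."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbors[r * ncols + c] = [
                rr * ncols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrows and 0 <= cc < ncols
            ]
    return neighbors

def morans_i(values, neighbors):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())  # W for binary weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / total_weight) * cross / sum(d * d for d in dev)

def gearys_c(values, neighbors):
    n = len(values)
    mean = sum(values) / n
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    sq_diff = sum((values[i] - values[j]) ** 2 for i, nbrs in neighbors.items() for j in nbrs)
    return (n - 1) * sq_diff / (2 * total_weight * sum((v - mean) ** 2 for v in values))

def moran_permutation_test(values, neighbors, n_perm=999, seed=0):
    """One-sided test for positive autocorrelation: how often does a random
    reshuffling of values across locations give a Moran's I at least as large?"""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)

w = rook_neighbors(8, 8)
gradient = [float(r + c) for r in range(8) for c in range(8)]            # smooth spatial trend
checkerboard = [float((r + c) % 2) for r in range(8) for c in range(8)]  # alternating pattern
print(moran_permutation_test(gradient, w))
print(morans_i(checkerboard, w), gearys_c(gradient, w), gearys_c(checkerboard, w))
```

The smooth gradient yields a large positive Moran's I with a small permutation p-value and a Geary's C near 0, while the checkerboard yields Moran's I of -1 and a Geary's C above 1, the signature of negative autocorrelation.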

Real-World Context

Relevant in environmental science (pollution levels cluster), epidemiology (disease outbreaks cluster), real estate analysis (property values cluster), and political science (voting patterns cluster geographically).

Related Aspects

→ correlates with

Modifiable Areal Unit Problem (MAUP)

Statistical results change depending on how geographic boundaries are drawn or aggregated.

→ correlates with

Ecological Inference Fallacy

The error of drawing conclusions about individuals from aggregate (group-level) data. Correlations observed at the group level may not hold at the individual level due to within-group variation, confounding, and aggregation effects. This is the statistical formalization of the ecological fallacy.

→ correlates with

Misleading Aggregation (Averaging Artifact)

Presenting aggregate statistics (means, totals) that mask important variation or subgroup differences within the data. The aggregate can tell a completely different story than the disaggregated data.

→ correlates with

Confounding Variable Neglect

Failing to account for a third variable that influences both the independent and dependent variables, creating a spurious apparent relationship. The 'lurking variable' problem that undermines causal claims from observational data.

Hierarchical Context

→ is a type of: Statistical Errors