Overfitting

Also Known As: overtraining, curve fitting, memorization
Statistical Error ID: overfitting

Definition

Overfitting occurs when a statistical model or analysis captures noise and random fluctuations in the training data rather than the underlying pattern. An overfitted model performs excellently on the data it was built on but fails to generalize to new, unseen data. This happens when the model is too complex relative to the amount of data available, allowing it to memorize specific data points rather than learning general relationships.
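The contrast between memorizing data points and learning a general relationship can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, drawn from a made-up rule y = 2x plus noise, and the two models (a least-squares line and a nearest-neighbour lookup that simply replays the closest training point) are stand-ins, not methods described on this page.

```python
import random

def make_data(n, rng, noise=1.0):
    """Draw n points from y = 2x + Gaussian noise (a made-up ground truth)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b: learns the general trend."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: returns the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(200, rng)   # unseen data from the same rule

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "memorizer": (mse(memo, train_x, train_y), mse(memo, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.2f}   test MSE = {test_err:.2f}")
```

The memorizer reproduces its training data exactly, so its training error is zero, yet it carries the noise of each memorized point into its predictions. On unseen data it should therefore do worse than the simpler line.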

Examples

An analyst builds a stock market prediction model using 50 variables and 100 days of data. The model 'predicts' past prices almost perfectly, achieving 99% accuracy on historical data. When applied to the next month's data, it performs worse than simply guessing that the market will stay flat.

A marketing analyst uses 30 demographic and behavioral variables to build a model predicting which customers will churn, trained on last month's data. The model scores 97% accuracy when tested on data from that same month but performs no better than random chance on next month's customers, because it learned quirks specific to that one month.

A medical researcher develops a cancer diagnosis algorithm trained on 50 patients, incorporating 200 biomarker features. It perfectly classifies every patient in the training set. When validated on a new hospital's patients, its accuracy drops to near baseline, because it memorized noise rather than true disease patterns.
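The medical example has more features (200) than patients (50), and that alone makes a perfect training score possible even when there is nothing to learn. The sketch below is a hypothetical illustration, not the researcher's algorithm. It assumes random Gaussian "biomarkers", coin-flip labels (so no true disease pattern exists), and a plain perceptron as a stand-in classifier.

```python
import random

def random_dataset(n_samples, n_features, rng):
    """Gaussian 'biomarkers' with coin-flip labels: no real signal exists."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.choice((-1, 1)) for _ in range(n_samples)]
    return X, y

def predict(w, x):
    """Linear classifier: sign of the weighted sum (ties go to +1)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1

def train_perceptron(X, y, max_epochs=1000):
    """Perceptron updates, stopping once a full pass makes no mistakes."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if predict(w, x) != label:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def accuracy(w, X, y):
    """Fraction of samples classified correctly."""
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = random_dataset(50, 200, rng)   # 50 "patients", 200 features
X_new, y_new = random_dataset(200, 200, rng)      # hypothetical second cohort

w = train_perceptron(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)
new_acc = accuracy(w, X_new, y_new)
print(f"training accuracy:   {train_acc:.2f}")
print(f"new-cohort accuracy: {new_acc:.2f}")
```

Because 50 random points in 200 dimensions can almost always be separated by a linear boundary, the perceptron classifies every training sample correctly, while accuracy on the new cohort should hover around chance. A perfect training score in this regime says little about whether a real pattern was found.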

Verification Steps

Binary (yes/no) questions an LLM must answer to identify this aspect:

  1. Does the model perform much better on training data than on new/test data?
     Type: binary
  2. Is the model excessively complex relative to the amount of data?
     Type: binary
  3. Has the model been validated on independent or out-of-sample data?
     Type: binary
  4. Are random patterns in the data being treated as meaningful signals?
     Type: binary
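One way to record the answers to the four questions above is a small checklist structure, sketched here. The class name, field names, and the idea of listing "warning signs" are all hypothetical and do not describe the platform's actual implementation. Note that question 3 runs in the opposite direction to the others: a "no" (never validated) is the worrying answer.

```python
from dataclasses import dataclass

@dataclass
class OverfittingChecklist:
    """Yes/no answers to the four verification questions, in order.

    Hypothetical structure for illustration only.
    """
    train_much_better_than_test: bool   # question 1
    too_complex_for_data: bool          # question 2
    validated_out_of_sample: bool       # question 3
    noise_treated_as_signal: bool       # question 4

    def warning_signs(self):
        """Return the answers that point toward overfitting."""
        signs = []
        if self.train_much_better_than_test:
            signs.append("large train/test gap")
        if self.too_complex_for_data:
            signs.append("too complex for the data")
        if not self.validated_out_of_sample:   # reversed polarity
            signs.append("no out-of-sample validation")
        if self.noise_treated_as_signal:
            signs.append("noise treated as signal")
        return signs

# The cancer-diagnosis example: perfect on training data, 200 features for
# 50 patients, validated on a new hospital, found to have memorized noise.
medical = OverfittingChecklist(True, True, True, True)
print(medical.warning_signs())

# A hypothetical model that was never checked on independent data.
unvalidated = OverfittingChecklist(False, True, False, False)
print(unvalidated.warning_signs())
```

How many warning signs should count as a detection is left open here; any threshold would be a further assumption.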
Deep Dive

Hierarchical Context