double_dipping
Double-dipping (circular analysis) occurs when the same data is used both to generate a hypothesis and to test it, inflating the apparent significance of the results. When features, regions, or variables are selected on the basis of the data and those same selections are then tested on the same data, the analysis becomes circular. This systematically inflates effect sizes and yields artificially small p-values, because the test is biased toward confirming patterns already identified in the sample, including patterns produced by noise alone.
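A back-of-the-envelope calculation shows how large the inflation can be. Assume, as a simplification not stated above, that there are k independent candidate features, that none has a real effect, and that the most extreme one is then tested one-sided at level α on the same data. That procedure is equivalent to asking whether the smallest of k null p-values falls below α:

```latex
P(\text{circular test is significant})
  = P\Big(\min_{1 \le j \le k} p_j < \alpha\Big)
  = 1 - (1 - \alpha)^{k},
\qquad
k = 20,\ \alpha = 0.05 \;\Rightarrow\; 1 - 0.95^{20} \approx 0.64
```

Under these assumptions the nominal 5% false-positive rate becomes roughly 64%, even though every candidate is pure noise. Testing the selected feature on independent data restores the nominal rate α, because the selection step cannot favour noise in data it never saw.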
A neuroscientist scans brain activity across thousands of voxels, identifies the 10 voxels most active during a task, and then reports that 'these brain regions show significant activation during the task' using the same data. The significance is guaranteed by the selection process, not by genuine neural effects.
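The voxel scenario can be reproduced with simulated noise. A minimal sketch, assuming 5,000 voxels, 20 subjects, unit-variance Gaussian noise with no true task effect anywhere, and two independent runs (A and B); these numbers are illustrative, not taken from a real study, and `simulate_voxel_selection` is a hypothetical helper. The top 10 voxels are selected on run A, and their mean activation is then estimated twice: once on run A (circular) and once on run B (independent).

```python
import random
import statistics


def simulate_voxel_selection(n_voxels=5000, n_subjects=20, top_k=10, seed=0):
    """Return (circular, independent) mean activation of the top_k voxels picked on run A."""
    rng = random.Random(seed)
    # Null brain (assumption): every voxel's true task effect is zero; values are pure noise.
    run_a = [[rng.gauss(0.0, 1.0) for _ in range(n_subjects)] for _ in range(n_voxels)]
    run_b = [[rng.gauss(0.0, 1.0) for _ in range(n_subjects)] for _ in range(n_voxels)]

    means_a = [statistics.fmean(v) for v in run_a]
    # Selection step: keep the top_k voxels with the highest mean activation in run A.
    selected = sorted(range(n_voxels), key=lambda i: means_a[i], reverse=True)[:top_k]

    # Circular estimate: measure the selected voxels on the same run used to pick them.
    circular = statistics.fmean([means_a[i] for i in selected])
    # Independent estimate: measure the same voxels on run B, which played no part in selection.
    independent = statistics.fmean([statistics.fmean(run_b[i]) for i in selected])
    return circular, independent


circular, independent = simulate_voxel_selection()
print(f"circular estimate:    {circular:.2f}")
print(f"independent estimate: {independent:.2f}")
```

Because every voxel is noise, the circular estimate sits several standard errors above zero purely as a consequence of selection, while the independent estimate hovers near zero.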
A market researcher surveys 1,000 customers, notices in the raw data that satisfaction scores are highest among users aged 50–65, and then runs a significance test on that same dataset to confirm that 'the 50–65 age group shows significantly higher satisfaction (p = 0.02).' The finding was generated and validated on identical data.
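The market-research workflow can be simulated the same way. A minimal sketch, assuming satisfaction is standard-normal noise unrelated to age, five hypothetical equal-sized age bands, and a one-sided z-test with known unit variance (the source does not say which test the researcher used). Each trial repeats the circular workflow: inspect the data, pick the band that looks most satisfied, then test that band against everyone else on the same data.

```python
import random
import statistics
from statistics import NormalDist

BANDS = ["18-29", "30-39", "40-49", "50-65", "66+"]  # hypothetical age bands


def one_sided_p(group, rest):
    """One-sided z-test that `group` has a higher mean than `rest`, assuming unit variance."""
    se = (1 / len(group) + 1 / len(rest)) ** 0.5
    z = (statistics.fmean(group) - statistics.fmean(rest)) / se
    return 1.0 - NormalDist().cdf(z)


def circular_false_positive_rate(n_customers=500, trials=1000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Null survey: satisfaction does not depend on age band at all.
        scores = {band: [] for band in BANDS}
        for i in range(n_customers):
            scores[BANDS[i % len(BANDS)]].append(rng.gauss(0.0, 1.0))
        # Step 1 (exploration): pick the band that looks most satisfied.
        best = max(BANDS, key=lambda b: statistics.fmean(scores[b]))
        # Step 2 (circular test): test that band against everyone else on the same data.
        rest = [s for b in BANDS if b != best for s in scores[b]]
        if one_sided_p(scores[best], rest) < alpha:
            hits += 1
    return hits / trials


rate = circular_false_positive_rate()
print(f"circular false-positive rate: {rate:.3f} (nominal 0.05)")
```

Since no band truly differs, any rate well above 0.05 is produced entirely by the selection step. Scanning more candidate subgroups (age crossed with region, plan, and so on) pushes the rate higher still.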
A social psychologist explores a large dataset of survey responses, notices a pattern suggesting that people who drink tea report lower anxiety, and then tests this hypothesis on the same dataset — reporting a statistically significant result without collecting new data to independently verify the pattern.
Binary (yes/no) questions an LLM must answer to identify this aspect:
Was the dataset first explored to find an interesting pattern?
Type: binary
Was the statistical significance of that same pattern tested on the same dataset?
Type: binary
Was a separate holdout or validation dataset used for the confirmatory test?
Type: binary
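The source does not say how the three answers combine into a verdict. One plausible reading, sketched here purely as an assumption, is that the aspect is present when the data were explored and then tested on the same data with no separate holdout. `AnalysisAnswers` and `looks_like_double_dipping` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisAnswers:
    """Yes/no answers to the three screening questions (hypothetical container)."""
    explored_for_pattern: bool    # Was the dataset first explored to find a pattern?
    tested_on_same_data: bool     # Was that pattern's significance tested on the same data?
    used_holdout_for_test: bool   # Was a separate holdout/validation set used for the test?


def looks_like_double_dipping(a: AnalysisAnswers) -> bool:
    """Assumed rule: flag when exploration and testing share data and no holdout was used."""
    return a.explored_for_pattern and a.tested_on_same_data and not a.used_holdout_for_test


# Market-research example: explored, tested on the same survey, no holdout.
print(looks_like_double_dipping(AnalysisAnswers(True, True, False)))  # True
# Same exploration, but confirmed on a fresh validation sample.
print(looks_like_double_dipping(AnalysisAnswers(True, False, True)))  # False
```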
The circularity is often invisible in the final report because the selection step and the testing step appear as separate analyses. Readers assume the hypothesis was formed independently of the test data.
Split the data into discovery and validation sets, or use cross-validation with the selection step repeated inside each training fold. Pre-register the analysis plan, and be suspicious of any analysis in which the same dataset was used for both feature selection and hypothesis testing.
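A minimal sketch of the split-sample remedy on null data: records are shuffled into discovery and validation halves, the most promising group is chosen on the discovery half only, and a single pre-chosen test is run on the validation half. The five groups, the 50/50 split, and the unit-variance z-test are illustrative assumptions; `discover_then_confirm` is a hypothetical helper, not a library function.

```python
import random
import statistics
from statistics import NormalDist


def one_sided_p(group, rest):
    """One-sided z-test that `group` has a higher mean than `rest`, assuming unit variance."""
    se = (1 / len(group) + 1 / len(rest)) ** 0.5
    z = (statistics.fmean(group) - statistics.fmean(rest)) / se
    return 1.0 - NormalDist().cdf(z)


def discover_then_confirm(records, n_groups, rng):
    """Pick the best-looking group on a discovery half, then test it once on the validation half."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    discovery, validation = shuffled[:half], shuffled[half:]

    # Exploration touches the discovery half only.
    disc = {g: [y for gg, y in discovery if gg == g] for g in range(n_groups)}
    best = max(disc, key=lambda g: statistics.fmean(disc[g]))

    # Confirmation uses the untouched validation half and one pre-chosen hypothesis.
    group = [y for g, y in validation if g == best]
    rest = [y for g, y in validation if g != best]
    return best, one_sided_p(group, rest)


def split_sample_false_positive_rate(n=500, n_groups=5, trials=1000, alpha=0.05, seed=2):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Null data: the outcome does not depend on group membership.
        records = [(i % n_groups, rng.gauss(0.0, 1.0)) for i in range(n)]
        _, p = discover_then_confirm(records, n_groups, rng)
        if p < alpha:
            hits += 1
    return hits / trials


rate = split_sample_false_positive_rate()
print(f"split-sample false-positive rate: {rate:.3f} (nominal 0.05)")
```

Because the validation half played no part in choosing the group, its p-value is uniformly distributed under the null and the false-positive rate lands near the nominal 0.05. The price is statistical power, since only half the data feeds the confirmatory test.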
Double-dipping was identified as a widespread problem in fMRI neuroimaging research by Kriegeskorte et al. (2009). It also appears in machine learning when test data leaks into training, and in financial backtesting of trading strategies.