collider_bias
A statistical error in which conditioning on a variable that is causally affected by two other variables creates a spurious association between those two variables. In a causal diagram, a collider is a variable where two causal arrows converge, and conditioning on it (by controlling for it, stratifying on it, or selecting the sample by it) opens a non-causal path between its causes.
Among hospitalized patients (collider), a negative correlation appears between two diseases that are actually independent in the general population, because having either disease is sufficient for hospitalization.
A study of professional athletes finds a puzzling negative correlation between raw strength and cardiovascular endurance. In the general population the two traits are unrelated, but because both independently increase the chance of making it to elite sport (the collider), conditioning on being a professional athlete creates a spurious trade-off.
Researchers studying funded tech startups find that companies with charismatic founders tend to have weaker initial products. In the broader startup population, charisma and product quality are unrelated, but investors fund startups that have at least one of the two, so among funded companies (the collider), the two traits appear negatively correlated.
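The hospitalization example above can be reproduced with a small simulation. This is a minimal sketch, not a study design: the prevalences (`p_a`, `p_b`), the sample size, and the rule that either disease alone leads to admission are illustrative assumptions, and `pearson` and `simulate` are hypothetical helper names.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, p_a=0.1, p_b=0.1, seed=0):
    """Return (correlation in the population, correlation among the hospitalized)."""
    rng = random.Random(seed)
    # Two diseases drawn independently for every person (assumed prevalences).
    population = [
        (int(rng.random() < p_a), int(rng.random() < p_b)) for _ in range(n)
    ]
    # Assumption: having either disease is sufficient for hospitalization.
    hospitalized = [(a, b) for a, b in population if a or b]
    r_population = pearson(*zip(*population))
    r_hospitalized = pearson(*zip(*hospitalized))
    return r_population, r_hospitalized


r_population, r_hospitalized = simulate()
print(f"general population: r = {r_population:+.3f}")
print(f"hospitalized only:  r = {r_hospitalized:+.3f}")
```

Under these assumptions the population correlation should sit near zero, while the correlation among hospitalized patients should be clearly negative, even though nothing links the two diseases causally.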
Binary (yes/no) questions an LLM must answer to identify this aspect:
Is a statistical relationship between two variables being analyzed?
Type: binary
Is the analysis conditioned on a third variable that is causally influenced by both?
Type: binary
Does conditioning on this collider variable create a spurious association between the two variables of interest?
Type: binary
Would the association disappear or reverse without conditioning on the collider?
Type: binary
Conditioning on a common effect creates a mathematical dependency between its causes, even when they are truly independent. This is counterintuitive because controlling for variables is usually seen as beneficial.
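The dependency described above can be checked with exact arithmetic on the smallest possible case. The sketch below assumes two independent binary causes A and B and a common effect C = A or B. The 10% probabilities and the function name `conditional_prob_a` are illustrative choices, not values from any dataset.

```python
from fractions import Fraction
from itertools import product


def conditional_prob_a(p_a, p_b, given_b, given_c=None):
    """P(A=1 | B=given_b), optionally also given C=given_c, where C = A or B.

    A and B are independent by construction; C is their common effect.
    """
    numerator = Fraction(0)
    denominator = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        c = int(a or b)
        if b != given_b:
            continue
        if given_c is not None and c != given_c:
            continue
        # Joint probability of (a, b) under independence.
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        denominator += weight
        numerator += weight * a
    return numerator / denominator


p = Fraction(1, 10)

# Without conditioning on C, learning B tells us nothing about A.
print(conditional_prob_a(p, p, given_b=1))  # 1/10
print(conditional_prob_a(p, p, given_b=0))  # 1/10

# Conditioning on C=1 makes A depend on B.
print(conditional_prob_a(p, p, given_b=1, given_c=1))  # 1/10
print(conditional_prob_a(p, p, given_b=0, given_c=1))  # 1
```

Once C = 1 is known, learning that B is absent forces A to be present, because something must have produced the effect. This "explaining away" is the source of the negative association.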
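Draw the causal diagram before deciding which variables to control for. Do not condition on a common effect of the exposure and the outcome, or on any descendant of such a common effect, without understanding the causal structure. Sample selection counts as conditioning: restricting a study to hospitalized patients or funded companies has the same effect as adjusting for that variable.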
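Once the diagram is written down, the rule of thumb above can be applied mechanically. The following is a coarse sketch only: it flags candidate control variables that are descendants of both the exposure and the outcome, and it is not a full d-separation check. The graph, the node names, and the helpers `descendants` and `risky_controls` are hypothetical.

```python
def descendants(graph, node):
    """All nodes reachable from `node` along directed edges, excluding `node`."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


def risky_controls(graph, exposure, outcome, candidates):
    """Candidates that are descendants of both the exposure and the outcome."""
    common_effects = descendants(graph, exposure) & descendants(graph, outcome)
    return sorted(c for c in candidates if c in common_effects)


# Hypothetical causal diagram as an adjacency list: parent -> children.
graph = {
    "age": ["disease_a", "disease_b"],
    "disease_a": ["hospitalized"],
    "disease_b": ["hospitalized"],
    "hospitalized": ["in_study_sample"],
}

flagged = risky_controls(
    graph,
    exposure="disease_a",
    outcome="disease_b",
    candidates=["age", "hospitalized", "in_study_sample"],
)
print(flagged)  # ['hospitalized', 'in_study_sample']
```

In this assumed diagram, `age` is a common cause and is not flagged, while `hospitalized` and its descendant `in_study_sample` are. A flagged variable is a prompt to inspect the diagram by hand, not proof of bias.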
Commonly encountered in epidemiological studies (hospital-based studies in particular), social science research, and machine learning feature selection.