spectrum_bias
Spectrum bias occurs when the accuracy of a diagnostic test is evaluated using a patient population that does not reflect the full range of disease severity encountered in practice. Tests often perform best when distinguishing severe disease from healthy controls, but perform poorly in the clinically relevant middle ground where the diagnosis is uncertain.
A blood test for a liver disease is validated by comparing patients with advanced liver failure to perfectly healthy volunteers, achieving 98% accuracy. When used in a primary care setting on patients with mild symptoms, accuracy drops to 60% because the test cannot distinguish early-stage disease from other minor conditions.
A rapid strep throat test is evaluated by testing children admitted to hospital with severe, culture-confirmed streptococcal infections against children with no symptoms at all, yielding 97% sensitivity. In a school nurse's office, where most children have mild sore throats from viral infections, the test performs far worse because the signal is much harder to distinguish from noise.
A machine-learning skin cancer screening tool is trained and validated on images of obvious melanomas confirmed by biopsy versus clearly benign moles, achieving near-perfect accuracy. When deployed in a dermatology clinic where most referrals involve ambiguous, borderline lesions, accuracy drops sharply because the tool was never tested on the difficult middle-ground cases it now encounters most often.
Binary (yes/no) questions an LLM must answer to identify this aspect:
Was the diagnostic test evaluated on a sample that includes a narrow range of disease severity?
Type: binary
Does the study population differ from the population where the test will actually be used?
Type: binary
Could the test's sensitivity or specificity change across different patient subgroups?
Type: binary
Are the test's performance metrics presented as universal without noting population-specific limitations?
Type: binary
Extreme cases are easy to classify. By testing at the extremes of the disease spectrum, researchers inflate apparent accuracy. Clinicians and patients then trust these numbers in situations where the test actually performs much worse.
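The inflation described here can be illustrated with a minimal sketch. It assumes a hypothetical test that reads a single biomarker and calls the result positive above a fixed cutoff. The three distributions, the cutoff, and the 50% case mix are invented for illustration and do not come from any real study.

```python
from statistics import NormalDist

# Hypothetical biomarker distributions (illustrative values, not real data).
non_diseased = NormalDist(mu=0.0, sigma=1.0)
mild_disease = NormalDist(mu=1.5, sigma=1.0)
severe_disease = NormalDist(mu=4.0, sigma=1.0)

THRESHOLD = 2.0  # assumed cutoff: the test is positive above this value


def positive_rate(dist, threshold=THRESHOLD):
    """Probability that a patient drawn from `dist` tests positive."""
    return 1.0 - dist.cdf(threshold)


def accuracy(diseased, controls, prevalence=0.5):
    """Overall accuracy for a sample mixing `diseased` and `controls`."""
    sensitivity = positive_rate(diseased)
    specificity = 1.0 - positive_rate(controls)
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Same test, same cutoff; only the severity of the diseased group changes.
extremes_only = accuracy(severe_disease, non_diseased)
mild_cases = accuracy(mild_disease, non_diseased)

print(f"severe vs. non-diseased: {extremes_only:.3f}")
print(f"mild vs. non-diseased:   {mild_cases:.3f}")
```

Nothing about the test changes between the two calls. The drop in accuracy comes entirely from swapping severe cases for mild ones, whose biomarker values overlap far more with the non-diseased group.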
Evaluate diagnostic tests across the full spectrum of disease severity, including borderline and mild cases. Report sensitivity and specificity by subgroup. Validate tests in the clinical setting where they will actually be deployed.
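One way to report sensitivity and specificity by subgroup is sketched below. The function name `metrics_by_subgroup`, the record layout, the subgroup labels, and the counts are all hypothetical, chosen only to show the shape of such a report.

```python
from collections import defaultdict


def metrics_by_subgroup(records):
    """Compute sensitivity and specificity per subgroup.

    `records` is an iterable of (subgroup, has_disease, test_positive).
    A metric is None when the subgroup has no patients to compute it from.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for subgroup, has_disease, test_positive in records:
        if has_disease:
            key = "tp" if test_positive else "fn"
        else:
            key = "fp" if test_positive else "tn"
        counts[subgroup][key] += 1

    report = {}
    for subgroup, c in counts.items():
        diseased = c["tp"] + c["fn"]
        non_diseased = c["tn"] + c["fp"]
        report[subgroup] = {
            "sensitivity": c["tp"] / diseased if diseased else None,
            "specificity": c["tn"] / non_diseased if non_diseased else None,
        }
    return report


# Made-up counts for illustration only.
records = (
    [("severe", True, True)] * 49 + [("severe", True, False)] * 1
    + [("mild", True, True)] * 12 + [("mild", True, False)] * 28
    + [("healthy control", False, False)] * 48
    + [("healthy control", False, True)] * 2
    + [("other minor condition", False, False)] * 30
    + [("other minor condition", False, True)] * 20
)

results = metrics_by_subgroup(records)
for subgroup, metrics in results.items():
    print(subgroup, metrics)
```

Pooling these invented records into a single sensitivity figure would hide the gap between the severe and mild subgroups. Reporting each subgroup separately keeps that gap visible.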
Many rapid diagnostic tests for infectious diseases show excellent sensitivity in hospital studies but perform poorly in community screening where most cases are mild or asymptomatic. This has been observed with COVID-19 antigen tests, which detect high viral loads well but miss early infections.
How participants are identified or recruited systematically distorts the sample.
Systematic exclusion of certain participants from a study distorts results.
Ignoring general statistical base rates in favor of information specific to the individual case.
Prevalence studies miss fatal or short-duration cases, distorting disease-exposure associations.
A model or analysis fits the noise in the training data so closely that it fails to generalize to new data. The model captures random fluctuations rather than the underlying pattern.