Pre-registered falsification testing of sepsis prediction models.
286,510 ICU patients across MIMIC-IV and eICU-CRD. Hypothesis pre-registered on OSF before any data access. Null result on the primary cohort. Published anyway.
Summary.
We tested whether published sepsis prediction models retained predictive performance after controlling for care-process intensity, that is, whether the models were detecting genuine biological signal or were partly recapitulating the clinical workup that leads to a sepsis label. The hypothesis and analysis plan were registered on the Open Science Framework with a timestamp before any data access. The full pre-registration, code, and analysis pipeline are publicly available.
On the primary cohort, the result was null against our own hypothesis: after controlling for care-process intensity, the model's incremental signal collapsed inside the confidence bounds we had pre-specified. We published the null result. A secondary, exploratory analysis surfaced a more nuanced finding about administrative-versus-clinical sepsis definitions that warrants further work; we labeled it as exploratory in the preprint, not as the main finding.
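To make the logic of the primary test concrete, here is a minimal sketch of one way such a check could be set up. It is not the study's pipeline. It assumes care-process intensity has been binned into strata, measures discrimination only within each stratum, bootstraps a confidence interval, and asks whether that interval falls inside a pre-specified margin around chance. The function names, the stratification approach, the toy data, and the bounds (0.45, 0.55) are all illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Return (AUC, number of positive-negative pairs); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = len(pos) * len(neg)
    if pairs == 0:
        return float("nan"), 0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / pairs, pairs


def adjusted_auc(scores: Sequence[float], labels: Sequence[int],
                 strata: Sequence[int]) -> float:
    """Pair-weighted mean of within-stratum AUCs.

    Only patients in the same (hypothetical) care-intensity stratum are
    compared, so discrimination that comes purely from workup intensity
    cancels out.
    """
    total_wins, total_pairs = 0.0, 0
    for s in set(strata):
        idx = [i for i, v in enumerate(strata) if v == s]
        a, pairs = auc([scores[i] for i in idx], [labels[i] for i in idx])
        if pairs:
            total_wins += a * pairs
            total_pairs += pairs
    return total_wins / total_pairs if total_pairs else float("nan")


def bootstrap_ci(scores: Sequence[float], labels: Sequence[int],
                 strata: Sequence[int], n_boot: int = 200,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the adjusted AUC (seeded for repeatability)."""
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        val = adjusted_auc([scores[i] for i in idx],
                           [labels[i] for i in idx],
                           [strata[i] for i in idx])
        if val == val:  # skip resamples with no comparable pairs (NaN)
            stats.append(val)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


def inside_prespecified_bounds(ci: Tuple[float, float],
                               bounds: Tuple[float, float]) -> bool:
    """True when the whole interval sits inside the margin fixed in advance."""
    return bounds[0] <= ci[0] and ci[1] <= bounds[1]


# Toy data (assumed): the score depends only on the care-intensity stratum.
strata = [0] * 20 + [1] * 20
labels = [0] * 16 + [1] * 4 + [0] * 4 + [1] * 16
scores = [0.2] * 20 + [0.8] * 20

raw, _ = auc(scores, labels)
adj = adjusted_auc(scores, labels, strata)
ci = bootstrap_ci(scores, labels, strata)
print(raw, adj, ci, inside_prespecified_bounds(ci, (0.45, 0.55)))
```

In this toy data the score carries no information beyond the stratum, so the raw AUC is 0.8 while the adjusted AUC is 0.5, and the interval sits inside the margin. That is the pattern a "recapitulating the workup" model would show. The point of fixing the bounds before looking at data is that the decision rule cannot be tuned after the fact.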
Verifiable artifacts.
Every claim above is anchored to one of the links below.
Why this matters.
Pre-registered null results in clinical AI are rare. The incentives push the other way: models that work get written up, and models that do not are quietly shelved. We pre-registered, locked the hypothesis, and published the null. That is the methodological commitment, not a marketing line.
The exploratory secondary finding, that administrative versus clinical definitions of sepsis produce materially different signal in these models, is a useful pointer for the field. But it is exploratory, surfaced after the primary analysis was reported, and we have labeled it as such throughout the paper.
Acknowledgment.
The methodology was reviewed by an established sepsis informatics researcher in connection with the credentialed MIMIC access this study required. The reviewer is acknowledged in the paper. Any errors are our own.
What this is the foundation of.
This study is the methodological foundation for everything RocSite builds. The same falsification discipline (pre-register the hypothesis, lock the analysis plan, publish the result regardless of direction) is how we approach ICH Triage validation and how we structure AI Governor comparisons. If a method survives this study, it survives.
Questions, critiques, replications.
If you have read the preprint and want to point out something we got wrong, please email Adam directly. The pre-registration is on OSF; the code is on GitHub. Replication is welcome.
