Recognition: no theorem link
Outlier detection for patient monitoring and alerting
Pith reviewed 2026-05-12 01:46 UTC · model grok-4.3
The pith
Unusual patient-management decisions in electronic health records can be detected as outliers; alerting on them yields true alert rates of 25% to 66%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A data-driven outlier detection approach applied to patient-management decisions in electronic health records can identify potential errors. When evaluated on 222 alerts generated from the records of 4486 post-cardiac surgical patients, using expert opinions as ground truth, the method achieved true alert rates ranging from 25% to 66%, with the highest rate corresponding to the strongest outliers. This supports the hypothesis that generating alerts for unusual decisions is worthwhile for patient monitoring.
What carries the argument
Outlier detection model trained on historical EHR patient cases to score the unusualness of current patient-management decisions.
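The paper does not spell out the scoring machinery here, and its reference list points to conditional outlier detection with SVMs and Platt-calibrated probabilities. As a minimal stand-in for that pipeline, the mechanism can be sketched as: estimate how probable the observed action is given similar past patients, and score the decision by how improbable it is. The feature space, the k-nearest-neighbour frequency estimate, and the toy cohort below are all hypothetical illustrations, not the paper's implementation.

```python
from math import dist

def outlier_score(patient, action, past_cases, k=5):
    """Score how unusual `action` is for `patient`, given past (state, action) cases.

    Estimates P(action | patient state) as the frequency of `action` among the
    k most similar past patients; the outlier score is 1 minus that estimate.
    """
    # Rank past cases by Euclidean distance in the (toy) feature space.
    neighbours = sorted(past_cases, key=lambda case: dist(case[0], patient))[:k]
    p_action = sum(1 for _, a in neighbours if a == action) / k
    return 1.0 - p_action

# Toy cohort: (features, action) pairs, e.g. (age_decade, creatinine) -> drug order.
past = [((6, 1.0), "heparin"), ((6, 1.1), "heparin"), ((7, 0.9), "heparin"),
        ((6, 1.0), "heparin"), ((7, 1.2), "heparin"), ((3, 0.5), "aspirin")]

print(outlier_score((6, 1.0), "heparin", past))   # common action -> 0.0
print(outlier_score((6, 1.0), "warfarin", past))  # never-seen action -> 1.0
```

A strong outlier (score near 1) would trigger an alert; ranking alerts by this score is what makes the reported gradient ("highest rates for the strongest outliers") measurable at all.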
Load-bearing premise
Unusual decisions with respect to past patient care are likely to be errors, and expert opinions provide a valid measure of whether an alert is true.
What would settle it
A follow-up study with more patients and multiple independent expert reviews that found true alert rates below 20% for most alert types would undermine the claim that the approach yields promising rates.
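The "below 20%" criterion can be made concrete with a one-sided binomial test on a follow-up sample: small tail probability under the null rate of 0.20 would argue the true alert rate exceeds 20%. The alert counts below are hypothetical, chosen only to show the arithmetic.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): evidence against 'true rate <= p'."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical follow-up: 100 alerts of one action type, 31 judged true by experts.
# A tail probability well below 0.05 under H0: p = 0.20 would support the method;
# a rate consistent with p <= 0.20 across most action types would undermine it.
print(binom_tail(100, 31, 0.20))
```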
Original abstract
We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management decisions using past patient cases stored in electronic health records (EHRs). Our hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error and that it is worthwhile to generate an alert if such a decision is encountered. We evaluate this hypothesis using data obtained from EHRs of 4486 post-cardiac surgical patients and a subset of 222 alerts generated from the data. We base the evaluation on the opinions of a panel of experts. The results of the study support our hypothesis that the outlier-based alerting can lead to promising true alert rates. We observed true alert rates that ranged from 25% to 66% for a variety of patient-management actions, with 66% corresponding to the strongest outliers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a data-driven outlier detection method to flag anomalous patient-management decisions in EHR data from 4486 post-cardiac surgery patients. It generates 222 alerts and evaluates them through expert panel review, reporting true alert rates of 25% to 66% (higher for stronger outliers) and concluding that the approach yields promising results for error detection and alerting.
Significance. If the evaluation methodology proves robust, the work offers a practical, data-driven complement to rule-based clinical decision support by identifying deviations from historical care patterns. Strengths include the use of real EHR data at scale and direct expert validation on actual alerts. The reported rate range provides an initial signal that outlier strength correlates with expert-flagged issues, which could inform alerting thresholds in monitoring systems.
major comments (3)
- §3 (Methods): The outlier detection procedure is described at a high level only. No specification is given for the feature set extracted from EHR management decisions, the distance/density measure used to quantify outlierness, preprocessing (normalization, missing-value handling, temporal alignment), or any multiple-testing correction. These omissions make it impossible to assess reproducibility or to determine whether the reported rates depend on particular modeling choices.
- §4 (Evaluation): True-alert rates rest entirely on expert panel judgments, yet no inter-rater reliability statistic (Cohen's or Fleiss' kappa), panel size, selection criteria, blinding protocol, or correlation with downstream patient outcomes is reported. Without these, the 25–66% figures cannot be interpreted as evidence that statistical outlierness corresponds to clinical error rather than legitimate practice variation.
- §4.1 and Table 2: The subset of 222 alerts is presented without a description of the sampling frame or the selection criteria applied to the full set of outliers. If the 222 were chosen to include the strongest outliers, the observed rate gradient may be an artifact of selection rather than a general property of the method.
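The inter-rater statistic the second major comment asks for is Cohen's kappa ([33] in the reference graph): agreement between two raters corrected for the agreement expected by chance. A minimal two-rater version on hypothetical expert labels for ten alerts:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at their own rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical expert labels for ten alerts (True = "this alert flags an error").
a = [True, True, False, True, False, True, True, False, True, True]
b = [True, True, False, False, False, True, True, True, True, True]
print(round(cohens_kappa(a, b), 3))  # -> 0.524
```

A kappa in this range would indicate only moderate agreement, which is exactly why the referee argues the 25–66% figures are hard to interpret without it.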
minor comments (2)
- [Abstract and §1] The abstract and §1 should explicitly define “true alert rate” (expert agreement that the decision was erroneous) and distinguish it from positive predictive value against objective outcomes.
- [Figure 1] Figure 1 (outlier score distribution) would benefit from axis labels that include units and from an overlay of the expert-labeled subset.
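The distinction the first minor comment asks for can be shown on hypothetical data: the same set of alerts scored once against expert opinion (the paper's "true alert rate") and once against an objective downstream-outcome label (a positive predictive value). The record fields below are illustrative, not from the paper.

```python
def rate(alerts, truth_key):
    """Fraction of alerts whose record marks truth_key as True."""
    return sum(a[truth_key] for a in alerts) / len(alerts)

# Hypothetical alert records: expert agreement vs. an objective adverse outcome.
alerts = [
    {"expert_says_error": True,  "adverse_outcome": True},
    {"expert_says_error": True,  "adverse_outcome": False},  # expert flags, no harm
    {"expert_says_error": False, "adverse_outcome": False},
    {"expert_says_error": True,  "adverse_outcome": True},
]

print(rate(alerts, "expert_says_error"))  # true alert rate -> 0.75
print(rate(alerts, "adverse_outcome"))    # outcome-based PPV -> 0.5
```

The two denominators are identical; only the definition of "true" changes, which is why the referee wants the abstract to say which one 25–66% measures.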
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions we will make to improve the manuscript.
read point-by-point responses
-
Referee: §3 (Methods): The outlier detection procedure is described at a high level only. No specification is given for the feature set extracted from EHR management decisions, the distance/density measure used to quantify outlierness, preprocessing (normalization, missing-value handling, temporal alignment), or any multiple-testing correction. These omissions make it impossible to assess reproducibility or to determine whether the reported rates depend on particular modeling choices.
Authors: We agree that the methods section would benefit from greater specificity to support reproducibility. We will revise §3 to describe in detail the feature set extracted from the EHR (vital signs, laboratory values, medications, and procedural data), the specific outlier detection approach and its distance/density measure, and all preprocessing steps, including normalization, missing-value handling, and temporal alignment. We will also confirm that no multiple-testing correction was applied. These additions will allow readers to evaluate how the reported rates depend on modeling choices. revision: yes
-
Referee: §4 (Evaluation): True-alert rates rest entirely on expert panel judgments, yet no inter-rater reliability statistic (Cohen's or Fleiss' kappa), panel size, selection criteria, blinding protocol, or correlation with downstream patient outcomes is reported. Without these, the 25–66% figures cannot be interpreted as evidence that statistical outlierness corresponds to clinical error rather than legitimate practice variation.
Authors: We will expand §4 to include the expert panel size, selection criteria, and blinding protocol. Inter-rater reliability was not computed in the original study, and downstream patient outcomes were not tracked. We will note both explicitly as limitations and discuss the implications for interpreting the true-alert rates as indicators of error rather than legitimate variation in practice. revision: partial
-
Referee: §4.1 and Table 2: The subset of 222 alerts is presented without a description of the sampling frame or the selection criteria applied to the full set of outliers. If the 222 were chosen to include the strongest outliers, the observed rate gradient may be an artifact of selection rather than a general property of the method.
Authors: We will revise §4.1 and the Table 2 caption to state explicitly the sampling frame and selection criteria used to obtain the 222 alerts from the complete set of outliers. This clarification will let readers assess whether the observed gradient in true-alert rates is influenced by selection or reflects a broader property of the method. revision: yes
Circularity Check
No significant circularity in derivation or evaluation
full rationale
The paper develops an outlier detection method on EHR patient-management decisions from 4486 cases, generates 222 alerts, and evaluates true alert rates (25–66%) using a separate expert panel review as ground truth. Nothing in the pipeline reduces the central result to its inputs by construction: the evaluation relies on independent expert judgments rather than reusing the same data or outcomes for both model fitting and performance claims, so the performance claim is checked against an external benchmark rather than against itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A patient-management decision that is unusual with respect to past patient care may be due to an error
Reference graph
Works this paper leans on
[1] Kohn LT, Corrigan JM, et al. To err is human: building a safer health system. National Academy Press; 2000.
[2] Starfield B. Is US health really the best in the world? JAMA 2000;284(4):483–5.
[3] Thomas EJ, Studdert DM, Newhouse JP. Costs of medical injuries in Utah and Colorado. Inquiry 1999;36:255–64.
[4] Classen DC, Resar R, Griffin F, Federico F, Frankel T, Kimmel N, et al. 'Global Trigger Tool' shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff 2011;30:581–9.
[5] Levinson DR. Adverse events in hospitals: national incidence among Medicare beneficiaries. Department of Health and Human Services, Office of the Inspector General; Report OEI-06-09-00090; 2010.
[6] Landrigan CP, Parry GJ, Bones CB, Hackbarth AD, Goldmann DA, Sharek PJ. Temporal trends in rates of patient harm resulting from medical care. New Engl J Med 2010;363:2124–34.
[7] Hauskrecht M, Valko M, Batal I, Clermont G, Visweswaran S, Cooper GF. Conditional outlier detection for clinical alerting. In: Proceedings of annual American Medical Informatics Association symposium; 2010. p. 286–90.
[8] Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv 2009;41(3).
[9] Markou M, Singh S. Novelty detection: a review – part 1: statistical approaches. Signal Process 2003;83:2481–97.
[10] Hauskrecht M, Valko M, Kveton B, Visweswaran S, Cooper GF. Evidence-based anomaly detection. In: Proceedings of annual American Medical Informatics Association symposium; 2007. p. 319–24.
[11] Bates D, et al. Ten commandments for effective clinical decision support: making the practice of evidence-based medicine a reality. J Am Med Inf Assoc 2003;10:523–30.
[12] Shortliffe EH, Fagan LM, Perreault LE, Wiederhold G. Medical informatics: computer applications in health care and biomedicine. 2nd ed. New York: Springer-Verlag; 2000.
[13] Classen DC, Pestotnik SL, Evans RS, Burke JP. Computerized surveillance of adverse drug events in hospital patients. JAMA 1991;266:2847–51.
[14] Kuperman GJ, Bobb A, Payne TH, Avery AJ, Gandhi TK, Burns G, et al. Medication-related clinical decision support in computerized provider order entry systems: a review. J Am Med Inf Assoc 2007;14:29–40.
[15] Rozich JD, Haraden CR, Resar RK. Adverse drug event trigger tool: a practical methodology for measuring medication related harm. Qual Saf Health Care 2003;12:194–200.
[16] Jha AK, Kuperman GJ, Teich JM, Leape L, Shea B, Rittenberg E, et al. Identifying adverse drug events: development of a computer-based monitor and comparison with chart review and stimulated voluntary report. J Am Med Inf Assoc 1998;5:305–14.
[17] Evans RS, Pestotnik SL, Classen DC, Clemmer TP, Weaver LK, Orme Jr JF, et al. A computer-assisted management program for antibiotics and other antiinfective agents. New Engl J Med 1998;338:232–8.
[18] Haimowitz IJ, Kohane IS. Managing temporal worlds for medical trend diagnosis. Artif Intell Med 1996;8(3):299–321.
[19] Haimowitz IJ, Le PP, et al. Clinical monitoring using regression-based trend templates. Artif Intell Med 1995;7(6):473–96.
[20] Bellazzi R, Larizza C, Riva A. Temporal abstractions for interpreting diabetic patients monitoring data. Intell Data Anal 1998;2:97–122.
[21] Wadhwa RFD, Saul MI, Penrod LE, Visweswaran S, Cooper GF, Chapman W. Analysis of a failed clinical decision support system for management of congestive heart failure. In: Proceedings of the fall symposium of the American Medical Informatics Association; 2008. p. 773–7.
[22] Lawless ST. Crying wolf: false alarms in a pediatric intensive care unit. Crit Care Med 1994;22:981–5.
[23] Weingart SN, Toth M, Sands DZ, Aronson MD, Davis RB, Phillips RS. Physicians' decisions to override computerized drug alerts in primary care. Arch Int Med 2003;163:2625–31.
[24] Hsieh TC, Kuperman GJ, Jaggi T, Hojnowski-Diaz P, Fiskio J, Williams DH, et al. Characteristics and consequences of drug allergy alert overrides in a computerized physician order entry system. J Am Med Inf Assoc 2004;11:482–91.
[25] Vapnik VN. The nature of statistical learning theory. New York: Springer-Verlag; 1995.
[26] Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011;2(3):1–27. <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.
[27] Platt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in large margin classifiers. MIT Press; 1999. p. 61–74.
[28] Sollich P. Probabilistic methods for support vector machines. In: Advances in neural information processing systems; 2000. p. 349–55.
[29] Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd international conference on machine learning; 2005. p. 625–32.
[30] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
[31] Valko M, Hauskrecht M. Feature importance analysis for patient management decisions. In: 13th International congress on medical informatics, Cape Town, South Africa; 2010. p. 861–5.
[32] Post AR, Harrison JA. Temporal data mining. Clin Lab Med 2008;28(1):83–100.
[33] Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measur 1960;20(1):37–46.
[34] van der Sijs H, Aarts J, Vulto A, Berg M. Overriding of drug safety alerts in computerized physician order entry. J Am Med Inf Assoc 2006;13:138–47.
[35] Baker DE. Medication alert fatigue: the potential for compromised patient safety. Hosp Pharm 2009;44(6):460–2.
[36] Shah NR, Seger AC, Seger DL, Fiskio JM, Kuperman GJ, Blumenfeld B, et al. Improving acceptance of computerized prescribing alerts in ambulatory care. J Am Med Inf Assoc 2006;13(1):5–11.
[37] Seidling HM, Phansalkar S, Seger DL, Paterno MD, Shaykevich S, Haefeli WE, et al. Factors influencing alert acceptance: a novel approach for predicting the success of clinical decision support. J Am Med Inf Assoc 2011;18(4):479–84.
[38] Graham KC, Cvach M. Monitor alarm fatigue: standardizing use of physiological monitors and decreasing nuisance alarms. Am J Crit Care 2010;19:28–34.
[39] Paterno MD, Maviglia SM, Gorman PN, Seger DL, Yoshida E, Seger AC, et al. Tiering drug–drug interaction alerts by severity increases compliance rates. J Am Med Inf Assoc 2009;16(1):40–6.
[40] Lee EK, Mejia AF, Senior T, Jose J. Improving patient safety through medical alert management: an automated decision tool to reduce alert fatigue. In: Proceedings of annual American Medical Informatics Association symposium; p. 417–21.

Reviewed paper: Hauskrecht M, et al. Outlier detection for patient monitoring and alerting. J Biomed Inform 2013;46:47–55.
discussion (0)