Can Physician Expertise Improve Machine Learning Identification of Delirium?

Lu Wang; Ruiheng Yu; Vicky Ye; Xinyu Qin

arxiv: 2606.30651 · v1 · pith:ZPJ7QZVTnew · submitted 2026-06-16 · 💻 cs.CY · cs.AI· cs.LG

Can Physician Expertise Improve Machine Learning Identification of Delirium?

Xinyu Qin , Vicky Ye , Ruiheng Yu , Lu Wang This is my paper

Pith reviewed 2026-07-01 07:34 UTC · model grok-4.3

classification 💻 cs.CY cs.AIcs.LG

keywords delirium detectioninteractive machine learningphysician expertisefeature refinementSHAP explanationstemporal robustnesshuman-in-the-loophospital admissions

0 comments

The pith

Physician-guided feature refinement in interactive machine learning improves delirium detection over automated methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that involving physicians in an interactive machine learning process to refine features produces more accurate and stable models for identifying delirium in hospitalized patients. Delirium often goes undetected in routine care, so models that perform better could support timelier interventions. Using data from multiple Toronto hospitals, the framework has physicians guide feature choices and model checks alongside SHAP-based explanations, outperforming fully automated alternatives in discrimination and temporal stability. This human-in-the-loop setup keeps the resulting signals aligned with clinical knowledge rather than purely statistical patterns.

Core claim

The user-centered interactive machine learning framework integrates physician-guided feature refinement with interpretable modeling on 3,862 admissions from six Toronto hospitals. It demonstrates improved discrimination and temporal robustness over automated and baseline variants, with SHAP explanations pointing to meaningful clinical signals. This supports the framework as a practical way to build clinically relevant delirium models.

What carries the argument

The user-centered interactive machine learning (UC-iML) framework that combines physician-guided feature refinement with interpretable modeling and SHAP explanations.

If this is right

The framework achieves better overall discrimination for delirium than automated or baseline models.
It shows stronger temporal robustness when tested on later-phase validation cohorts.
The explanations produced highlight clinically meaningful signals rather than artifacts.
Interactive human-in-the-loop methods can function as a practical approach for clinically relevant medical modeling tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same physician-refinement step could be applied to machine learning models for other frequently missed hospital conditions.
Direct expert involvement may increase clinician trust in model outputs enough to change bedside decision-making.
Repeating the refinement process with physicians from varied specialties or regions could reveal whether the performance edge persists.

Load-bearing premise

Improvements from physician-guided feature refinement will generalize beyond the specific Toronto hospitals and physicians without the guidance process introducing bias or selection effects.

What would settle it

Testing the same physician-guided framework on admissions from a separate hospital network and observing no gain in discrimination or temporal robustness compared with automated baselines.

Figures

Figures reproduced from arXiv: 2606.30651 by Lu Wang, Ruiheng Yu, Vicky Ye, Xinyu Qin.

**Figure 2.** Figure 2: Data contained in a GEMINI project. 400,000 hospitalizations, supporting AI-driven analytics with validated accuracy of 98-100% across key data types [33]. B. Research Ethics Review Board (REB) Approval The Toronto Academic Health Science Network’s REB approved on the GEMINI study on 08/31/2019 (reference 15- 087), with an extension granted on 8/17/2020. This paper is also part of a GEMINI sub-study on AI… view at source ↗

**Figure 1.** Figure 1: iML frameworks for healthcare applications. (a) il [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 3.** Figure 3: Rolling holdout evaluation pipeline across 10 6-month [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Model interpretation results: (a) Feature importance [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Distribution of missing percentages in the original [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

Delirium is common in hospitalized patients and is often missed in routine care. We present a user-centered interactive machine learning (UC-iML) framework for delirium detection support that combines physician-guided feature refinement with interpretable modeling. Using 3,862 labeled admissions from six Toronto hospitals in the General Medicine Inpatient Initiative (GEMINI), we integrate administrative variables, laboratory results, medications, and a radiology-derived text indicator. Physicians guide feature refinement and model evaluation, and Shapley Additive exPlanations (SHAP) are used to summarize feature attribution. We evaluate standard supervised classifiers with temporally separated holdout testing and a later-phase validation cohort. Compared with automated and baseline variants, the proposed framework shows better overall discrimination and stronger temporal robustness, while the explanations highlight clinically meaningful signals. These results support UC-iML as a practical human-in-the-loop framework for clinically relevant delirium modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies physician-guided feature refinement in an interactive ML setup to delirium detection on Toronto hospital data and claims better discrimination than baselines, but the abstract supplies no performance numbers and the single-site design leaves generalization untested.

read the letter

The core takeaway is that this work takes existing interactive ML ideas and applies them to delirium on the GEMINI administrative dataset from six Toronto hospitals. Physicians refine features drawn from labs, meds, admin variables, and a radiology text flag, then SHAP is used for explanations. They use temporal holdout plus a later validation cohort and report that the physician-in-the-loop version beats automated and baseline models on discrimination and robustness.

What the paper does reasonably is keep the setup practical and tied to a real clinical problem. Delirium is under-diagnosed, the data source is multi-hospital within one system, and the temporal split is a sensible check against time drift. Involving physicians directly in feature work is a concrete step beyond pure automation.

The soft spots are straightforward. The abstract contains no AUC values, no confidence intervals, no exact comparison protocol, and no description of how physician guidance was structured or checked for bias. All 3,862 admissions and the validation cohort come from the same Toronto sites, so the temporal split does not address whether the refined features or the guidance process itself would transfer elsewhere. The stress-test concern about site- and physician-specific effects therefore stands on the information given.

This is for readers already working on human-in-the-loop clinical ML who want an example applied to delirium. It is not yet strong enough on its own for broad claims about general methods. A serious editor should send it to peer review once the full manuscript supplies the missing numbers, methods details, and any external checks; the idea is worth proper evaluation even if the current evidence is preliminary.

Referee Report

2 major / 1 minor

Summary. The paper presents a user-centered interactive machine learning (UC-iML) framework for delirium detection that integrates physician-guided feature refinement with interpretable modeling. Using 3,862 labeled admissions from six Toronto hospitals in the GEMINI initiative (administrative variables, labs, medications, and radiology text), it evaluates standard classifiers on temporally separated holdout data and a later-phase validation cohort. The central claim is that the UC-iML approach yields better overall discrimination and temporal robustness than automated or baseline variants, with SHAP explanations highlighting clinically meaningful signals.

Significance. If the empirical improvements hold under rigorous external validation, the work would provide evidence that physician-in-the-loop feature refinement can enhance clinical ML models for underdiagnosed conditions like delirium while maintaining interpretability. The temporal robustness evaluation is a strength relative to purely cross-sectional designs. However, the single-institution scope limits claims about practical deployment.

major comments (2)

[Evaluation and Results] Evaluation section (and Results): The claim of superior discrimination and temporal robustness is based entirely on data from the same six Toronto hospitals. The temporal holdout controls for time shift within this ecosystem but does not isolate whether physician-guided features or the interactive process itself would produce comparable gains at other sites or with different physicians; any lift could reflect local coding practices or selection effects rather than a general property of UC-iML. This is load-bearing for the assertion that the framework is 'practical' for clinically relevant modeling.
[Methods] Methods: The operationalization of physician guidance (exact refinement steps, number of physicians, blinding, and how guidance was validated against an independent process) is not described in sufficient detail to determine whether reported gains are attributable to the framework or to unmeasured biases introduced by the specific physicians and site.

minor comments (1)

[Abstract] Abstract: Asserts 'better overall discrimination' without reporting numeric metrics (AUC, sensitivity, etc.), confidence intervals, or the exact statistical comparison method used against baselines, which hinders immediate assessment of effect size.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript describing the UC-iML framework for delirium detection. We address each major comment below, proposing revisions to enhance transparency and acknowledge limitations where appropriate.

read point-by-point responses

Referee: [Evaluation and Results] Evaluation section (and Results): The claim of superior discrimination and temporal robustness is based entirely on data from the same six Toronto hospitals. The temporal holdout controls for time shift within this ecosystem but does not isolate whether physician-guided features or the interactive process itself would produce comparable gains at other sites or with different physicians; any lift could reflect local coding practices or selection effects rather than a general property of UC-iML. This is load-bearing for the assertion that the framework is 'practical' for clinically relevant modeling.

Authors: We agree that the single-region scope (six Toronto hospitals within the GEMINI initiative) limits strong claims of generalizability, and the temporal holdout addresses time shifts only within this ecosystem. The improvements observed may partly reflect local practices. We will revise the Discussion and add a dedicated Limitations subsection to explicitly state that external multi-site validation is required before broader deployment claims, while retaining the finding of improved discrimination and robustness within the studied setting as a proof-of-concept for the UC-iML approach. revision: yes
Referee: [Methods] Methods: The operationalization of physician guidance (exact refinement steps, number of physicians, blinding, and how guidance was validated against an independent process) is not described in sufficient detail to determine whether reported gains are attributable to the framework or to unmeasured biases introduced by the specific physicians and site.

Authors: We acknowledge that the Methods section provides only a high-level overview of physician involvement in feature refinement and evaluation. In the revision, we will expand this section with additional details on the number of participating physicians, the specific interactive refinement steps performed, any blinding or documentation procedures used, and how the guidance process was recorded. This will improve reproducibility and allow better assessment of potential site-specific biases. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison on holdout data

full rationale

The paper reports an empirical ML study comparing UC-iML (physician-guided feature refinement) against automated and baseline variants on 3,862 GEMINI admissions with temporally separated holdout and later-phase validation. No equations, derivations, or first-principles predictions appear; performance metrics are computed directly from fitted models on independent test splits. No self-citation load-bearing steps, fitted-input-as-prediction reductions, or ansatz smuggling are present. The central claim rests on observable discrimination and robustness differences, which do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities. Standard ML assumptions such as i.i.d. sampling within temporal splits and that SHAP attributions reflect true clinical relevance are implicit but unstated.

pith-pipeline@v0.9.1-grok · 5684 in / 1152 out tokens · 26302 ms · 2026-07-01T07:34:34.812134+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Practice guideline for the treatment of patients with delirium.Am J Psychiatry, 156(5):1–39, 1999

American Psychiatric Association et al. Practice guideline for the treatment of patients with delirium.Am J Psychiatry, 156(5):1–39, 1999

1999
[2]

Anal- ysis of nasopharyngeal carcinoma risk factors with bayesian networks

Alex Aussem, S ´eRgio Rodrigues De Morais, and Marilys Corbex. Anal- ysis of nasopharyngeal carcinoma risk factors with bayesian networks. Artificial intelligence in Medicine, 54(1):53–62, 2012

2012
[3]

Impact of delirium and recall on the level of distress in patients with advanced cancer and their family caregivers.Cancer, 115(9):2004–2012, 2009

Eduardo Bruera, Shirley H Bush, Jie Willey, Timotheos Paraskevopou- los, Zhijun Li, J Lynn Palmer, Marlene Z Cohen, Debra Sivesind, and Ahmed Elsayem. Impact of delirium and recall on the level of distress in patients with advanced cancer and their family caregivers.Cancer, 115(9):2004–2012, 2009

2004
[4]

Revi- sions to the canadian emergency department triage and acuity scale (ctas) guidelines.Canadian Journal of Emergency Medicine, 16(6):485–489, 2014

Michael J Bullard, Tom Chan, Colleen Brayman, David Warren, Erin Musgrave, Bernard Unger, CTAS National Working Group, et al. Revi- sions to the canadian emergency department triage and acuity scale (ctas) guidelines.Canadian Journal of Emergency Medicine, 16(6):485–489, 2014

2014
[5]

Prediction of incident delirium using a random forest classifier.Journal of medical systems, 42(12):1–10, 2018

John P Corradi, Stephen Thompson, Jeffrey F Mather, Christine M Waszynski, and Robert S Dicks. Prediction of incident delirium using a random forest classifier.Journal of medical systems, 42(12):1–10, 2018

2018
[6]

Sentiment analysis: a comparative study on different approaches.Procedia Computer Science, 87:44–49, 2016

MD Devika, Cª Sunitha, and Amal Ganesh. Sentiment analysis: a comparative study on different approaches.Procedia Computer Science, 87:44–49, 2016

2016
[7]

Risk-adjusting hospital inpatient mor- tality using automated inpatient, outpatient, and laboratory databases

Gabriel J Escobar, John D Greene, Peter Scheirer, Marla N Gardner, David Draper, and Patricia Kipnis. Risk-adjusting hospital inpatient mor- tality using automated inpatient, outpatient, and laboratory databases. Medical care, pages 232–239, 2008

2008
[8]

Deep learning—a technology with the potential to transform health care.Jama, 320(11):1101–1102, 2018

Geoffrey Hinton. Deep learning—a technology with the potential to transform health care.Jama, 320(11):1101–1102, 2018

2018
[9]

Interactive machine learning: experimental evidence for the human in the algorithmic loop.Applied Intelligence, 49(7):2401–2414, 2019

Andreas Holzinger, Markus Plass, Michael Kickmeier-Rust, Katharina Holzinger, Gloria Cerasela Cris ¸an, Camelia-M Pintea, and Vasile Palade. Interactive machine learning: experimental evidence for the human in the algorithmic loop.Applied Intelligence, 49(7):2401–2414, 2019

2019
[10]

Hospital elder life program: systematic review and meta-analysis of effectiveness.The American Journal of Geriatric Psychiatry, 26(10):1015–1033, 2018

Tammy T Hshieh, Tinghan Yang, Sarah L Gartaganis, Jirong Yue, and Sharon K Inouye. Hospital elder life program: systematic review and meta-analysis of effectiveness.The American Journal of Geriatric Psychiatry, 26(10):1015–1033, 2018

2018
[11]

A multicomponent intervention to prevent delirium in hos- pitalized older patients.New England journal of medicine, 340(9):669– 676, 1999

Sharon K Inouye, Sidney T Bogardus Jr, Peter A Charpentier, Linda Leo-Summers, Denise Acampora, Theodore R Holford, and Leo M Cooney Jr. A multicomponent intervention to prevent delirium in hos- pitalized older patients.New England journal of medicine, 340(9):669– 676, 1999

1999
[12]

Sharon K Inouye, Linda Leo-Summers, Ying Zhang, Sidney T Bogar- dus Jr, Douglas L Leslie, and Joseph V Agostini. A chart-based method for identification of delirium: validation compared with interviewer ratings using the confusion assessment method.Journal of the American Geriatrics Society, 53(2):312–318, 2005

2005
[13]

Kdigo clinical practice guidelines for acute kidney injury

Arif Khwaja. Kdigo clinical practice guidelines for acute kidney injury. Nephron Clinical Practice, 120(4):c179–c184, 2012

2012
[14]

One-year health care costs associated with delirium in the elderly population.Archives of internal medicine, 168(1):27–32, 2008

Douglas L Leslie, Edward R Marcantonio, Ying Zhang, Linda Leo- Summers, and Sharon K Inouye. One-year health care costs associated with delirium in the elderly population.Archives of internal medicine, 168(1):27–32, 2008

2008
[15]

Prefix: Understand and adapt to user preference in human-agent interaction

Jialin Li, Zhenhao Chen, Hanjun Luo, and Hanan Salam. Prefix: Understand and adapt to user preference in human-agent interaction. arXiv preprint arXiv:2602.06714, 2026

work page arXiv 2026
[16]

Systematic review of prediction models for delirium in the older adult inpatient.BMJ open, 8(4):e019223, 2018

Heidi Lindroth, Lisa Bratzke, Suzanne Purvis, Roger Brown, Mark Coburn, Marko Mrkobrada, Matthew TV Chan, Daniel HJ Davis, Pratik Pandharipande, Cynthia M Carlsson, et al. Systematic review of prediction models for delirium in the older adult inpatient.BMJ open, 8(4):e019223, 2018

2018
[17]

Agentauditor: Human-level safety and security evaluation for llm agents.Advances in Neural Information Processing Systems, 38:43241–43298, 2026

Hanjun Luo, Shenyu Dai, Chiming Ni, Xinfeng Li, Guibin Zhang, Kun Wang, Tongliang Liu, and Hanan Salam. Agentauditor: Human-level safety and security evaluation for llm agents.Advances in Neural Information Processing Systems, 38:43241–43298, 2026

2026
[18]

CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding

Hanjun Luo, Chiming Ni, Jiaheng Wen, Zhimu Huang, Yiran Wang, Bingduo Liao, Sylvia Chung, Yingbin Jin, Xinfeng Li, Wenyuan Xu, et al. Hai-eval: Measuring human-ai synergy in collaborative coding. arXiv preprint arXiv:2512.04111, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

Acute brain failure: pathophysiology, diagnosis, management, and sequelae of delirium.Critical care clinics, 33(3):461– 519, 2017

Jos ´e R Maldonado. Acute brain failure: pathophysiology, diagnosis, management, and sequelae of delirium.Critical care clinics, 33(3):461– 519, 2017

2017
[20]

Underreporting of delirium in statewide claims data: implications for clinical care and predictive modeling.Psychosomatics, 57(5):480– 488, 2016

Thomas H McCoy Jr, Leslie Snapper, Theodore A Stern, and Roy H Perlis. Underreporting of delirium in statewide claims data: implications for clinical care and predictive modeling.Psychosomatics, 57(5):480– 488, 2016

2016
[21]

Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory copd patients: a multiple logistic regression analysis

Marc Miravitlles, Tina Guerrero, Cristina Mayordomo, Leopoldo S´anchez-Agudo, Felip Nicolau, and Jos ´e Luis Seg ´u. Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory copd patients: a multiple logistic regression analysis. Respiration, 67(5):495–501, 2000

2000
[22]

Prediction and early detection of delirium in the intensive care unit by using heart rate variability and machine learning.Physiological measurement, 39(3):035004, 2018

Jooyoung Oh, Dongrae Cho, Jaesub Park, Se Hee Na, Jongin Kim, Jae- seok Heo, Cheung Soo Shin, Jae-Jin Kim, Jin Young Park, and Boreom Lee. Prediction and early detection of delirium in the intensive care unit by using heart rate variability and machine learning.Physiological measurement, 39(3):035004, 2018

2018
[23]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vander- plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch- esnay. Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011

2011
[24]

sage, 2003

Marjorie A Pett, Nancy R Lackey, John J Sullivan, et al.Making sense of factor analysis: The use of factor analysis for instrument development in health care research. sage, 2003

2003
[25]

Up- dating and validating the charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries

Hude Quan, Bing Li, Chantal M Couris, Kiyohide Fushimi, Patrick Graham, Phil Hider, Jean-Marie Januel, and Vijaya Sundararajan. Up- dating and validating the charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. American journal of epidemiology, 173(6):676–682, 2011

2011
[26]

Big data analytics in healthcare: promise and potential.Health information science and systems, 2(1):1–10, 2014

Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in healthcare: promise and potential.Health information science and systems, 2(1):1–10, 2014

2014
[27]

Validation of a delirium risk assessment using electronic medical record information.Journal of the American Medical Directors Association, 17(3):244–248, 2016

James L Rudolph, Kelly Doherty, Brittany Kelly, Jane A Driver, and Elizabeth Archambault. Validation of a delirium risk assessment using electronic medical record information.Journal of the American Medical Directors Association, 17(3):244–248, 2016

2016
[28]

A tale of two methods: chart and interview methods for identifying delirium.Journal of the American Geriatrics Society, 62(3):518–524, 2014

Jane S Saczynski, Cyrus M Kosar, Guoquan Xu, Margaret R Puelle, Eva Schmitt, Richard N Jones, Edward R Marcantonio, Bonnie Wong, Ilean Isaza, and Sharon K Inouye. A tale of two methods: chart and interview methods for identifying delirium.Journal of the American Geriatrics Society, 62(3):518–524, 2014

2014
[29]

Mimic ii: a massive temporal icu patient database to support research in intelligent patient monitoring

Mohammed Saeed, Christine Lieu, Greg Raber, and Roger G Mark. Mimic ii: a massive temporal icu patient database to support research in intelligent patient monitoring. InComputers in cardiology, pages 641–644. IEEE, 2002

2002
[30]

Improving recognition of delirium in clinical practice: a call for action.BMC geriatrics, 12(1):1–5, 2012

Andrew Teodorczuk, Emma Reynish, and Koen Milisen. Improving recognition of delirium in clinical practice: a call for action.BMC geriatrics, 12(1):1–5, 2012

2012
[31]

Neuropsy- chiatric aspects of delirium

Paula T Trzepacz, David J Meagher, and Michael G Wise. Neuropsy- chiatric aspects of delirium. 2004

2004
[32]

Amol A Verma, Yishan Guo, Janice L Kwan, Lauren Lapointe-Shaw, Shail Rawal, Terence Tang, Adina Weinerman, and Fahad Razak. Prevalence and costs of discharge diagnoses in inpatient general internal medicine: a multi-center cross-sectional study.Journal of general internal medicine, 33(11):1899–1904, 2018

1904
[33]

Amol A Verma, Sachin V Pasricha, Hae Young Jung, Vladyslav Kushnir, Denise YF Mak, Radha Koppula, Yishan Guo, Janice L Kwan, Lauren Lapointe-Shaw, Shail Rawal, et al. Assessing the quality of clinical and administrative data extracted from hospitals: the general medicine inpatient initiative (gemini) experience.Journal of the American Medical Informatics ...

2021
[34]

Physician experience design (pxd): More usable machine learning prediction for clinical decision making

Lu Wang, Mark Chignell, Yilun Zhang, Andrew Pinto, Fahad Razak, Kathleen Sheehan, and Amol Verma. Physician experience design (pxd): More usable machine learning prediction for clinical decision making. InAMIA Annual Symposium Proceedings, volume 2022, page

2022
[35]

American Medical Informatics Association, 2022

2022
[36]

A systematic review of risk factors for delirium in the icu.Critical care medicine, 43(1):40–47, 2015

Irene J Zaal, John W Devlin, Linda M Peelen, and Arjen JC Slooter. A systematic review of risk factors for delirium in the icu.Critical care medicine, 43(1):40–47, 2015

2015

[1] [1]

Practice guideline for the treatment of patients with delirium.Am J Psychiatry, 156(5):1–39, 1999

American Psychiatric Association et al. Practice guideline for the treatment of patients with delirium.Am J Psychiatry, 156(5):1–39, 1999

1999

[2] [2]

Anal- ysis of nasopharyngeal carcinoma risk factors with bayesian networks

Alex Aussem, S ´eRgio Rodrigues De Morais, and Marilys Corbex. Anal- ysis of nasopharyngeal carcinoma risk factors with bayesian networks. Artificial intelligence in Medicine, 54(1):53–62, 2012

2012

[3] [3]

Impact of delirium and recall on the level of distress in patients with advanced cancer and their family caregivers.Cancer, 115(9):2004–2012, 2009

Eduardo Bruera, Shirley H Bush, Jie Willey, Timotheos Paraskevopou- los, Zhijun Li, J Lynn Palmer, Marlene Z Cohen, Debra Sivesind, and Ahmed Elsayem. Impact of delirium and recall on the level of distress in patients with advanced cancer and their family caregivers.Cancer, 115(9):2004–2012, 2009

2004

[4] [4]

Revi- sions to the canadian emergency department triage and acuity scale (ctas) guidelines.Canadian Journal of Emergency Medicine, 16(6):485–489, 2014

Michael J Bullard, Tom Chan, Colleen Brayman, David Warren, Erin Musgrave, Bernard Unger, CTAS National Working Group, et al. Revi- sions to the canadian emergency department triage and acuity scale (ctas) guidelines.Canadian Journal of Emergency Medicine, 16(6):485–489, 2014

2014

[5] [5]

Prediction of incident delirium using a random forest classifier.Journal of medical systems, 42(12):1–10, 2018

John P Corradi, Stephen Thompson, Jeffrey F Mather, Christine M Waszynski, and Robert S Dicks. Prediction of incident delirium using a random forest classifier.Journal of medical systems, 42(12):1–10, 2018

2018

[6] [6]

Sentiment analysis: a comparative study on different approaches.Procedia Computer Science, 87:44–49, 2016

MD Devika, Cª Sunitha, and Amal Ganesh. Sentiment analysis: a comparative study on different approaches.Procedia Computer Science, 87:44–49, 2016

2016

[7] [7]

Risk-adjusting hospital inpatient mor- tality using automated inpatient, outpatient, and laboratory databases

Gabriel J Escobar, John D Greene, Peter Scheirer, Marla N Gardner, David Draper, and Patricia Kipnis. Risk-adjusting hospital inpatient mor- tality using automated inpatient, outpatient, and laboratory databases. Medical care, pages 232–239, 2008

2008

[8] [8]

Deep learning—a technology with the potential to transform health care.Jama, 320(11):1101–1102, 2018

Geoffrey Hinton. Deep learning—a technology with the potential to transform health care.Jama, 320(11):1101–1102, 2018

2018

[9] [9]

Interactive machine learning: experimental evidence for the human in the algorithmic loop.Applied Intelligence, 49(7):2401–2414, 2019

Andreas Holzinger, Markus Plass, Michael Kickmeier-Rust, Katharina Holzinger, Gloria Cerasela Cris ¸an, Camelia-M Pintea, and Vasile Palade. Interactive machine learning: experimental evidence for the human in the algorithmic loop.Applied Intelligence, 49(7):2401–2414, 2019

2019

[10] [10]

Hospital elder life program: systematic review and meta-analysis of effectiveness.The American Journal of Geriatric Psychiatry, 26(10):1015–1033, 2018

Tammy T Hshieh, Tinghan Yang, Sarah L Gartaganis, Jirong Yue, and Sharon K Inouye. Hospital elder life program: systematic review and meta-analysis of effectiveness.The American Journal of Geriatric Psychiatry, 26(10):1015–1033, 2018

2018

[11] [11]

A multicomponent intervention to prevent delirium in hos- pitalized older patients.New England journal of medicine, 340(9):669– 676, 1999

Sharon K Inouye, Sidney T Bogardus Jr, Peter A Charpentier, Linda Leo-Summers, Denise Acampora, Theodore R Holford, and Leo M Cooney Jr. A multicomponent intervention to prevent delirium in hos- pitalized older patients.New England journal of medicine, 340(9):669– 676, 1999

1999

[12] [12]

Sharon K Inouye, Linda Leo-Summers, Ying Zhang, Sidney T Bogar- dus Jr, Douglas L Leslie, and Joseph V Agostini. A chart-based method for identification of delirium: validation compared with interviewer ratings using the confusion assessment method.Journal of the American Geriatrics Society, 53(2):312–318, 2005

2005

[13] [13]

Kdigo clinical practice guidelines for acute kidney injury

Arif Khwaja. Kdigo clinical practice guidelines for acute kidney injury. Nephron Clinical Practice, 120(4):c179–c184, 2012

2012

[14] [14]

One-year health care costs associated with delirium in the elderly population.Archives of internal medicine, 168(1):27–32, 2008

Douglas L Leslie, Edward R Marcantonio, Ying Zhang, Linda Leo- Summers, and Sharon K Inouye. One-year health care costs associated with delirium in the elderly population.Archives of internal medicine, 168(1):27–32, 2008

2008

[15] [15]

Prefix: Understand and adapt to user preference in human-agent interaction

Jialin Li, Zhenhao Chen, Hanjun Luo, and Hanan Salam. Prefix: Understand and adapt to user preference in human-agent interaction. arXiv preprint arXiv:2602.06714, 2026

work page arXiv 2026

[16] [16]

Systematic review of prediction models for delirium in the older adult inpatient.BMJ open, 8(4):e019223, 2018

Heidi Lindroth, Lisa Bratzke, Suzanne Purvis, Roger Brown, Mark Coburn, Marko Mrkobrada, Matthew TV Chan, Daniel HJ Davis, Pratik Pandharipande, Cynthia M Carlsson, et al. Systematic review of prediction models for delirium in the older adult inpatient.BMJ open, 8(4):e019223, 2018

2018

[17] [17]

Agentauditor: Human-level safety and security evaluation for llm agents.Advances in Neural Information Processing Systems, 38:43241–43298, 2026

Hanjun Luo, Shenyu Dai, Chiming Ni, Xinfeng Li, Guibin Zhang, Kun Wang, Tongliang Liu, and Hanan Salam. Agentauditor: Human-level safety and security evaluation for llm agents.Advances in Neural Information Processing Systems, 38:43241–43298, 2026

2026

[18] [18]

CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding

Hanjun Luo, Chiming Ni, Jiaheng Wen, Zhimu Huang, Yiran Wang, Bingduo Liao, Sylvia Chung, Yingbin Jin, Xinfeng Li, Wenyuan Xu, et al. Hai-eval: Measuring human-ai synergy in collaborative coding. arXiv preprint arXiv:2512.04111, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

Acute brain failure: pathophysiology, diagnosis, management, and sequelae of delirium.Critical care clinics, 33(3):461– 519, 2017

Jos ´e R Maldonado. Acute brain failure: pathophysiology, diagnosis, management, and sequelae of delirium.Critical care clinics, 33(3):461– 519, 2017

2017

[20] [20]

Underreporting of delirium in statewide claims data: implications for clinical care and predictive modeling.Psychosomatics, 57(5):480– 488, 2016

Thomas H McCoy Jr, Leslie Snapper, Theodore A Stern, and Roy H Perlis. Underreporting of delirium in statewide claims data: implications for clinical care and predictive modeling.Psychosomatics, 57(5):480– 488, 2016

2016

[21] [21]

Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory copd patients: a multiple logistic regression analysis

Marc Miravitlles, Tina Guerrero, Cristina Mayordomo, Leopoldo S´anchez-Agudo, Felip Nicolau, and Jos ´e Luis Seg ´u. Factors associated with increased risk of exacerbation and hospital admission in a cohort of ambulatory copd patients: a multiple logistic regression analysis. Respiration, 67(5):495–501, 2000

2000

[22] [22]

Prediction and early detection of delirium in the intensive care unit by using heart rate variability and machine learning.Physiological measurement, 39(3):035004, 2018

Jooyoung Oh, Dongrae Cho, Jaesub Park, Se Hee Na, Jongin Kim, Jae- seok Heo, Cheung Soo Shin, Jae-Jin Kim, Jin Young Park, and Boreom Lee. Prediction and early detection of delirium in the intensive care unit by using heart rate variability and machine learning.Physiological measurement, 39(3):035004, 2018

2018

[23] [23]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vander- plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch- esnay. Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011

2011

[24] [24]

sage, 2003

Marjorie A Pett, Nancy R Lackey, John J Sullivan, et al.Making sense of factor analysis: The use of factor analysis for instrument development in health care research. sage, 2003

2003

[25] [25]

Up- dating and validating the charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries

Hude Quan, Bing Li, Chantal M Couris, Kiyohide Fushimi, Patrick Graham, Phil Hider, Jean-Marie Januel, and Vijaya Sundararajan. Up- dating and validating the charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. American journal of epidemiology, 173(6):676–682, 2011

2011

[26] [26]

Big data analytics in healthcare: promise and potential.Health information science and systems, 2(1):1–10, 2014

Wullianallur Raghupathi and Viju Raghupathi. Big data analytics in healthcare: promise and potential.Health information science and systems, 2(1):1–10, 2014

2014

[27] [27]

Validation of a delirium risk assessment using electronic medical record information.Journal of the American Medical Directors Association, 17(3):244–248, 2016

James L Rudolph, Kelly Doherty, Brittany Kelly, Jane A Driver, and Elizabeth Archambault. Validation of a delirium risk assessment using electronic medical record information.Journal of the American Medical Directors Association, 17(3):244–248, 2016

2016

[28] [28]

A tale of two methods: chart and interview methods for identifying delirium.Journal of the American Geriatrics Society, 62(3):518–524, 2014

Jane S Saczynski, Cyrus M Kosar, Guoquan Xu, Margaret R Puelle, Eva Schmitt, Richard N Jones, Edward R Marcantonio, Bonnie Wong, Ilean Isaza, and Sharon K Inouye. A tale of two methods: chart and interview methods for identifying delirium.Journal of the American Geriatrics Society, 62(3):518–524, 2014

2014

[29] [29]

Mimic ii: a massive temporal icu patient database to support research in intelligent patient monitoring

Mohammed Saeed, Christine Lieu, Greg Raber, and Roger G Mark. Mimic ii: a massive temporal icu patient database to support research in intelligent patient monitoring. InComputers in cardiology, pages 641–644. IEEE, 2002

2002

[30] [30]

Improving recognition of delirium in clinical practice: a call for action.BMC geriatrics, 12(1):1–5, 2012

Andrew Teodorczuk, Emma Reynish, and Koen Milisen. Improving recognition of delirium in clinical practice: a call for action.BMC geriatrics, 12(1):1–5, 2012

2012

[31] [31]

Neuropsy- chiatric aspects of delirium

Paula T Trzepacz, David J Meagher, and Michael G Wise. Neuropsy- chiatric aspects of delirium. 2004

2004

[32] [32]

Amol A Verma, Yishan Guo, Janice L Kwan, Lauren Lapointe-Shaw, Shail Rawal, Terence Tang, Adina Weinerman, and Fahad Razak. Prevalence and costs of discharge diagnoses in inpatient general internal medicine: a multi-center cross-sectional study.Journal of general internal medicine, 33(11):1899–1904, 2018

1904

[33] [33]

Amol A Verma, Sachin V Pasricha, Hae Young Jung, Vladyslav Kushnir, Denise YF Mak, Radha Koppula, Yishan Guo, Janice L Kwan, Lauren Lapointe-Shaw, Shail Rawal, et al. Assessing the quality of clinical and administrative data extracted from hospitals: the general medicine inpatient initiative (gemini) experience.Journal of the American Medical Informatics ...

2021

[34] [34]

Physician experience design (pxd): More usable machine learning prediction for clinical decision making

Lu Wang, Mark Chignell, Yilun Zhang, Andrew Pinto, Fahad Razak, Kathleen Sheehan, and Amol Verma. Physician experience design (pxd): More usable machine learning prediction for clinical decision making. InAMIA Annual Symposium Proceedings, volume 2022, page

2022

[35] [35]

American Medical Informatics Association, 2022

2022

[36] [36]

A systematic review of risk factors for delirium in the icu.Critical care medicine, 43(1):40–47, 2015

Irene J Zaal, John W Devlin, Linda M Peelen, and Arjen JC Slooter. A systematic review of risk factors for delirium in the icu.Critical care medicine, 43(1):40–47, 2015

2015