arxiv: 2502.05740 · v2 · submitted 2025-02-09 · 💻 cs.HC · cs.AI

RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care

Pith reviewed 2026-05-23 03:41 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords remote patient monitoring · large language models · postoperative care · gastrointestinal cancer · participatory design · conversational agent · clinical dashboard · healthcare AI

The pith

RECOVER integrates six design strategies from participatory sessions into an LLM-based remote monitoring system for GI cancer patients after surgery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes how researchers designed RECOVER, an LLM-powered remote patient monitoring system for postoperative gastrointestinal cancer care. They ran seven participatory design sessions with five clinical staff and interviewed five patients to identify six strategies that embed clinical guidelines and patient needs into LLM applications. From those strategies the team built a conversational agent that patients can talk to and an interactive dashboard that staff can use to track recovery. A pilot test with four staff and five patients then evaluated how well the strategies worked in practice. The work shows a concrete way to bring stakeholder input into LLM tools for a setting where complications after cancer surgery are hard to predict and can be serious.

Core claim

Through participatory design sessions with clinical staff and interviews with cancer patients, the authors derived six major design strategies for integrating clinical guidelines and information needs into LLM-based remote patient monitoring systems. These strategies shaped the implementation of RECOVER, which includes an LLM-powered conversational agent for patients and an interactive dashboard for staff. The strategies were then assessed in a pilot study that identified crucial design elements and offered implications for responsible AI use in postoperative care.

What carries the argument

Six major design strategies derived from participatory design sessions for integrating clinical guidelines and information needs into LLM-based RPM systems.

If this is right

  • The conversational agent supplies patients with responses aligned to their specific postoperative information needs.
  • The interactive dashboard enables clinical staff to review patient data and LLM-generated insights more efficiently than before.
  • Embedding the six strategies ensures LLM outputs remain consistent with established clinical guidelines.
  • Pilot feedback highlights specific design elements needed for responsible AI deployment in this clinical context.
  • The same strategies can guide development of additional LLM-powered remote monitoring tools.
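
As an illustration of how guideline-derived rules might sit between the conversational agent's structured output and the staff dashboard, the following minimal Python sketch ranks patients by the most urgent answer in their daily report. Everything in it is an assumption made for illustration: the `Priority` levels, the 0 to 3 severity scale, the `URGENT_AT` thresholds, and all function names are hypothetical and are not taken from the paper or from any clinical guideline.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical triage levels; not the paper's actual color scheme."""
    ROUTINE = 0
    WATCH = 1
    URGENT = 2


@dataclass
class Answer:
    question_id: str  # hypothetical key-question identifier
    severity: int     # assumed patient-reported scale: 0 (none) to 3 (severe)


# Placeholder rule table: severity at which each key question becomes urgent.
# These numbers are illustrative only and carry no clinical meaning.
URGENT_AT = {"fever": 1, "pain": 3, "nausea": 3}


def triage(answer: Answer) -> Priority:
    """Map one answer to a dashboard priority using the rule table."""
    threshold = URGENT_AT.get(answer.question_id, 3)
    if answer.severity >= threshold:
        return Priority.URGENT
    if answer.severity > 0:
        return Priority.WATCH
    return Priority.ROUTINE


def daily_priority(answers: list[Answer]) -> Priority:
    """A daily report is as urgent as its most urgent answer."""
    return max((triage(a) for a in answers), default=Priority.ROUTINE)


def sort_patient_list(reports: dict[str, list[Answer]]) -> list[str]:
    """Order patient IDs so the most urgent daily reports come first."""
    return sorted(reports, key=lambda pid: (-daily_priority(reports[pid]), pid))
```

Keeping the thresholds in a plain table rather than in the LLM prompt is one possible design choice: under this assumption, staff could inspect and adjust the rules without changing the conversational agent.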

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The participatory method used here could be repeated for designing LLM tools that monitor recovery after other types of major surgery.
  • A larger study that tracks actual rates of early complication detection would be required to confirm clinical value beyond the pilot.
  • Linking the dashboard to existing electronic health record systems could reduce manual data entry and increase staff adoption.

Load-bearing premise

The six design strategies drawn from seven sessions with five clinical staff and five patient interviews are sufficient to produce an effective and responsible LLM integration for postoperative remote monitoring.

What would settle it

A controlled trial in which patients monitored with the RECOVER conversational agent show no reduction in undetected postoperative complications compared with patients using standard follow-up calls would show the design strategies do not deliver the intended clinical benefit.

Figures

Figures reproduced from arXiv: 2502.05740 by Bingsheng Yao, Collin Campbell, Dakuo Wang, Guodong Gao, Jehan El-Bayoumi, Jennifer Bagdasarian, Nawar Shara, Ritu Agarwal, Vedant Das Swain, Waddah Al-Refaire, Yuxuan Lu, Ziqi Yang.

Figure 1: Overview of RECOVER, an LLM-powered RPM system integrating a conversational agent and an interactive dashboard
Figure 2: PD sessions and their participants, discussion artifacts, and agenda.
Figure 3: A PD session where participants commented on the conversation flow, and dashboard design version; the research …
Figure 4: Dashboard Iteration Process. In each section, we present the key design versions of the corresponding module, …
Figure 5: System architecture of RECOVER. The red, purple, and blue arrows represent data generated by the Conversation …
Figure 6: Final Design of the RECOVER Dashboard. We present three key sections and the major interaction flow that connects them: (1) from patient list to patient detail, (2a) from patient list to key questions, (2b) from key questions visualization to detailed log, and (3) from daily report to summary. Each section also includes local interactions to review and manage patient reports.
Figure 7: Visualization of Key Questions: An example of the local interactions within each section.
Figure 8: Technical Architecture. Our system backend modules are built upon OpenAI API and connected to a PostgreSQL …
Figure 9: Our user study session. A participant is navigating on the dashboard to complete the given task to review patient …
Figure 10: The first version of dashboard interface design. This version only include simple interactions like selecting patient …
Figure 11: Example of participants’ comments in Google Docs.
Figure 12: Example of conversation flow iteration.
Figure 13: Participants discussed the color coding depending on the priority of a symptom, and which questions should include …
Original abstract

Cancer surgery is a key treatment for gastrointestinal (GI) cancers, a group of cancers that account for more than 35% of cancer-related deaths worldwide, but postoperative complications are unpredictable and can be life-threatening. In this paper, we investigate how recent advancements in large language models (LLMs) can benefit remote patient monitoring (RPM) systems through clinical integration by designing RECOVER, an LLM-powered RPM system for postoperative GI cancer care. To closely engage stakeholders in the design process, we first conducted seven participatory design sessions with five clinical staff and interviewed five cancer patients to derive six major design strategies for integrating clinical guidelines and information needs into LLM-based RPM systems. We then designed and implemented RECOVER, which features an LLM-powered conversational agent for cancer patients and an interactive dashboard for clinical staff to enable efficient postoperative RPM. Finally, we used RECOVER as a pilot system to assess the implementation of our design strategies with four clinical staff and five patients, providing design implications by identifying crucial design elements, offering insights on responsible AI, and outlining opportunities for future LLM-powered RPM systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to have engaged stakeholders via seven participatory design sessions with five clinical staff and interviews with five cancer patients to derive six major design strategies for LLM integration in remote patient monitoring. These informed the design and implementation of RECOVER, featuring an LLM-powered conversational agent for patients and an interactive dashboard for staff. A pilot with four clinical staff and five patients was then conducted to assess the strategies, yielding design implications on crucial elements, responsible AI, and future opportunities for LLM-powered RPM in postoperative GI cancer care.

Significance. If the reported design process and implications hold, the work contributes a stakeholder-centered case study to HCI research on clinical AI systems. Strengths include the explicit participatory approach with both staff and patients, the translation of sessions into concrete system features, and the pilot evaluation that surfaces responsible-AI considerations. These elements provide transferable insights for integrating clinical guidelines into LLM-based RPM without overclaiming efficacy or generalizability.

minor comments (2)
  1. [Abstract] The claim that the pilot 'assess[es] the implementation of our design strategies' would be strengthened by briefly naming the evaluation method (e.g., thematic analysis of interviews or observation notes) so readers can judge how the design implications were generated.
  2. [Abstract] The participant counts (five staff, five patients for design; four staff, five patients for pilot) are appropriate for early-stage work but could be accompanied by a short statement on recruitment and session structure to clarify the depth of engagement.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The feedback correctly identifies the participatory design process, translation into system features, and responsible-AI insights as core contributions. No major comments were enumerated in the report.

Circularity Check

0 steps flagged

No significant circularity; qualitative design study with no equations or fitted predictions

The paper describes a participatory design process yielding six strategies from sessions, followed by system implementation and a small pilot evaluation. No mathematical derivations, parameters, predictions, uniqueness theorems, or self-referential reductions exist. All claims are descriptive of the design activities performed; the central output (design implications) is directly produced by the reported sessions and pilot rather than derived from prior fitted values or self-citations that collapse the argument. This matches the default expectation for non-circular qualitative HCI work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard HCI assumptions that participatory design with small stakeholder groups yields generalizable strategies and that LLM conversational agents can be safely integrated into clinical workflows when guided by those strategies. No free parameters, invented entities, or ad-hoc mathematical axioms are introduced.

axioms (1)
  • domain assumption: Participatory design sessions with clinical staff and patients produce actionable and responsible design strategies for LLM-based clinical systems.
    Invoked in the derivation of the six major design strategies from the seven sessions and interviews.

pith-pipeline@v0.9.0 · 5771 in / 1173 out tokens · 25270 ms · 2026-05-23T03:41:19.964198+00:00 · methodology
