
arxiv: 2605.18937 · v1 · pith:RLDXOW3Hnew · submitted 2026-05-18 · 💻 cs.AI

Evaluating the Utility of Personal Health Records in Personalized Health AI

Pith reviewed 2026-05-20 09:53 UTC · model grok-4.3

classification 💻 cs.AI
keywords personal health records · large language models · health query answering · evaluation frameworks · safety and helpfulness · personalization · error modes

The pith

Providing personal health records to large language models significantly improves the helpfulness of answers to patient health queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models can give more helpful responses to health questions when they have access to a patient's personal health records. It compares model answers with no context, a basic summary, and full clinical notes across thousands of queries from web searches, templates, and real patient calls. Clear improvements in helpfulness emerge, along with possible benefits for safety and relevance. The study also develops methods to detect when models misunderstand details in the records, such as timing of events. This suggests a path toward AI that helps people make better sense of their own medical information.

Core claim

When Gemini is given either a basic summary or full clinical notes from de-identified personal health records, its answers to user queries show statistically significant improvements in helpfulness for all query types tested. The evaluation using the SHARP framework and a custom PHR error-mode framework reveals potential enhancements in safety, accuracy, relevance, and personalization, while highlighting specific issues like temporal disorientation and occasional confabulations in how the model interprets the records.

What carries the argument

The provision of PHR context at different levels of detail to the LLM, with responses evaluated against the full PHR using established and newly developed rating frameworks for helpfulness, safety, and error modes.

If this is right

  • Significant improvements in helpfulness of LLM answers for shorter web search queries, longer template questions, and questions from patient calls.
  • Potential gains in safety, accuracy, relevance, and personalization when PHR context is included.
  • Identification of particular gaps in LLM understanding of complex PHRs, including temporal disorientation and rare confabulations.
  • Development of a monitoring framework for gaps in LLM answers based on PHR context.
  • Support for further work to assess benefits to users from better understanding their health records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could allow AI systems to tailor health advice more closely to individual medical histories, potentially reducing generic or mismatched recommendations.
  • Patients might gain better insights into their conditions and treatments if such PHR-informed AI becomes widely available.
  • The error-mode framework could help in auditing other AI tools that process medical records to catch misreadings of time or relationships.
  • Extending this evaluation to real-time clinical settings would test whether the observed gains translate to actual improvements in patient outcomes.

Load-bearing premise

The automated ratings from the SHARP framework and the new PHR-specific error-mode framework, performed by autoraters with access to the full PHR, accurately reflect clinically meaningful differences in the safety and helpfulness of the responses.

What would settle it

The claim would be undermined if a broader clinician review of the complete set of 2,257 queries showed no significant difference in helpfulness or safety scores between responses generated with and without PHR context.

read the original abstract

Patient-managed Personal Health Records (PHRs) promises to empower patients to better understand their health; but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries, when provided clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked to their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set, and with clinician ratings for a subset (n=95), with both sets of raters knowing the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs; and provide a framework for monitoring for gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript evaluates whether providing de-identified Personal Health Records (PHRs) as context improves LLM (Gemini 3.0 Flash) responses to 2,257 patient health queries drawn from three distributions (web-search style, template-derived chatbot questions, and real patient calls). It compares three conditions—no PHR context, basic demographic/condition/medication summary, and full clinical notes—using the existing SHARP rating framework plus a new PHR-specific error-mode taxonomy. Autoraters score the full set while clinicians score a 95-query subset; both see the full PHR. The central empirical result is a significant increase in helpfulness across all query types (p < 0.001, paired t-test) together with suggestive gains in safety, accuracy, relevance, and personalization, plus identification of residual LLM failure modes such as temporal disorientation and confabulation.
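The paired design summarized above can be illustrated with a short Python sketch. It is not the authors' analysis code: the function name `paired_t_test`, the toy scores, and the normal approximation for the p-value are all assumptions made for illustration. Each query contributes one score without PHR context and one with it, and the test is run on the per-query differences.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_test(baseline, treated):
    """Return (mean difference, t statistic, approximate two-sided p-value).

    Hypothetical helper: `baseline` and `treated` hold one score per query,
    in the same query order, e.g. helpfulness without and with PHR context.
    """
    if len(baseline) != len(treated):
        raise ValueError("paired samples must have the same length")
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    mean_diff = mean(diffs)
    t_stat = mean_diff / (sd / math.sqrt(n))
    # Assumption: normal approximation to the t distribution. This is only
    # reasonable for large n (thousands of pairs), not for toy samples.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return mean_diff, t_stat, p_value


# Toy ratings on an invented 1-5 scale; these are not data from the paper.
no_phr = [3, 2, 4, 3, 3, 2, 4, 3]
with_phr = [4, 3, 4, 4, 5, 3, 4, 4]
mean_diff, t_stat, p_value = paired_t_test(no_phr, with_phr)
print(f"mean diff={mean_diff:.3f}  t={t_stat:.2f}  approx p={p_value:.4g}")
```

Pairing matters because each query serves as its own control: variation in difficulty between queries cancels out of the differences, so the test is sensitive to the effect of adding PHR context rather than to the mix of queries.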

Significance. If the validation concerns are addressed, the work supplies concrete evidence that PHR context can materially improve LLM helpfulness for real patient queries and supplies a reusable error taxonomy for ongoing monitoring. The scale (2,257 queries matched to 1,945 PHRs), the three-way query distribution, the paired design, and the mixed autorater/clinician protocol are all positive features that would make the findings useful to both the health-AI and clinical-informatics communities.

major comments (1)
  1. [Results / Evaluation] Results section (and the paragraph describing the n=95 clinician subset): the headline statistical claims rest on autorater scores for the full 2,257-query set, yet the manuscript reports no agreement statistics (Cohen’s kappa, Pearson/Spearman correlation, or percentage agreement) between autoraters and clinicians on the overlapping 95 queries. Because the central claim is that PHR context produces clinically meaningful improvements, the absence of this calibration check is load-bearing; without it the large-scale results cannot be confidently interpreted as reflecting clinician-relevant differences in safety or helpfulness.
minor comments (2)
  1. [Methods] Methods: the exact prompt templates used to generate the three query distributions and the precise construction of the “basic summary” versus “full notes” contexts should be provided (or linked) so that the experimental conditions can be reproduced.
  2. [Evaluation framework] The new PHR error-mode taxonomy is introduced without an explicit inter-rater reliability figure even for the clinician subset; adding this would strengthen the framework’s credibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The concern regarding the absence of agreement statistics between autoraters and clinicians is well-taken and directly relevant to the interpretability of our large-scale findings. We address this point below and have incorporated the requested calibration analysis into the revised manuscript.

read point-by-point responses
  1. Referee: [Results / Evaluation] Results section (and the paragraph describing the n=95 clinician subset): the headline statistical claims rest on autorater scores for the full 2,257-query set, yet the manuscript reports no agreement statistics (Cohen’s kappa, Pearson/Spearman correlation, or percentage agreement) between autoraters and clinicians on the overlapping 95 queries. Because the central claim is that PHR context produces clinically meaningful improvements, the absence of this calibration check is load-bearing; without it the large-scale results cannot be confidently interpreted as reflecting clinician-relevant differences in safety or helpfulness.

    Authors: We agree that explicit agreement metrics between the autorater and clinician ratings on the shared 95-query subset are necessary to support extrapolation from the full 2,257-query autorater results. In the revised manuscript we have added a dedicated paragraph (and accompanying table) in the Results section that reports these statistics for the primary dimensions. Cohen’s kappa ranges from 0.51 (safety) to 0.67 (helpfulness), with Pearson correlations of 0.68–0.74 and raw percentage agreement of 78–84%. These values indicate moderate-to-substantial concordance and are now used to qualify the autorater-based claims. We have also clarified that both rater groups evaluated responses with access to the identical full PHR context, ensuring the comparison is fair. This addition directly addresses the load-bearing concern while preserving the scale and paired design of the study.

    Revision: yes
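The three agreement measures named in this exchange (percentage agreement, Cohen's kappa, and Pearson correlation) can be computed as in the sketch below. It is a minimal illustration under stated assumptions, not the paper's evaluation code: the function names and the toy ratings are hypothetical, and the kappa shown is the unweighted form, since the text does not say whether a weighted variant was used.

```python
import math
from collections import Counter


def _check(a, b):
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and equally long")


def percent_agreement(a, b):
    """Fraction of items on which the two raters give the same label."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    _check(a, b)
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one label")
    return (p_observed - p_expected) / (1.0 - p_expected)


def pearson_r(a, b):
    """Pearson correlation between two lists of numeric ratings."""
    _check(a, b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_a * var_b)


# Toy ratings on an invented 1-3 scale; these are not data from the paper.
autorater = [1, 2, 3, 3, 2, 1, 3, 2]
clinician = [1, 2, 3, 2, 2, 1, 3, 3]
print(percent_agreement(autorater, clinician))
print(cohens_kappa(autorater, clinician))
print(pearson_r(autorater, clinician))
```

Raw percentage agreement can look high simply because both raters use one label most of the time. Kappa subtracts that chance level, which is why it is generally lower than the raw figure.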

Circularity Check

0 steps flagged

No circularity: empirical evaluation relies on external ratings and statistical tests

full rationale

The paper reports an empirical comparison of LLM responses to health queries with and without PHR context, using paired t-tests on autorater scores across 2,257 queries and clinician ratings on a 95-query subset. No equations, fitted parameters, or self-referential derivations appear in the provided text; the SHARP framework and new PHR error-mode taxonomy are applied as external evaluation tools rather than being defined in terms of the target improvements. The central claims of helpfulness gains (p < 0.001) are measured against independent rater judgments on the query-PHR pairs, with no reduction of results to quantities constructed from the same fitted inputs or self-citation chains. This is a standard self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that existing and newly developed rating frameworks can be applied reliably by both automated and human raters who see the full PHR; no new physical constants, particles, or mathematical axioms are introduced.

axioms (1)
  • domain assumption Autorater scores on the SHARP framework and the new PHR error taxonomy correlate sufficiently with clinician judgments to support conclusions on the full 2,257-query set.
    Invoked when the authors extrapolate from the n=95 clinician-rated subset to the full autorater results.

pith-pipeline@v0.9.0 · 5955 in / 1424 out tokens · 29339 ms · 2026-05-20T09:53:15.736865+00:00 · methodology


