Evaluating the Utility of Personal Health Records in Personalized Health AI

Avinatan Hassidim; Ayush Jain; Bob Lou; Dale Webster; Daniel McDuff; Fan Zhang; Hamsa Subramaniam; I-Ching Lee; Ines Mezerreg; Jackie Barr

arxiv: 2605.18937 · v1 · pith:RLDXOW3Hnew · submitted 2026-05-18 · 💻 cs.AI

Rory Sayres, Kejia Chen, Ayush Jain, Matthew Thompson, Jonathan Richina, Xiang Yin, Jimmy Hu, Fan Zhang, Bob Lou, Mike Sanchez, Ines Mezerreg, Meredith Schreier, Hamsa Subramaniam, I-Ching Lee, Yugang Jia, Daniel McDuff, Yossi Matias, Avinatan Hassidim, Dale Webster, Yun Liu, Jackie Barr, Quang Duong

Pith reviewed 2026-05-20 09:53 UTC · model grok-4.3

classification 💻 cs.AI

keywords: personal health records · large language models · health query answering · evaluation frameworks · safety and helpfulness · personalization · error modes

The pith

Providing personal health records to large language models significantly improves the helpfulness of answers to patient health queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models can give more helpful responses to health questions when they have access to a patient's personal health records. It compares model answers given no context, a basic summary, or full clinical notes across 2,257 queries drawn from web searches, chatbot-conversation templates, and real patient calls. Helpfulness improves clearly, with possible benefits for safety and relevance as well. The study also develops methods to detect when models misread details in the records, such as the timing of events. This suggests a path toward AI that helps people make better sense of their own medical information.

Core claim

When Gemini 3.0 Flash is given either a basic summary or full clinical notes from de-identified personal health records, its answers to user queries show statistically significant improvements in helpfulness for all query types tested. Evaluation with the SHARP framework and a custom PHR error-mode framework reveals potential improvements in safety, accuracy, relevance, and personalization, while highlighting specific issues in how the model interprets the records, such as temporal disorientation and occasional confabulations.

What carries the argument

Providing PHR context to the LLM at two levels of detail (a basic summary or full clinical notes), compared against no context, with responses evaluated against the full PHR using an established framework (SHARP) for helpfulness and safety and a newly developed framework for PHR-specific error modes.
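The three context conditions can be pictured as three ways of assembling a prompt from the same record. The sketch below is hypothetical: the `PHR` fields, the `build_context` and `build_prompt` helpers, and the prompt wording are illustrative assumptions, not the authors' pipeline, whose exact construction of the summary and full-notes contexts is not given here. Whether the full-notes condition also includes the summary is likewise not stated, so the sketch keeps the two separate.

```python
from dataclasses import dataclass, field


@dataclass
class PHR:
    # Hypothetical, simplified record; the paper's actual PHR schema is not specified here.
    age: int
    sex: str
    conditions: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)  # free-text clinical notes (assumed chronological)


def build_context(phr: PHR, condition: str) -> str:
    """Return the PHR context string for one of the three experimental conditions."""
    if condition == "none":
        return ""
    if condition == "summary":
        # Basic summary: demographics, conditions, and medications only.
        return (
            f"Age: {phr.age}\nSex: {phr.sex}\n"
            f"Conditions: {', '.join(phr.conditions) or 'none recorded'}\n"
            f"Medications: {', '.join(phr.medications) or 'none recorded'}"
        )
    if condition == "full_notes":
        # Full condition: the clinical notes themselves, assumed here to be concatenated verbatim.
        return "Clinical notes:\n" + "\n---\n".join(phr.notes)
    raise ValueError(f"unknown condition: {condition!r}")


def build_prompt(query: str, phr: PHR, condition: str) -> str:
    """Prepend the (possibly empty) PHR context to the user's query."""
    context = build_context(phr, condition)
    if not context:
        return f"User question: {query}"
    return f"Patient health record:\n{context}\n\nUser question: {query}"


# Hypothetical record used only to show the three variants side by side.
record = PHR(
    age=58,
    sex="F",
    conditions=["type 2 diabetes", "hypertension"],
    medications=["metformin", "lisinopril"],
    notes=["2023-04-02: A1c 7.9%, metformin increased.", "2024-01-15: BP controlled."],
)
for cond in ("none", "summary", "full_notes"):
    print(f"== {cond} ==")
    print(build_prompt("Should I worry about my A1c?", record, cond))
```

Keeping the query fixed and varying only the context string is what makes a paired comparison across conditions possible.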

If this is right

Significant improvements in helpfulness of LLM answers for shorter web search queries, longer template questions, and questions from patient calls.
Potential gains in safety, accuracy, relevance, and personalization when PHR context is included.
Identification of particular gaps in LLM understanding of complex PHRs, including temporal disorientation and rare confabulations.
Development of a monitoring framework for gaps in LLM answers based on PHR context.
Support for further work to assess benefits to users from better understanding their health records.
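Rare error modes such as confabulation are hard to quantify from small counts, which matters most for the n=95 clinician subset. One standard way to report such a rate is the Wilson score interval, which stays within [0, 1] and behaves better than the normal (Wald) approximation when the observed proportion is near 0. The sketch below is illustrative only; the function name is ours and the counts in the usage line are made up, not results from the paper.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    # Clamp to guard against tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: 2 of 95 responses flagged for confabulation.
low, high = wilson_interval(2, 95)
print(f"observed rate {2 / 95:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Even with zero flagged responses the upper bound stays well above zero, which is a useful reminder that "rare" in a small sample is not the same as "absent".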

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could allow AI systems to tailor health advice more closely to individual medical histories, potentially reducing generic or mismatched recommendations.
Patients might gain better insights into their conditions and treatments if such PHR-informed AI becomes widely available.
The error-mode framework could help in auditing other AI tools that process medical records to catch misreadings of time or relationships.
Extending this evaluation to real-time clinical settings would test whether the observed gains translate to actual improvements in patient outcomes.

Load-bearing premise

The automated ratings from the SHARP framework and the new PHR-specific error-mode framework, performed by autoraters with access to the full PHR, accurately reflect clinically meaningful differences in the safety and helpfulness of the responses.

What would settle it

If a broader review by clinicians on the complete set of 2,257 queries shows no significant difference in helpfulness or safety scores between responses generated with and without PHR context.

Original abstract

Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health; but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries when provided with clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; and (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP) and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set and clinician ratings for a subset (n=95), with both sets of raters having access to the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance, and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs, and provide a framework for monitoring gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.
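The headline test in the abstract is a paired t-test: each query is answered under two conditions, and the test asks whether the mean per-query difference in helpfulness is nonzero. A minimal standard-library sketch of the statistic follows; the `paired_t` helper is ours, the scores are made up, and the 1 to 5 scale is an assumption rather than SHARP's actual scale. In practice `scipy.stats.ttest_rel` returns the same t statistic together with a p-value.

```python
import math
import statistics


def paired_t(baseline: list[float], treatment: list[float]) -> tuple[float, int]:
    """Paired t statistic for (treatment - baseline), with n - 1 degrees of freedom."""
    if len(baseline) != len(treatment) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [t - b for b, t in zip(baseline, treatment)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the per-query differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical helpfulness ratings for the same six queries under two conditions.
no_context = [2, 3, 2, 3, 2, 4]
with_phr = [4, 4, 3, 5, 3, 5]
t_stat, df = paired_t(no_context, with_phr)
print(f"t = {t_stat:.2f}, df = {df}")
```

Pairing removes between-query variation (some questions are simply easier to answer well), which is why a paired test is the natural choice when every query is answered under every condition.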

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PHR context boosts LLM helpfulness on health queries, but the autorater results are not shown to agree with clinician ratings.

Hey, the main thing to know is that adding PHR context to Gemini produces clear gains in helpfulness for patient questions across three query types, and the paper adds a new error taxonomy that flags issues such as timeline mix-ups and confabulations. The study runs a straightforward comparison on 2,257 queries matched to de-identified records drawn from a pool of 1,945, testing no context, a basic summary, and full notes. The paired t-test reaches p < 0.001 on helpfulness, and clinician ratings on the n=95 subset add some grounding. The taxonomy itself is a practical addition for spotting PHR-specific problems that generic frameworks might miss. The work is empirical and avoids circular fitting or invented quantities.

The soft spot is the validation gap: autoraters score the full set using SHARP plus the new taxonomy, yet the paper gives no kappa, correlation, or agreement numbers between those autoraters and the clinicians on the overlapping cases. If the two rater groups diverge on safety or error detection when both see the full PHR, the large-scale claims on accuracy and safety rest on weaker footing than the helpfulness result. This is a moderate rather than load-bearing issue.

The paper is for researchers building consumer health AI or studying LLM evaluation in medicine. A reader who wants concrete data on personalizing responses with real records will find usable pieces here. It shows clear thinking and honest engagement with the literature, so it deserves a serious referee to pressure-test the taxonomy and close the rater-agreement loop. I would send it out for review rather than desk-reject it.

Referee Report

1 major / 2 minor

Summary. The manuscript evaluates whether providing de-identified Personal Health Records (PHRs) as context improves LLM (Gemini 3.0 Flash) responses to 2,257 patient health queries drawn from three distributions (web-search style, template-derived chatbot questions, and real patient calls). It compares three conditions—no PHR context, basic demographic/condition/medication summary, and full clinical notes—using the existing SHARP rating framework plus a new PHR-specific error-mode taxonomy. Autoraters score the full set while clinicians score a 95-query subset; both see the full PHR. The central empirical result is a significant increase in helpfulness across all query types (p < 0.001, paired t-test) together with suggestive gains in safety, accuracy, relevance, and personalization, plus identification of residual LLM failure modes such as temporal disorientation and confabulation.

Significance. If the validation concerns are addressed, the work supplies concrete evidence that PHR context can materially improve LLM helpfulness for real patient queries, along with a reusable error taxonomy for ongoing monitoring. The scale (2,257 queries matched to PHRs drawn from a pool of 1,945), the three-way query distribution, the paired design, and the mixed autorater/clinician protocol are all positive features that would make the findings useful to both the health-AI and clinical-informatics communities.

major comments (1)

[Results / Evaluation] Results section (and the paragraph describing the n=95 clinician subset): the headline statistical claims rest on autorater scores for the full 2,257-query set, yet the manuscript reports no agreement statistics (Cohen’s kappa, Pearson/Spearman correlation, or percentage agreement) between autoraters and clinicians on the overlapping 95 queries. Because the central claim is that PHR context produces clinically meaningful improvements, this calibration check is load-bearing; without it, the large-scale results cannot be confidently interpreted as reflecting clinician-relevant differences in safety or helpfulness.

minor comments (2)

[Methods] Methods: the procedures used to produce the three query distributions (including the chatbot-conversation templates) and the precise construction of the “basic summary” versus “full notes” contexts should be provided (or linked) so that the experimental conditions can be reproduced.
[Evaluation framework] The new PHR error-mode taxonomy is introduced without an explicit inter-rater reliability figure even for the clinician subset; adding this would strengthen the framework’s credibility.
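When each response is rated by more than two clinicians, Fleiss' kappa is one common choice for the inter-rater reliability figure requested above. The sketch below assumes a fixed number of raters per item and a binary error-mode label (for example, temporal disorientation present or absent); the counts are hypothetical, and the paper's actual number of clinicians per response is not stated here.

```python
def fleiss_kappa(table: list[list[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number of raters m.

    table[i][j] = number of raters who assigned item i to category j.
    """
    if not table:
        raise ValueError("table is empty")
    n_items = len(table)
    m = sum(table[0])
    if m < 2 or any(sum(row) != m for row in table):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_cats = len(table[0])
    # Overall proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in table) / (n_items * m) for j in range(n_cats)]
    # Per-item agreement: fraction of ordered rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in table]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 5 responses, 3 clinicians each; columns = [flagged, not flagged].
ratings = [[3, 0], [0, 3], [2, 1], [0, 3], [1, 2]]
print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
```

For rare labels such as confabulation, chance agreement on "not flagged" is high, so kappa can be low even when raw agreement looks excellent; reporting both is the safer choice.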

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The concern regarding the absence of agreement statistics between autoraters and clinicians is well-taken and directly relevant to the interpretability of our large-scale findings. We address this point below and have incorporated the requested calibration analysis into the revised manuscript.

Referee: [Results / Evaluation] Results section (and the paragraph describing the n=95 clinician subset): the headline statistical claims rest on autorater scores for the full 2,257-query set, yet the manuscript reports no agreement statistics (Cohen’s kappa, Pearson/Spearman correlation, or percentage agreement) between autoraters and clinicians on the overlapping 95 queries. Because the central claim is that PHR context produces clinically meaningful improvements, this calibration check is load-bearing; without it, the large-scale results cannot be confidently interpreted as reflecting clinician-relevant differences in safety or helpfulness.

Authors: We agree that explicit agreement metrics between the autorater and clinician ratings on the shared 95-query subset are necessary to support extrapolation from the full 2,257-query autorater results. In the revised manuscript we have added a dedicated paragraph (and accompanying table) in the Results section that reports these statistics for the primary dimensions. Cohen’s kappa ranges from 0.51 (safety) to 0.67 (helpfulness), with Pearson correlations of 0.68–0.74 and raw percentage agreement of 78–84%. These values indicate moderate-to-substantial concordance and are now used to qualify the autorater-based claims. We have also clarified that both rater groups evaluated responses with access to the identical full PHR context, ensuring a fair comparison. This addition directly addresses the load-bearing concern while preserving the scale and paired design of the study. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical evaluation relies on external ratings and statistical tests

full rationale

The paper reports an empirical comparison of LLM responses to health queries with and without PHR context, using paired t-tests on autorater scores across 2,257 queries and clinician ratings on a 95-query subset. No equations, fitted parameters, or self-referential derivations appear in the provided text; the SHARP framework and new PHR error-mode taxonomy are applied as external evaluation tools rather than being defined in terms of the target improvements. The central claims of helpfulness gains (p < 0.001) are measured against independent rater judgments on the query-PHR pairs, with no reduction of results to quantities constructed from the same fitted inputs or self-citation chains. This is a standard self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that existing and newly developed rating frameworks can be applied reliably by both automated and human raters who see the full PHR; no new physical constants, particles, or mathematical axioms are introduced.

axioms (1)

Domain assumption: Autorater scores on the SHARP framework and the new PHR error taxonomy correlate sufficiently with clinician judgments to support conclusions on the full 2,257-query set.
Invoked when the authors extrapolate from the n=95 clinician-rated subset to the full autorater results.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs... significant improvements in the helpfulness of answers... (p < 0.001, paired t-test)"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "Our PHR evaluation framework further identifies gaps... temporal disorientation, and rare but meaningful confabulations"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

[1]

Towards Better Health Conversations: The Benefits of Context-seeking

Sayres, Rory and Hao, Yuexing and Ward, Abbi and Wang, Amy and Freeman, Beverly and Zhan, Serena and Ardila, Diego and Li, Jimmy and Lee, I-Ching and Iurchenko, Anna and Others. Towards Better Health Conversations: The Benefits of Context-seeking. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems

work page 2026
[2]

Introducing ChatGPT Health: A Secure Space for Your Health Journey

OpenAI. Introducing ChatGPT Health: A Secure Space for Your Health Journey

work page
[3]

Where Do Americans Get Health Information, and What Do They Trust?

Pasquini, Giancarlo and Stocking, Galen and Kikuchi, Emma and Pula, Isabelle and Yam, Eileen. Where Do Americans Get Health Information, and What Do They Trust?

work page
[4]

Barriers to the use of personal health records by patients: a structured review

Showell, Chris. Barriers to the use of personal health records by patients: a structured review. PeerJ

work page
[5]

The Charlson comorbidity index is adapted to predict costs of chronic disease in primary care patients

Charlson, Mary E and Charlson, Robert E and Peterson, Janey C and Marinopoulos, Spyridon S and Briggs, William M and Hollenberg, James P. The Charlson comorbidity index is adapted to predict costs of chronic disease in primary care patients. Journal of clinical epidemiology

work page
[6]

Comorbidity as a correlate of length of stay for hospitalized patients with acute chest pain

Matsui, Kunihiko and Goldman, Lee and Johnson, Paula A and Kuntz, Karen M and Cook, E Francis and Lee, Thomas H. Comorbidity as a correlate of length of stay for hospitalized patients with acute chest pain. Journal of general internal medicine

work page
[7]

A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness

Winslow, Brent and Shreibati, Jacqueline and Perez, Javier and Su, Hao-Wei and Young-Lin, Nichole and Hammerquist, Nova and McDuff, Daniel and Guss, Jason and Vafeiadou, Jenny and Cain, Nick and Others. A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness. arXiv preprint arXiv:2512. 08936

work page
[8]

Determinants of Use of the Care Information Exchange Portal: Cross-sectional Study

Neves, Ana Luisa and Smalley, Katelyn R and Freise, Lisa and Harrison, Paul and Darzi, Ara and Mayer, Erik K. Determinants of Use of the Care Information Exchange Portal: Cross-sectional Study. J Med Internet Res

work page
[9]

The Digital Divide and Patient Portals: Internet Access Explained Differences in Patient Portal Use for Secure Messaging by Age, Race, and Income

Graetz, Ilana and Gordon, Nancy and Fung, Vick and Hamity, Courtnee and Reed, Mary E. The Digital Divide and Patient Portals: Internet Access Explained Differences in Patient Portal Use for Secure Messaging by Age, Race, and Income. Med Care

work page
[10]

Claude for Healthcare and Life Sciences: Clinical-Grade Privacy and Patient-Led Data Ownership

Anthropic. Claude for Healthcare and Life Sciences: Clinical-Grade Privacy and Patient-Led Data Ownership

work page
[11]

A toolbox for surfacing health equity harms and biases in large language models

Pfohl, Stephen R and Cole-Lewis, Heather and Sayres, Rory and Neal, Darlene and Asiedu, Mercy and Dieng, Awa and Tomasev, Nenad and Rashid, Qazi Mamunur and Azizi, Shekoofeh and Rostamzadeh, Negar and McCoy, Liam G and Celi, Leo Anthony and Liu, Yun and Schaekermann, Mike and Walton, Alanna and Parrish, Alicia and Nagpal, Chirag and Singh, Preeti and Dewi...

work page
[12]

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

Bean, Andrew M and Payne, Rebecca Elizabeth and Parsons, Guy and Kirk, Hannah Rose and Ciro, Juan and Mosquera-G \'o mez, Rafael and Hincapi \'e M, Sara and Ekanayaka, Aruna S and Tarassenko, Lionel and Rocher, Luc and Others. Reliability of LLMs as medical assistants for the general public: a randomized preregistered study. Nature Medicine

work page
[13]

The Charlson Comorbidity Index: problems with use in epidemiological research

Drosdowsky, Allison and Gough, Karla. The Charlson Comorbidity Index: problems with use in epidemiological research. Journal of clinical epidemiology

work page
[14]

Benefits and barriers for adoption of personal health records

Vance, Brittany and Tomblin, Brent and Studney, Jena and Coustasse, Alberto. Benefits and barriers for adoption of personal health records

work page
[15]

The promise of digital health: then, now, and the future

Abernethy, Amy and Adams, Laura and Barrett, Meredith and Bechtel, Christine and Brennan, Patricia and Butte, Atul and Faulkner, Judith and Fontaine, Elaine and Friedhoff, Stephen and Halamka, John and Others. The promise of digital health: then, now, and the future. NAM perspectives

work page
[16]

Context clues: Evaluating long context models for clinical prediction tasks on ehr data

Wornow, Michael and Bedi, Suhana and Fuentes Hernandez, Miguel Angel and Steinberg, Ethan and Fries, Jason and Re, Christopher and Koyejo, Sanmi and Shah, Nigam. Context clues: Evaluating long context models for clinical prediction tasks on ehr data. International Conference on Learning Representations

work page
[17]

Using thematic analysis in psychology

Braun, Virginia and Clarke, Victoria. Using thematic analysis in psychology. Qual. Res. Psychol

work page
[18]

Statsmodels: Econometric and statistical modeling with python

Seabold, Skipper and Perktold, Josef. Statsmodels: Econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference

work page
[19]

The Impact of Digital Patient Portals on Health Outcomes, System Efficiency, and Patient Attitudes: Updated Systematic Literature Review

Carini, Elettra and Villani, Leonardo and Pezzullo, Angelo Maria and Gentili, Andrea and Barbara, Andrea and Ricciardi, Walter and Boccia, Stefania. The Impact of Digital Patient Portals on Health Outcomes, System Efficiency, and Patient Attitudes: Updated Systematic Literature Review. J Med Internet Res

work page
[20]

Public use of a generalist LLM chatbot for health queries

Costa-Gomes, Beatriz and Tolmachev, Pavel and Taysom, Eloise and Sounderajah, Viknesh and Richardson, Hannah and Schoenegger, Philipp and Liu, Xiaoxuan and Nour, Matthew M and Spielman, Seth and Way, Samuel F and Shah, Yash and Bhaskar, Michael and Nori, Harsha and Kelly, Christopher and Hames, Peter and Gross, Bay and Suleyman, Mustafa and King, Dominic....

work page
[21]

KFF Tracking Poll on Health Information and Trust: Use of AI For Health Information and Advice

Montero, Alex and Montalvo, III, Julian and Kearney, Audrey and Valdes, Isabelle and Kirzinger, Ashley and Hamel, Liz. KFF Tracking Poll on Health Information and Trust: Use of AI For Health Information and Advice

work page
[22]

Get a fuller picture with Fitbit's personal health coach

Thng, Florence. Get a fuller picture with Fitbit's personal health coach. Google Keyword Blog

work page
[23]

a rli, Nathanael and Chowdhery, Aakanksha and Mansfield, Philip and Demner-Fushman, Dina and Ag \

Singhal, Karan and Azizi, Shekoofeh and Tu, Tao and Mahdavi, S Sara and Wei, Jason and Chung, Hyung Won and Scales, Nathan and Tanwani, Ajay and Cole-Lewis, Heather and Pfohl, Stephen and Payne, Perry and Seneviratne, Martin and Gamble, Paul and Kelly, Chris and Babiker, Abubakr and Sch \"a rli, Nathanael and Chowdhery, Aakanksha and Mansfield, Philip and...

work page
[24]

Perceptions of Quality of Care Among Users of a Web-Based Patient Portal: Cross-sectional Survey Analysis

Lear, Rachael and Freise, Lisa and Kybert, Matthew and Darzi, Ara and Neves, Ana Luisa and Mayer, Erik K. Perceptions of Quality of Care Among Users of a Web-Based Patient Portal: Cross-sectional Survey Analysis. J Med Internet Res

work page
[25]

Comparison of three comorbidity measures for predicting health service use in patients with osteoarthritis

Dominick, Kelli L and Dudley, Tara K and Coffman, Cynthia J and Bosworth, Hayden B. Comparison of three comorbidity measures for predicting health service use in patients with osteoarthritis. Arthritis Care & Research

work page
[26]

Personal health record use in the United States: forecasting future adoption levels

Ford, Eric W and Hesse, Bradford W and Huerta, Timothy R. Personal health record use in the United States: forecasting future adoption levels. Journal of medical Internet research

work page
[27]

The impact of electronic health records on diagnosis

Graber, Mark L and Byrne, Colene and Johnston, Doug. The impact of electronic health records on diagnosis. Diagnosis

work page
[28]

The use of a technology acceptance model ( TAM ) to predict patients' usage of a personal health record system: the role of security, privacy, and usability

Alsyouf, Adi and Lutfi, Abdalwali and Alsubahi, Nizar and Alhazmi, Fahad Nasser and Al-Mugheed, Khalid and Anshasi, Rami J and Alharbi, Nora Ibrahim and Albugami, Moteb. The use of a technology acceptance model ( TAM ) to predict patients' usage of a personal health record system: the role of security, privacy, and usability. International journal of envi...

work page
[29]

Online health information--seeking in the era of large language models: cross-sectional web-based survey study

Yun, Hye Sun and Bickmore, Timothy. Online health information--seeking in the era of large language models: cross-sectional web-based survey study. Journal of medical Internet research

work page
[30]

Framing health information: the impact of search methods and source types on user trust and satisfaction in the age of llms

Yun, Hye Sun and Bickmore, Timothy. Framing health information: the impact of search methods and source types on user trust and satisfaction in the age of llms. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

work page
[31]

Risk adjustment performance of Charlson and Elixhauser comorbidities in ICD-9 and ICD-10 administrative databases

Li, Bing and Evans, Dewey and Faris, Peter and Dean, Stafford and Quan, Hude. Risk adjustment performance of Charlson and Elixhauser comorbidities in ICD-9 and ICD-10 administrative databases. BMC health services research

work page
[32]

Question answering for electronic health records: scoping review of datasets and models

Bardhan, Jayetri and Roberts, Kirk and Wang, Daisy Zhe. Question answering for electronic health records: scoping review of datasets and models. Journal of medical Internet research

work page
[33]

Claude for Healthcare & Life Sciences: 2026 Technical Guide

IntuitionLabs. Claude for Healthcare & Life Sciences: 2026 Technical Guide

work page 2026
[34]

Users of social media and AI chatbots for health information are more likely to say they are convenient than accurate

Pasquini, Giancarlo and Stocking, Galen and Kikuchi, Emma and Pula, Isabelle and Yam, Eileen. Users of social media and AI chatbots for health information are more likely to say they are convenient than accurate

work page
[35]

Frequency and types of patient-reported errors in electronic health record ambulatory care notes

Bell, Sigall K and Delbanco, Tom and Elmore, Joann G and Fitzgerald, Patricia S and Fossa, Alan and Harcourt, Kendall and Leveille, Suzanne G and Payne, Thomas H and Stametz, Rebecca A and Walker, Jan and Others. Frequency and types of patient-reported errors in electronic health record ambulatory care notes. JAMA network open

work page
[36]

``What's up, doc?'': Analyzing how users seek health information in large-scale conversational ai datasets

Paruchuri, Akshay and Aziz, Maryam and Vartak, Rohit and Ali, Ayman and Uchehara, Best and Liu, Xin and Chatterjee, Ishan and Agrawal, Monica. ``What's up, doc?'': Analyzing how users seek health information in large-scale conversational ai datasets. arXiv preprint arXiv:2506. 21532

work page
[37]

Use of ChatGPT to obtain health information in Australia, 2024: insights from a nationally representative survey

Ayre, Julie and Cvejic, Erin and McCaffery, Kirsten J. Use of ChatGPT to obtain health information in Australia, 2024: insights from a nationally representative survey. Medical Journal of Australia

work page 2024
[38]

Associations of the Charlson comorbidity index with depression and mortality among the US adults

Wang, Ying-Zhao and Xue, Chun and Ma, Chao and Liu, An-Bang. Associations of the Charlson comorbidity index with depression and mortality among the US adults. Frontiers in Public Health

work page
[39]

Companies Expand AI Health Offerings, Even as Accuracy Questions Remain --- The Monitor

Luther, Joel and Yilma, Hagere and Washington, Irving. Companies Expand AI Health Offerings, Even as Accuracy Questions Remain --- The Monitor

work page
[40]

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Benjamini, Yoav and Hochberg, Yosef. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological)

work page
[41]

2026 , eprint=

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV , author=. 2026 , eprint=

work page 2026
[42]

SymptomAI: Toward a Conversational AI Agent for Everyday Symptom Assessment

SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment , author=. arXiv preprint arXiv:2605.04012 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[43]

JAMA internal medicine , volume=

Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum , author=. JAMA internal medicine , volume=

work page
[44]

JAMA network open , volume=

Evaluating artificial intelligence responses to public health questions , author=. JAMA network open , volume=

work page
[45]

BMC medical research methodology , volume=

Scalable information extraction from free text electronic health records using large language models , author=. BMC medical research methodology , volume=. 2025 , publisher=

work page 2025
[46]

Journal of the American Medical Informatics Association , volume=

Lessons learned on information retrieval in electronic health records: a comparison of embedding models and pooling strategies , author=. Journal of the American Medical Informatics Association , volume=. 2025 , publisher=

work page 2025
[47]

JMIR human factors , volume=

User-Centered Delivery of AI-Powered Health Care Technologies in Clinical Settings: Mixed Methods Case Study , author=. JMIR human factors , volume=. 2025 , publisher=

work page 2025
[48]

arXiv preprint arXiv:2405.03066 , year=

A scoping review of using large language models (llms) to investigate electronic health records (ehrs) , author=. arXiv preprint arXiv:2405.03066 , year=

work page arXiv

[1] [1]

Towards Better Health Conversations: The Benefits of Context-seeking

Sayres, Rory and Hao, Yuexing and Ward, Abbi and Wang, Amy and Freeman, Beverly and Zhan, Serena and Ardila, Diego and Li, Jimmy and Lee, I-Ching and Iurchenko, Anna and Others. Towards Better Health Conversations: The Benefits of Context-seeking. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems

work page 2026

[2] [2]

Introducing ChatGPT Health: A Secure Space for Your Health Journey

OpenAI. Introducing ChatGPT Health: A Secure Space for Your Health Journey

work page

[3] Pasquini, Giancarlo and Stocking, Galen and Kikuchi, Emma and Pula, Isabelle and Yam, Eileen. Where Do Americans Get Health Information, and What Do They Trust?

[4] Showell, Chris. Barriers to the use of personal health records by patients: a structured review. PeerJ.

[5] Charlson, Mary E and Charlson, Robert E and Peterson, Janey C and Marinopoulos, Spyridon S and Briggs, William M and Hollenberg, James P. The Charlson comorbidity index is adapted to predict costs of chronic disease in primary care patients. Journal of clinical epidemiology.

[6] Matsui, Kunihiko and Goldman, Lee and Johnson, Paula A and Kuntz, Karen M and Cook, E Francis and Lee, Thomas H. Comorbidity as a correlate of length of stay for hospitalized patients with acute chest pain. Journal of general internal medicine.

[7] Winslow, Brent and Shreibati, Jacqueline and Perez, Javier and Su, Hao-Wei and Young-Lin, Nichole and Hammerquist, Nova and McDuff, Daniel and Guss, Jason and Vafeiadou, Jenny and Cain, Nick and Others. A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness. arXiv preprint arXiv:2512.08936.

[8] Neves, Ana Luisa and Smalley, Katelyn R and Freise, Lisa and Harrison, Paul and Darzi, Ara and Mayer, Erik K. Determinants of Use of the Care Information Exchange Portal: Cross-sectional Study. J Med Internet Res.

[9] Graetz, Ilana and Gordon, Nancy and Fung, Vick and Hamity, Courtnee and Reed, Mary E. The Digital Divide and Patient Portals: Internet Access Explained Differences in Patient Portal Use for Secure Messaging by Age, Race, and Income. Med Care.

[10] Anthropic. Claude for Healthcare and Life Sciences: Clinical-Grade Privacy and Patient-Led Data Ownership.

[11] Pfohl, Stephen R and Cole-Lewis, Heather and Sayres, Rory and Neal, Darlene and Asiedu, Mercy and Dieng, Awa and Tomasev, Nenad and Rashid, Qazi Mamunur and Azizi, Shekoofeh and Rostamzadeh, Negar and McCoy, Liam G and Celi, Leo Anthony and Liu, Yun and Schaekermann, Mike and Walton, Alanna and Parrish, Alicia and Nagpal, Chirag and Singh, Preeti and Dewi... A toolbox for surfacing health equity harms and biases in large language models.

[12] Bean, Andrew M and Payne, Rebecca Elizabeth and Parsons, Guy and Kirk, Hannah Rose and Ciro, Juan and Mosquera-Gómez, Rafael and Hincapié M, Sara and Ekanayaka, Aruna S and Tarassenko, Lionel and Rocher, Luc and Others. Reliability of LLMs as medical assistants for the general public: a randomized preregistered study. Nature Medicine.

[13] Drosdowsky, Allison and Gough, Karla. The Charlson Comorbidity Index: problems with use in epidemiological research. Journal of clinical epidemiology.

[14] Vance, Brittany and Tomblin, Brent and Studney, Jena and Coustasse, Alberto. Benefits and barriers for adoption of personal health records.

[15] Abernethy, Amy and Adams, Laura and Barrett, Meredith and Bechtel, Christine and Brennan, Patricia and Butte, Atul and Faulkner, Judith and Fontaine, Elaine and Friedhoff, Stephen and Halamka, John and Others. The promise of digital health: then, now, and the future. NAM perspectives.

[16] Wornow, Michael and Bedi, Suhana and Fuentes Hernandez, Miguel Angel and Steinberg, Ethan and Fries, Jason and Re, Christopher and Koyejo, Sanmi and Shah, Nigam. Context clues: Evaluating long context models for clinical prediction tasks on EHR data. International Conference on Learning Representations.

[17] Braun, Virginia and Clarke, Victoria. Using thematic analysis in psychology. Qual. Res. Psychol.

[18] Seabold, Skipper and Perktold, Josef. Statsmodels: Econometric and statistical modeling with python. Proceedings of the 9th Python in Science Conference.

[19] Carini, Elettra and Villani, Leonardo and Pezzullo, Angelo Maria and Gentili, Andrea and Barbara, Andrea and Ricciardi, Walter and Boccia, Stefania. The Impact of Digital Patient Portals on Health Outcomes, System Efficiency, and Patient Attitudes: Updated Systematic Literature Review. J Med Internet Res.

[20] Costa-Gomes, Beatriz and Tolmachev, Pavel and Taysom, Eloise and Sounderajah, Viknesh and Richardson, Hannah and Schoenegger, Philipp and Liu, Xiaoxuan and Nour, Matthew M and Spielman, Seth and Way, Samuel F and Shah, Yash and Bhaskar, Michael and Nori, Harsha and Kelly, Christopher and Hames, Peter and Gross, Bay and Suleyman, Mustafa and King, Dominic... Public use of a generalist LLM chatbot for health queries.

[21] Montero, Alex and Montalvo, III, Julian and Kearney, Audrey and Valdes, Isabelle and Kirzinger, Ashley and Hamel, Liz. KFF Tracking Poll on Health Information and Trust: Use of AI For Health Information and Advice.

[22] Thng, Florence. Get a fuller picture with Fitbit's personal health coach. Google Keyword Blog.

[23] Singhal, Karan and Azizi, Shekoofeh and Tu, Tao and Mahdavi, S Sara and Wei, Jason and Chung, Hyung Won and Scales, Nathan and Tanwani, Ajay and Cole-Lewis, Heather and Pfohl, Stephen and Payne, Perry and Seneviratne, Martin and Gamble, Paul and Kelly, Chris and Babiker, Abubakr and Schärli, Nathanael and Chowdhery, Aakanksha and Mansfield, Philip and Demner-Fushman, Dina and...

[24] Lear, Rachael and Freise, Lisa and Kybert, Matthew and Darzi, Ara and Neves, Ana Luisa and Mayer, Erik K. Perceptions of Quality of Care Among Users of a Web-Based Patient Portal: Cross-sectional Survey Analysis. J Med Internet Res.

[25] Dominick, Kelli L and Dudley, Tara K and Coffman, Cynthia J and Bosworth, Hayden B. Comparison of three comorbidity measures for predicting health service use in patients with osteoarthritis. Arthritis Care & Research.

[26] Ford, Eric W and Hesse, Bradford W and Huerta, Timothy R. Personal health record use in the United States: forecasting future adoption levels. Journal of medical Internet research.

[27] Graber, Mark L and Byrne, Colene and Johnston, Doug. The impact of electronic health records on diagnosis. Diagnosis.

[28] Alsyouf, Adi and Lutfi, Abdalwali and Alsubahi, Nizar and Alhazmi, Fahad Nasser and Al-Mugheed, Khalid and Anshasi, Rami J and Alharbi, Nora Ibrahim and Albugami, Moteb. The use of a technology acceptance model (TAM) to predict patients' usage of a personal health record system: the role of security, privacy, and usability. International journal of envi...

[29] Yun, Hye Sun and Bickmore, Timothy. Online health information-seeking in the era of large language models: cross-sectional web-based survey study. Journal of medical Internet research.

[30] Yun, Hye Sun and Bickmore, Timothy. Framing health information: the impact of search methods and source types on user trust and satisfaction in the age of LLMs. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems.

[31] Li, Bing and Evans, Dewey and Faris, Peter and Dean, Stafford and Quan, Hude. Risk adjustment performance of Charlson and Elixhauser comorbidities in ICD-9 and ICD-10 administrative databases. BMC health services research.

[32] Bardhan, Jayetri and Roberts, Kirk and Wang, Daisy Zhe. Question answering for electronic health records: scoping review of datasets and models. Journal of medical Internet research.

[33] IntuitionLabs. Claude for Healthcare & Life Sciences: 2026 Technical Guide.

[34] Pasquini, Giancarlo and Stocking, Galen and Kikuchi, Emma and Pula, Isabelle and Yam, Eileen. Users of social media and AI chatbots for health information are more likely to say they are convenient than accurate.

[35] Bell, Sigall K and Delbanco, Tom and Elmore, Joann G and Fitzgerald, Patricia S and Fossa, Alan and Harcourt, Kendall and Leveille, Suzanne G and Payne, Thomas H and Stametz, Rebecca A and Walker, Jan and Others. Frequency and types of patient-reported errors in electronic health record ambulatory care notes. JAMA network open.

[36] Paruchuri, Akshay and Aziz, Maryam and Vartak, Rohit and Ali, Ayman and Uchehara, Best and Liu, Xin and Chatterjee, Ishan and Agrawal, Monica. "What's up, doc?": Analyzing how users seek health information in large-scale conversational AI datasets. arXiv preprint arXiv:2506.21532.

[37] Ayre, Julie and Cvejic, Erin and McCaffery, Kirsten J. Use of ChatGPT to obtain health information in Australia, 2024: insights from a nationally representative survey. Medical Journal of Australia.

[38] Wang, Ying-Zhao and Xue, Chun and Ma, Chao and Liu, An-Bang. Associations of the Charlson comorbidity index with depression and mortality among the US adults. Frontiers in Public Health.

[39] Luther, Joel and Yilma, Hagere and Washington, Irving. Companies Expand AI Health Offerings, Even as Accuracy Questions Remain. The Monitor.

[40] Benjamini, Yoav and Hochberg, Yosef. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological).

[41] ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV. 2026.

[42] SymptomAI: Toward a Conversational AI Agent for Everyday Symptom Assessment. arXiv preprint arXiv:2605.04012.
