An Underexplored Frontier: Large Language Models for Rare Disease Patient Education and Communication -- A scoping review

Anita Burgun; Kai Yu; Min Zeng; Rui Zhang; Xiaoyi Chen; Yu Hou; Zaifu Zhan

arxiv: 2604.14179 · v1 · submitted 2026-03-30 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-14 21:55 UTC · model grok-4.3

keywords: rare diseases · large language models · patient education · patient communication · scoping review · ChatGPT · healthcare AI · medical communication

The pith

A scoping review finds that LLM use for rare disease patient education remains early-stage and narrow in scope.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Rare diseases affect over 300 million people with complex, long-term communication needs that general clinical resources often cannot meet. This paper reviews the existing studies on large language models applied to patient education and support in this area. It finds only 12 published works, almost all very recent and built on general-purpose systems such as ChatGPT. These studies mainly test the models on pre-written question sets rather than actual patient data or ongoing conversations. Evaluations focus on factual accuracy while giving little weight to whether answers are readable, empathetic, or useful across languages and real care journeys.

Core claim

The literature is highly recent and dominated by general-purpose models, particularly ChatGPT. Most studies focus on patient question answering using curated question sets, with limited use of real-world data or longitudinal communication scenarios. Evaluations are primarily centered on accuracy, with limited attention to patient-centered dimensions such as readability, empathy, and communication quality. Multilingual communication is rarely addressed. Overall, the field remains at an early stage.

What carries the argument

The argument rests on a scoping review that located and analyzed 12 studies and extracted details on application scenarios, model types, and evaluation approaches.

If this is right

Patient-centered design must be added to future work so responses address readability, empathy, and overall communication quality.
Domain-adapted methods should be prioritized over exclusive reliance on general-purpose systems like ChatGPT.
Real-world testing with actual patient data and ongoing scenarios is required for safe deployment.
Multilingual capabilities need explicit development to serve diverse patient populations.
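
One way to make the readability dimension in the list above concrete is a standard readability score. The sketch below computes the Flesch Reading Ease score, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), where higher values indicate easier text. Choosing this metric is an assumption made for illustration, not something the reviewed paper prescribes. The function names and the vowel-group syllable heuristic are hypothetical and only approximate true syllable counts.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    # Split on sentence-ending punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical example answers, written for this sketch only.
plain_answer = "Take one pill each day. Call us if you feel sick."
jargon_answer = (
    "Pharmacological intervention necessitates continuous "
    "multidisciplinary monitoring."
)

print(round(flesch_reading_ease(plain_answer), 1))
print(round(flesch_reading_ease(jargon_answer), 1))
```

A score like this captures only surface difficulty. It says nothing about empathy or accuracy, so it would complement rather than replace expert or patient ratings.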

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Filling these gaps could let LLMs reduce isolation for patients who lack easy access to specialists during extended care journeys.
Similar evaluation shortfalls may appear when LLMs are applied to other low-prevalence or complex medical topics.
Direct trials that compare accuracy scores against patient-reported outcomes would test whether current metrics predict practical benefit.
Specialized training on rare-disease knowledge bases could produce more reliable outputs than general models alone.

Load-bearing premise

The search across major databases from January 2022 to March 2026 captured every relevant study on LLM-based rare disease patient education and communication without major omissions from database limits or keyword choices.

What would settle it

A later or broader search that locates many additional studies using specialized models, real patient records, longitudinal interactions, or evaluations of empathy and readability would show the field is further along than the review concludes.

Figures

Figures reproduced from arXiv: 2604.14179 by Anita Burgun, Kai Yu, Min Zeng, Rui Zhang, Xiaoyi Chen, Yu Hou, Zaifu Zhan.

**Figure 1.** PRISMA flowchart of study records.

**Figure 2.** Metadata of information from LLM-based rare disease patient communication and education studies included in this review. (a) Distribution of …

Original abstract

Rare diseases affect over 300 million people worldwide and are characterized by complex care pathways, limited clinical expertise, and substantial unmet communication needs throughout the long patient journey. Recent advances in large language models (LLMs) offer new opportunities to support patient education and communication, yet their application in rare diseases remains unclear. We conducted a scoping review of studies published between January 2022 and March 2026 across major databases, identifying 12 studies on LLM-based rare disease patient education and communication. Data were extracted on study characteristics, application scenarios, model usage, and evaluation methods, and synthesized using descriptive and qualitative analyses. The literature is highly recent and dominated by general-purpose models, particularly ChatGPT. Most studies focus on patient question answering using curated question sets, with limited use of real-world data or longitudinal communication scenarios. Evaluations are primarily centered on accuracy, with limited attention to patient-centered dimensions such as readability, empathy, and communication quality. Multilingual communication is rarely addressed. Overall, the field remains at an early stage. Future research should prioritize patient-centered design, domain-adapted methods, and real-world deployment to support safe, adaptive, and effective communication in rare diseases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is the first scoping review on LLMs for rare disease patient education, and it usefully flags the field's narrow focus on ChatGPT and accuracy, but the search methods are too lightly documented to fully trust the 'only 12 studies' picture.

This scoping review pulls together the first overview of LLM use for rare disease patient education and communication. It identifies 12 very recent studies, almost all built on general models like ChatGPT, with most work limited to answering questions from hand-curated sets rather than real patient data or ongoing conversations. Evaluations lean heavily on accuracy while giving short shrift to readability, empathy, or actual communication outcomes, and multilingual needs get almost no attention.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a scoping review of LLM applications for rare disease patient education and communication. The authors searched major databases from January 2022 to March 2026, identified 12 studies, extracted data on study characteristics, scenarios, models, and evaluations, and synthesized the results descriptively and qualitatively. They conclude that the literature is recent and ChatGPT-dominated, focuses on curated QA sets with accuracy-centric evaluations, shows limited real-world, longitudinal, or patient-centered elements (e.g., empathy, readability), and remains at an early stage. They recommend future patient-centered and domain-adapted work.

Significance. If the 12-study sample is representative, the review usefully maps an emerging area relevant to over 300 million people with rare diseases, where communication needs are high. It provides a clear baseline by documenting model preferences, scenario limitations, and evaluation gaps, which can guide subsequent research toward safer, more adaptive LLM tools. The scoping design is appropriate for this nascent topic.

major comments (2)

[Methods] The search process is described only at a high level (major databases, January 2022–March 2026) with no explicit search strings, keyword/MeSH combinations, database list, inclusion/exclusion criteria, or PRISMA flow diagram. This makes it impossible to verify the completeness of the 12-study count or rule out omissions from synonym coverage or indexing, which directly affects the reliability of the headline synthesis on model dominance, scenario focus, and evaluation practices.
[Results] The claims that 'most studies focus on patient question answering using curated question sets' and 'evaluations are primarily centered on accuracy' with 'limited attention to patient-centered dimensions' are presented without a supporting table or breakdown (e.g., counts or percentages across the 12 studies for each metric). This weakens the ability to assess the strength and distribution of these patterns.

minor comments (1)

[Abstract] The search end date of March 2026 should be clarified (e.g., whether it reflects a planned cutoff or requires updating), as it affects the currency of the review.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of our scoping review and for the constructive comments. We have carefully considered the comments on the Methods and Results sections and will incorporate the recommended changes to improve the manuscript's transparency and clarity.

Referee: [Methods] The search process is described only at a high level (major databases, January 2022–March 2026) with no explicit search strings, keyword/MeSH combinations, database list, inclusion/exclusion criteria, or PRISMA flow diagram. This makes it impossible to verify the completeness of the 12-study count or rule out omissions from synonym coverage or indexing, which directly affects the reliability of the headline synthesis on model dominance, scenario focus, and evaluation practices.

Authors: We agree that a more detailed description of the search process is necessary to ensure the review's reproducibility. In the revised manuscript, we will provide the explicit search strings and keyword combinations used in each database, a complete list of the databases searched, the full inclusion and exclusion criteria, and a PRISMA flow diagram illustrating the identification and selection of the 12 studies. This will enable verification of the search strategy and support the reliability of our findings on model usage and evaluation practices.
Revision: yes

Referee: [Results] The claims that 'most studies focus on patient question answering using curated question sets' and 'evaluations are primarily centered on accuracy' with 'limited attention to patient-centered dimensions' are presented without a supporting table or breakdown (e.g., counts or percentages across the 12 studies for each metric). This weakens the ability to assess the strength and distribution of these patterns.

Authors: We appreciate this observation and agree that quantitative breakdowns would strengthen the results section. We will add a summary table to the revised manuscript that reports the number and percentage of studies for each category of application scenario (e.g., curated question-answering), model type (e.g., ChatGPT), and evaluation focus (accuracy vs. patient-centered metrics such as empathy and readability). This table will provide the requested counts and percentages, allowing readers to better evaluate the distribution and strength of the observed patterns.
Revision: yes
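
A count-and-percentage breakdown of the kind discussed in this exchange could be produced with a simple tally. The sketch below assumes each study has been coded as a record with one label per dimension. The field names, the labels, and the four placeholder records are hypothetical and are not data from the review.

```python
from collections import Counter


def tally(studies, field):
    """Return (label, count, percent) rows for one coded dimension."""
    total = len(studies)
    if total == 0:
        return []
    counts = Counter(study[field] for study in studies)
    # most_common() orders rows from the most to the least frequent label.
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    ]


# Placeholder records for illustration only; NOT data extracted from the review.
example_studies = [
    {"id": "A", "scenario": "curated_qa", "evaluation": "accuracy"},
    {"id": "B", "scenario": "curated_qa", "evaluation": "accuracy"},
    {"id": "C", "scenario": "curated_qa", "evaluation": "readability"},
    {"id": "D", "scenario": "material_generation", "evaluation": "accuracy"},
]

for field in ("scenario", "evaluation"):
    for label, n, pct in tally(example_studies, field):
        print(f"{field:<12}{label:<22}{n:>3}{pct:>7.1f}%")
```

This version assumes exactly one label per study and dimension. A study that reports several evaluation criteria would need a list-valued field and a tally over each label, in which case the percentages no longer sum to 100.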

Circularity Check

0 steps flagged

No significant circularity in descriptive scoping review synthesis

This is a scoping review that performs descriptive and qualitative synthesis of 12 external studies identified via database search. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided text. The central claims (recent literature, ChatGPT dominance, curated QA focus, accuracy-centric evaluation) are direct summaries of the included papers rather than results that reduce to the paper's own inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are present. The search methodology is stated as a standard scoping process (Jan 2022–Mar 2026 across major databases) without any internal loop that would make the count or gap analysis circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard scoping review assumptions about database completeness and literature coverage without introducing new parameters, axioms beyond established methods, or invented entities.

axioms (1)

[standard math] Standard scoping review methodology following established guidelines for literature identification and synthesis.
Invoked implicitly in the description of database searches, data extraction, and descriptive/qualitative analyses.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We conducted a scoping review of studies published between January 2022 and March 2026 across major databases, identifying 12 studies on LLM-based rare disease patient education and communication."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Evaluations are primarily centered on accuracy, with limited attention to patient-centered dimensions such as readability, empathy, and communication quality."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 5 internal anchors

[1] The Lancet Global Health, “The landscape for rare diseases in 2024,” The Lancet Global Health, vol. 12, no. 3, p. e341, 2024. [Online]. Available: https://doi.org/10.1016/S2214-109X(24)00056-1
[2] 100,000 Genomes Project Pilot Investigators, “100,000 genomes pilot on rare-disease diagnosis in health care—preliminary report,” New England Journal of Medicine, vol. 385, no. 20, pp. 1868–1880, 2021.
[3] A. Bauskis, C. Strange, C. Molster, and C. Fisher, “The diagnostic odyssey: insights from parents of children living with an undiagnosed condition,” Orphanet Journal of Rare Diseases, vol. 17, no. 1, p. 233, 2022.
[4] D. Gunes, M. Karaca, A. Durmus, B. Ak, N. Aktay Ayaz, Z. Altınel, A. Aslanger, F. Atalar, M. Balci, L. Bilgin et al., “Challenges in the clinical management of rare diseases and center-based multidisciplinary approach to creating solutions,” European Journal of Pediatrics, vol. 184, no. 5, p. 281, 2025.
[5] L. Devisetti, “Embracing the unknown: investigating medical communication around uncertainty and the implications on patient and family well-being,” Orphanet Journal of Rare Diseases, vol. 19, no. 1, p. 37, 2024.
[6] J. Balfour, V. Morrison, L. Seed, J. Clymer, E. Warnants, A. Lampkin, S. M. Leiter, and G. Chandratillake, “Patient passports for rare diseases: results of a pilot study,” European Journal of Human Genetics, vol. 34, no. 1, pp. 99–107, 2026.
[7] T. Groza, G. Baynam, and S. S. Jamuar, “Reimagining care of people living with rare diseases with artificial intelligence,” PLOS Medicine, vol. 23, no. 2, p. e1004966, 2026.
[8] D. Mascalzoni, A. Paradiso, and M. Hansson, “Rare disease research: breaking the privacy barrier,” Applied & Translational Genomics, vol. 3, no. 2, pp. 23–29, 2014.
[9] V. L. Merker, S. R. Plotkin, M. P. Charns, M. Meterko, J. T. Jordan, and A. R. Elwy, “Effective provider-patient communication of a rare disease diagnosis: A qualitative study of people diagnosed with schwannomatosis,” Patient Education and Counseling, vol. 104, no. 4, pp. 808–814, 2021.
[10] K. A. Cribbs, L. T. Blackmore, A. R. Banks, D. S. Kim, and B. J. Lahue, “Capturing real-world rare disease patient journeys: Are current methodologies sufficient for informed healthcare decisions?” Journal of Evaluation in Clinical Practice, vol. 31, no. 1, p. e70010, 2025.
[11] U. Stenberg, L. Westfal, A. Dybesland Rosenberger, K. Ørstavik, M. Flink, H. Holmen, S. Systad, K. F. Westermann, and G. Velvin, “A scoping review of health literacy in rare disorders: key issues and research directions,” Orphanet Journal of Rare Diseases, vol. 19, no. 1, p. 328, 2024.
[12] G. Baynam, A. L. Hartman, M. C. V. Letinturier, M. Bolz-Johnson, P. Carrion, A. C. Grady, X. Dong, M. Dooms, L. Dreyer, H. Graessner et al., “Global health for rare diseases through primary care,” The Lancet Global Health, vol. 12, no. 7, pp. e1192–e1199, 2024.
[13] S. Zhou, Z. Xu, M. Zhang, C. Xu, Y. Guo, Z. Zhan, Y. Fang, S. Ding, J. Wang, K. Xu et al., “Large language models for disease diagnosis: A scoping review,” npj Artificial Intelligence, vol. 1, no. 1, p. 9, 2025.
[14] Z. Zhan, S. Zhou, M. Li, and R. Zhang, “Ramie: retrieval-augmented multi-task information extraction with large language models on dietary supplements,” Journal of the American Medical Informatics Association, vol. 32, no. 3, pp. 545–554, 2025.
[15] N. Ghenimi, R. Govender, and K. Moodley, “The paradox of artificial intelligence (AI) and narrative-based medicine: challenges and potential for enhanced patient care,” AI & SOCIETY, pp. 1–7, 2025.
[16] Y. Hou, Z. Zhan, M. Zeng, Y. Wu, S. Zhou, and R. Zhang, “Benchmarking GPT-5 for biomedical natural language processing,” arXiv preprint arXiv:2509.04462, 2025.
[17] M. Zeng, S. Zhou, Z. Zhan, and R. Zhang, “Medcl-bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.16738
[18] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
[19] D. Tao, K. M. Kochendorfer, T. Griffin, Q. McCrary, A. Gautam, B. S. Labib, M. Arvan, J. Flynn, and K. Jiang, “‘Can ChatGPT answer patient’s questions?’: a preliminary analysis,” in MEDINFO 2025—Healthcare Smart×Medicine Deep. IOS Press, 2025, pp. 1586–1587.
[20] S. Sevinç, M. Candemir, B. A. Yamak, E. Kızıltunç, B. Sezenöz, O. B. Şahin, S. Topal, Y. Demir, M. R. Yalçın, and A. Şahinarslan, “An academic evaluation of ChatGPT’s ability and accuracy in creating patient education resources for rare cardiovascular diseases,” Scientific Reports, vol. 15, no. 1, p. 25929, 2025.
[21] F. Henriques, C. Costa, B. Oliveiros, J. B. Melo, C. Santos, and J. Jesus-Ribeiro, “Artificial intelligence chatbots and narcolepsy: friend or foe for patient information?” European Neurology, vol. 88, no. 3-4, pp. 122–128, 2025.
[22] M. Valentini, J. Szkandera, M. A. Smolle, S. Scheipl, A. Leithner, and D. Andreou, “Artificial intelligence large language model ChatGPT: is it a trustworthy and reliable source of information for sarcoma patients?” Frontiers in Public Health, vol. 12, p. 1303319, 2024.
[23] R. Lambert, Z.-Y. Choo, K. Gradwohl, L. Schroedl, and A. Ruiz De Luzuriaga, “Assessing the application of large language models in generating dermatologic patient education materials according to reading level: qualitative study,” JMIR Dermatology, vol. 7, p. e55898, 2024.
[24] M. Zhao, I. Y. Oh, A. Gupta, S. Cohen-Cutler, K. M. Harmoney, A. M. Lai, and B. A. Sisk, “Automating evaluation of LLM-generated responses to patient questions about rare diseases,” medRxiv, pp. 2025–10, 2025.
[25] S. Cilli Hayıroğlu and T. Bozkurt, “ChatGPT, Gemini, and Grok on familial Mediterranean fever: are they trustworthy?” Clinical Rheumatology, vol. 45, no. 1, pp. 521–530, 2026.
[26] E. Perez-Palma, I. Miller, K. Johannesen, L. Chaby, L. Randall, M. Graglia, C. Grzeskowiak, L. Manaster, L. Lubbers, A. Freed et al., “Enhancing rare disease education through AI-driven podcast generation,” medRxiv, pp. 2025–01, 2025.
[27] S. Jeon, S.-A. Lee, H.-S. Chung, J. Y. Yun, E. A. Park, M.-K. So, and J. Huh, “Evaluating the use of generative artificial intelligence to support genetic counseling for rare diseases,” Diagnostics, vol. 15, no. 6, p. 672, 2025.
[28] L. Tefera, A. Rosenzveig, J. Rajendran, B. Rajasekar, J. Kassab, D. Hornacek, M. McCarthy, T. Wu, N. F. Mahlay, and P. Chaudhury, “Large language models in rare disease: accuracy in addressing fibromuscular dysplasia questions,” VASA. Zeitschrift fur Gefasskrankheiten, vol. 54, no. 3, pp. 218–219, 2025.
[29] M. T. Weber, R. Noll, A. Marchl, C. Facchinello, A. Grünewaldt, C. Hügel, K. Musleh, T. O. Wagner, H. Storf, and J. Schaaf, “Medbot vs realdoc: efficacy of large language modeling in physician-patient communication for rare diseases,” Journal of the American Medical Informatics Association, vol. 32, no. 5, pp. 775–783, 2025.
[30] A. M. van Eerde, A. Teixeira, F. Galletti, M. Maternik, V. Capone, R. Westland, J. Mulder, J. Halbritter, T. Osterholt, V. Neukel et al., “Risks and benefits of ChatGPT in informing patients and families with rare kidney diseases: an explorative assessment by the European Rare Kidney Disease Reference Network (ERKNet),” Pediatric Nephrology, vol. 40, no. ..., 2025.
[31] J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang et al., “Qwen technical report,” arXiv preprint arXiv:2309.16609, 2023.
[32] D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
[33] Gemma Team, T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love et al., “Gemma: Open models based on Gemini research and technology,” arXiv preprint arXiv:2403.08295, 2024.
[34] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “LLaMA: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
[35] S. Aydin, M. Karabacak, V. Vlachos, and K. Margetis, “Large language models in patient education: a scoping review of applications in medicine,” Frontiers in Medicine, vol. 11, p. 1477898, 2024.
[36] A. AlSammarraie and M. Househ, “The use of large language models in generating patient education materials: a scoping review,” Acta Informatica Medica, vol. 33, no. 1, p. 4, 2025.
[37] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “LoRA: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, p. 3, 2022.
[38] Y. Zhu, Z. Wang, J. Gao, Y. Tong, J. An, W. Liao, E. M. Harrison, L. Ma, and C. Pan, “Prompting large language models for zero-shot clinical prediction with structured longitudinal electronic health record data,” arXiv preprint arXiv:2402.01713, 2024.
[39] Z. Zhan, J. Wang, S. Zhou, J. Deng, and R. Zhang, “Mmrag: multi-mode retrieval-augmented generation with large language models for biomedical in-context learning,” Journal of the American Medical Informatics Association, vol. 32, no. 10, pp. 1505–1516, 2025.
[40] Z. Zhan, S. Zhou, X. Zhou, Y. Xiao, J. Wang, J. Deng, H. Zhu, Y. Hou, Y. Song, M. Lin, and R. Zhang, “Retrieval-augmented in-context learning for multimodal large language models in disease classification,” Journal of Biomedical Informatics, vol. 178, p. 105017, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1532046426000419
[41] X. Luo et al., “Cross-cultural adaptation framework for enhancing large language model outputs in multilingual contexts,” Journal of Advanced Computing Systems, vol. 3, no. 5, pp. 48–62, 2023.

[1] [1]

The landscape for rare diseases in 2024,

The Lancet Global Health, “The landscape for rare diseases in 2024,” The Lancet Global Health, vol. 12, no. 3, p. e341, 2024. [Online]. Available: https://doi.org/10.1016/S2214-109X(24)00056-1

work page doi:10.1016/s2214-109x(24)00056-1 2024

[2] [2]

100,000 genomes pilot on rare-disease diagnosis in health care—preliminary report,

G. P. P. I. 100, “100,000 genomes pilot on rare-disease diagnosis in health care—preliminary report,”New England Journal of Medicine, vol. 385, no. 20, pp. 1868–1880, 2021

work page 2021

[3] [3]

The diagnostic odyssey: insights from parents of children living with an undiagnosed condition,

A. Bauskis, C. Strange, C. Molster, and C. Fisher, “The diagnostic odyssey: insights from parents of children living with an undiagnosed condition,”Orphanet journal of rare diseases, vol. 17, no. 1, p. 233, 2022

work page 2022

[4] [4]

Challenges in the clinical management of rare diseases and center-based multidisciplinary approach to creating solutions,

D. Gunes, M. Karaca, A. Durmus, B. Ak, N. Aktay Ayaz, Z. Altınel, A. Aslanger, F. Atalar, M. Balci, L. Bilginet al., “Challenges in the clinical management of rare diseases and center-based multidisciplinary approach to creating solutions,”European Journal of Pediatrics, vol. 184, no. 5, p. 281, 2025

work page 2025

[5] [5]

Embracing the unknown: investigating medical commu- nication around uncertainty and the implications on patient and family well-being,

L. Devisetti, “Embracing the unknown: investigating medical commu- nication around uncertainty and the implications on patient and family well-being,”Orphanet Journal of Rare Diseases, vol. 19, no. 1, p. 37, 2024

work page 2024

[6] [6]

Patient passports for rare diseases: results of a pilot study,

J. Balfour, V . Morrison, L. Seed, J. Clymer, E. Warnants, A. Lampkin, S. M. Leiter, and G. Chandratillake, “Patient passports for rare diseases: results of a pilot study,”European Journal of Human Genetics, vol. 34, no. 1, pp. 99–107, 2026

work page 2026

[7] [7]

Reimagining care of people living with rare diseases with artificial intelligence,

T. Groza, G. Baynam, and S. S. Jamuar, “Reimagining care of people living with rare diseases with artificial intelligence,”Plos Medicine, vol. 23, no. 2, p. e1004966, 2026

work page 2026

[8] D. Mascalzoni, A. Paradiso, and M. Hansson, “Rare disease research: breaking the privacy barrier,” Applied & Translational Genomics, vol. 3, no. 2, pp. 23–29, 2014

[9] V. L. Merker, S. R. Plotkin, M. P. Charns, M. Meterko, J. T. Jordan, and A. R. Elwy, “Effective provider-patient communication of a rare disease diagnosis: A qualitative study of people diagnosed with schwannomatosis,” Patient Education and Counseling, vol. 104, no. 4, pp. 808–814, 2021

[10] K. A. Cribbs, L. T. Blackmore, A. R. Banks, D. S. Kim, and B. J. Lahue, “Capturing real-world rare disease patient journeys: Are current methodologies sufficient for informed healthcare decisions?” Journal of Evaluation in Clinical Practice, vol. 31, no. 1, p. e70010, 2025

[11] U. Stenberg, L. Westfal, A. Dybesland Rosenberger, K. Ørstavik, M. Flink, H. Holmen, S. Systad, K. F. Westermann, and G. Velvin, “A scoping review of health literacy in rare disorders: key issues and research directions,” Orphanet Journal of Rare Diseases, vol. 19, no. 1, p. 328, 2024

[12] G. Baynam, A. L. Hartman, M. C. V. Letinturier, M. Bolz-Johnson, P. Carrion, A. C. Grady, X. Dong, M. Dooms, L. Dreyer, H. Graessner et al., “Global health for rare diseases through primary care,” The Lancet Global Health, vol. 12, no. 7, pp. e1192–e1199, 2024

[13] S. Zhou, Z. Xu, M. Zhang, C. Xu, Y. Guo, Z. Zhan, Y. Fang, S. Ding, J. Wang, K. Xu et al., “Large language models for disease diagnosis: A scoping review,” npj Artificial Intelligence, vol. 1, no. 1, p. 9, 2025

[14] Z. Zhan, S. Zhou, M. Li, and R. Zhang, “Ramie: retrieval-augmented multi-task information extraction with large language models on dietary supplements,” Journal of the American Medical Informatics Association, vol. 32, no. 3, pp. 545–554, 2025

[15] N. Ghenimi, R. Govender, and K. Moodley, “The paradox of artificial intelligence (AI) and narrative-based medicine: challenges and potential for enhanced patient care,” AI & SOCIETY, pp. 1–7, 2025

[16] Y. Hou, Z. Zhan, M. Zeng, Y. Wu, S. Zhou, and R. Zhang, “Benchmarking GPT-5 for biomedical natural language processing,” arXiv preprint arXiv:2509.04462, 2025

[17] M. Zeng, S. Zhou, Z. Zhan, and R. Zhang, “Medcl-bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.16738

[18] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023

[19] D. Tao, K. M. Kochendorfer, T. Griffin, Q. McCrary, A. Gautam, B. S. Labib, M. Arvan, J. Flynn, and K. Jiang, “‘Can ChatGPT answer patient’s questions?’: a preliminary analysis,” in MEDINFO 2025—Healthcare Smart×Medicine Deep. IOS Press, 2025, pp. 1586–1587

[20] S. Sevinç, M. Candemir, B. A. Yamak, E. Kızıltunç, B. Sezenöz, O. B. Şahin, S. Topal, Y. Demir, M. R. Yalçın, and A. Şahinarslan, “An academic evaluation of ChatGPT’s ability and accuracy in creating patient education resources for rare cardiovascular diseases,” Scientific Reports, vol. 15, no. 1, p. 25929, 2025

[21] F. Henriques, C. Costa, B. Oliveiros, J. B. Melo, C. Santos, and J. Jesus-Ribeiro, “Artificial intelligence chatbots and narcolepsy: friend or foe for patient information?” European Neurology, vol. 88, no. 3-4, pp. 122–128, 2025

[22] M. Valentini, J. Szkandera, M. A. Smolle, S. Scheipl, A. Leithner, and D. Andreou, “Artificial intelligence large language model ChatGPT: is it a trustworthy and reliable source of information for sarcoma patients?” Frontiers in Public Health, vol. 12, p. 1303319, 2024

[23] R. Lambert, Z.-Y. Choo, K. Gradwohl, L. Schroedl, and A. Ruiz De Luzuriaga, “Assessing the application of large language models in generating dermatologic patient education materials according to reading level: qualitative study,” JMIR Dermatology, vol. 7, p. e55898, 2024

[24] M. Zhao, I. Y. Oh, A. Gupta, S. Cohen-Cutler, K. M. Harmoney, A. M. Lai, and B. A. Sisk, “Automating evaluation of LLM-generated responses to patient questions about rare diseases,” medRxiv, pp. 2025–10, 2025

[25] S. Cilli Hayıroğlu and T. Bozkurt, “ChatGPT, Gemini, and Grok on familial Mediterranean fever: are they trustworthy?” Clinical Rheumatology, vol. 45, no. 1, pp. 521–530, 2026

[26] E. Perez-Palma, I. Miller, K. Johannesen, L. Chaby, L. Randall, M. Graglia, C. Grzeskowiak, L. Manaster, L. Lubbers, A. Freed et al., “Enhancing rare disease education through AI-driven podcast generation,” medRxiv, pp. 2025–01, 2025

[27] S. Jeon, S.-A. Lee, H.-S. Chung, J. Y. Yun, E. A. Park, M.-K. So, and J. Huh, “Evaluating the use of generative artificial intelligence to support genetic counseling for rare diseases,” Diagnostics, vol. 15, no. 6, p. 672, 2025

[28] L. Tefera, A. Rosenzveig, J. Rajendran, B. Rajasekar, J. Kassab, D. Hornacek, M. McCarthy, T. Wu, N. F. Mahlay, and P. Chaudhury, “Large language models in rare disease: accuracy in addressing fibromuscular dysplasia questions,” VASA. Zeitschrift fur Gefasskrankheiten, vol. 54, no. 3, pp. 218–219, 2025

[29] M. T. Weber, R. Noll, A. Marchl, C. Facchinello, A. Grünewaldt, C. Hügel, K. Musleh, T. O. Wagner, H. Storf, and J. Schaaf, “Medbot vs realdoc: efficacy of large language modeling in physician-patient communication for rare diseases,” Journal of the American Medical Informatics Association, vol. 32, no. 5, pp. 775–783, 2025

[30] A. M. van Eerde, A. Teixeira, F. Galletti, M. Maternik, V. Capone, R. Westland, J. Mulder, J. Halbritter, T. Osterholt, V. Neukel et al., “Risks and benefits of ChatGPT in informing patients and families with rare kidney diseases: an explorative assessment by the European Rare Kidney Disease Reference Network (ERKNet),” Pediatric Nephrology, vol. 40, no. ...

[31] J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang et al., “Qwen technical report,” arXiv preprint arXiv:2309.16609, 2023

[32] D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

[33] G. Team, T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love et al., “Gemma: Open models based on Gemini research and technology,” arXiv preprint arXiv:2403.08295, 2024

[34] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “LLaMA: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023

[35] S. Aydin, M. Karabacak, V. Vlachos, and K. Margetis, “Large language models in patient education: a scoping review of applications in medicine,” Frontiers in Medicine, vol. 11, p. 1477898, 2024

[36] A. AlSammarraie and M. Househ, “The use of large language models in generating patient education materials: a scoping review,” Acta Informatica Medica, vol. 33, no. 1, p. 4, 2025

[37] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “LoRA: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, p. 3, 2022

[38] Y. Zhu, Z. Wang, J. Gao, Y. Tong, J. An, W. Liao, E. M. Harrison, L. Ma, and C. Pan, “Prompting large language models for zero-shot clinical prediction with structured longitudinal electronic health record data,” arXiv preprint arXiv:2402.01713, 2024

[39] Z. Zhan, J. Wang, S. Zhou, J. Deng, and R. Zhang, “Mmrag: multi-mode retrieval-augmented generation with large language models for biomedical in-context learning,” Journal of the American Medical Informatics Association, vol. 32, no. 10, pp. 1505–1516, 2025

[40] Z. Zhan, S. Zhou, X. Zhou, Y. Xiao, J. Wang, J. Deng, H. Zhu, Y. Hou, Y. Song, M. Lin, and R. Zhang, “Retrieval-augmented in-context learning for multimodal large language models in disease classification,” Journal of Biomedical Informatics, vol. 178, p. 105017, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1532046426000419

[41] X. Luo et al., “Cross-cultural adaptation framework for enhancing large language model outputs in multilingual contexts,” Journal of Advanced Computing Systems, vol. 3, no. 5, pp. 48–62, 2023