Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization

arXiv: 2309.07430 · v5 · pith:5FL2LKOInew · submitted 2023-09-14 · cs.CL

Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari

Keywords: clinical, LLMs, medical, experts, summarization, language, models, tasks

Abstract

Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Quantitative assessments with syntactic, semantic, and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
