
arxiv: 2604.19774 · v1 · submitted 2026-03-27 · 💻 cs.CL · cs.AI

Phase 1 Implementation of LLM-generated Discharge Summaries showing high Adoption in a Dutch Academic Hospital

Pith reviewed 2026-05-14 22:47 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords LLM · discharge summaries · EHR integration · clinical documentation · adoption · pilot study · AI in healthcare · time savings

The pith

LLM-generated discharge summary text was copied in 58.5% of admissions, and 91.3% of users intended to continue using the tool.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports results from a nine-week pilot at a Dutch academic hospital in which an EHR-integrated LLM generated draft discharge summaries for 379 admissions, used by 25 clinicians (21 residents and 4 physician assistants). LLM text was directly copied in 58.5% of admissions and remained identifiable in 29.1% of final letters. In the user survey, 86.9% of users self-reported reduced documentation time, 60.9% reported lower administrative workload, and 91.3% planned to keep using the tool.

Core claim

An EHR-integrated LLM produced draft discharge summaries whose text was copied directly in 58.5% of admissions, leaving traceable LLM content in 29.1% of completed letters. Of the users, 86.9% reported reduced documentation time and 60.9% reduced administrative workload, and 91.3% expressed intent to continue after the pilot.

What carries the argument

The EHR-integrated LLM that generates initial drafts of discharge summaries for users to copy or edit directly in clinical workflow.

If this is right

  • LLM drafts are incorporated directly into clinical notes at rates high enough to change routine documentation practice.
  • User intent to continue supports scaling the tool beyond the pilot phase in similar hospital settings.
  • Self-reported efficiency gains suggest potential release of clinician time for direct patient care.
  • Future phases will require objective time-tracking methods to confirm efficiency claims.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Comparable adoption might occur for other EHR documentation tasks such as progress notes if similar integration is used.
  • Hospitals without native EHR-LLM links may see lower uptake unless workflow friction is minimized.
  • Quality and error-rate audits of the retained LLM text will be needed before full deployment.
  • Broader rollout could test whether cumulative time savings affect overall clinician workload and retention.

Load-bearing premise

Self-reported reductions in documentation time and workload accurately reflect real time savings, despite the acknowledged difficulty of precise measurement.

What would settle it

A controlled time-motion study that records actual minutes spent on discharge summaries with and without the LLM and detects no meaningful difference in duration.

Figures

Figures reproduced from arXiv: 2604.19774 by Anne H Hoekman, Charlotte M H H T Bootsma-Robroeks, Jacobien H F Oosterhoff, Job N Doornberg, Katerina Kagialari, Nettuno Nadalini, Rosanne C Schoonbeek, Tarannom Mehri, Tom P van der Laan.

Figure 2. Pair of LLM-generated discharge summary (left) and the discharge summary sent in the final letter (right). The texts were de-identified and translated from Dutch to English word-by-word to maintain semantic structure and are not grammatically correct. The highlighted text is identical between the two. The manual score was obtained by dividing the number of identical words by the total number of words in th…

Figure 5. Agreement with the given statements about the effects of the EHR…

Figure 8. Data selection procedure for the extrinsic validation. Each box contains a…
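Figure 2's caption describes the manual score as the number of identical words divided by a total word count; the caption is truncated before it says which text supplies that total. The sketch below approximates the ratio automatically under stated assumptions: words are lowercase `\w+` tokens, "identical" words are those inside runs that Python's `difflib.SequenceMatcher` matches between the two texts, and the denominator is the final summary's word count. The function `overlap_score` and the example sentences are hypothetical; the paper's score was obtained manually.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    # Assumption: case-insensitive word tokens; punctuation and hyphens split words.
    return re.findall(r"\w+", text.lower())


def overlap_score(llm_draft: str, final_summary: str) -> float:
    """Hypothetical automated analogue of the manual Figure 2 score.

    Counts final-summary words that fall inside runs matched against the
    LLM draft, divided by the final summary's word count (the denominator
    is an assumption because the caption is truncated).
    """
    draft_words = tokenize(llm_draft)
    final_words = tokenize(final_summary)
    if not final_words:
        return 0.0
    # autojunk=False stops frequent words being ignored in long texts.
    matcher = SequenceMatcher(None, draft_words, final_words, autojunk=False)
    identical = sum(block.size for block in matcher.get_matching_blocks())
    return identical / len(final_words)


draft = "Patient admitted with pneumonia, treated with antibiotics."
final = "Patient admitted with community-acquired pneumonia, treated with oral antibiotics."
print(f"{overlap_score(draft, final):.2f}")  # 0.70
```

Here 7 of the 10 final-summary words lie in matched runs. A manual annotator may score differently, for example by not highlighting isolated single-word matches.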
Original abstract

Writing discharge summaries to transfer medical information is an important but time-consuming process that can be assisted by Large Language Models (LLMs). This prospective mixed-methods pilot study evaluated an Electronic Health Record (EHR)-integrated LLM that generates discharge summary drafts. In total, 379 discharge summaries were generated in clinical practice by 21 residents and 4 physician assistants during 9 weeks in our academic hospital. LLM-generated text was copied in 58.5% of admissions, and identifiable LLM content could be traced to 29.1% of final discharge letters. Notably, 86.9% of users self-reported a reduction in documentation time, and 60.9% a reduction in administrative workload. Intent to use after the pilot phase was high (91.3%), supporting further implementation of this use case. Accurately measuring the documentation time of users on discharge summaries remains challenging, but will be necessary for future extrinsic evaluation of LLM-assisted documentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports on a prospective mixed-methods pilot study evaluating an EHR-integrated LLM for generating discharge summary drafts in a Dutch academic hospital. Over 9 weeks, 379 summaries were generated by 21 residents and 4 physician assistants. Key findings include LLM-generated text being copied in 58.5% of admissions, identifiable LLM content in 29.1% of final discharge letters, 86.9% of users self-reporting reduced documentation time, 60.9% reporting reduced administrative workload, and 91.3% expressing intent to continue use post-pilot.

Significance. If the reported adoption rates and user perceptions hold, this work demonstrates the feasibility of integrating LLMs into clinical workflows for documentation tasks, providing concrete usage data from a real-world setting. The objective metrics on copying and traceability rates offer valuable evidence of actual utilization, which could inform future implementations of AI tools in healthcare to reduce documentation burden.

major comments (3)
  1. [Results] Results section: The claims of reduced documentation time (86.9%) and administrative workload (60.9%) rely exclusively on self-reported survey data from users without any objective corroboration such as EHR time logs, baseline comparisons, or timed observations, despite the abstract explicitly noting the challenges in accurately measuring documentation time. This is load-bearing for the efficiency and adoption claims.
  2. [Methods and Results] Methods and Results sections: There is no assessment or reporting of the accuracy, factual correctness, or error rates in the LLM-generated discharge summaries. This omission is critical as it directly impacts the clinical validity and safety of the implementation claims.
  3. [Discussion] Discussion section: The high intent to continue use (91.3%) is presented as supporting further implementation, but without objective validation of time savings or quality metrics, the sustainability of adoption remains uncertain.
minor comments (2)
  1. [Abstract] Abstract: The abstract states '21 residents and 4 physician assistants' totaling 25 users, but survey percentages (e.g., 86.9%) imply a subset responded; clarify the exact number of survey respondents and response rate for transparency.
  2. Provide more details on the survey methodology, including exact questions used to elicit time reduction reports and any validation of the survey instrument.
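The first minor comment's inference can be checked arithmetically. Assuming every survey percentage is k/n for one common respondent count n, the sketch below lists each n up to the 25 pilot users that reproduces all three reported figures. The tolerance of 0.1 percentage points is an assumption that allows for truncation as well as rounding (20/23 is 86.96%, which would round to 87.0% rather than 86.9%). Under these assumptions only n = 23 fits, which is consistent with, but does not establish, 23 respondents; individual questions could also have had different numbers of answers. The helper `consistent_denominators` is hypothetical.

```python
def consistent_denominators(percentages: list[float], max_n: int, tol: float = 0.1) -> list[int]:
    """Return respondent counts n <= max_n for which every reported
    percentage lies within tol percentage points of some k/n."""
    matches = []
    for n in range(1, max_n + 1):
        # Each percentage must be reachable by some count k of n respondents.
        if all(
            any(abs(100 * k / n - p) <= tol for k in range(n + 1))
            for p in percentages
        ):
            matches.append(n)
    return matches


# Survey percentages from the abstract; 25 is the number of pilot users.
reported = [86.9, 60.9, 91.3]
print(consistent_denominators(reported, max_n=25))  # [23]
```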

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our pilot study manuscript. We address each major comment below with clarifications based on the study's design and scope as a real-world implementation evaluation, and indicate revisions to improve transparency without overstating our findings.

Point-by-point responses
  1. Referee: [Results] Results section: The claims of reduced documentation time (86.9%) and administrative workload (60.9%) rely exclusively on self-reported survey data from users without any objective corroboration such as EHR time logs, baseline comparisons, or timed observations, despite the abstract explicitly noting the challenges in accurately measuring documentation time. This is load-bearing for the efficiency and adoption claims.

    Authors: We agree that these percentages derive solely from self-reported survey data and lack objective corroboration such as time logs. The manuscript already notes the inherent challenges in accurately measuring documentation time. As a pilot focused on real-world adoption rather than a controlled time-motion study, we did not collect baseline EHR logs or perform timed observations. We will revise the Results and Discussion sections to frame these explicitly as user perceptions, add stronger caveats, and emphasize the need for future objective evaluations to support efficiency claims. revision: partial

  2. Referee: [Methods and Results] Methods and Results sections: There is no assessment or reporting of the accuracy, factual correctness, or error rates in the LLM-generated discharge summaries. This omission is critical as it directly impacts the clinical validity and safety of the implementation claims.

    Authors: This is a fair critique. The study prioritized implementation metrics (e.g., copying in 58.5% of cases and traceability in 29.1% of final letters) and user adoption over content validation. No systematic accuracy or error-rate assessment was conducted, as it would have required expert review of all 379 summaries beyond the pilot's scope. We will add an explicit limitations paragraph in the Discussion to state this omission, discuss its implications for clinical safety, and recommend dedicated validation studies for future work. revision: yes

  3. Referee: [Discussion] Discussion section: The high intent to continue use (91.3%) is presented as supporting further implementation, but without objective validation of time savings or quality metrics, the sustainability of adoption remains uncertain.

    Authors: We acknowledge that sustainability claims would be strengthened by objective data. However, the 91.3% intent is presented alongside objective usage evidence of workflow integration (58.5% copying rate and 29.1% traceable content). We will revise the Discussion to more tightly couple the continuation intent with these objective metrics, qualify language around further implementation, and note the requirement for ongoing evaluation of time savings and quality to assess long-term sustainability. revision: partial
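The referee's suggestion of EHR time logs, which the authors state they did not collect, could be turned into an objective per-note time measure roughly as sketched below. The sketch assumes each discharge summary has a list of timestamped user events (a hypothetical schema; real EHR audit logs differ by vendor) and counts only gaps between consecutive events that do not exceed an idle timeout, set here to 5 minutes purely as an illustrative assumption. The function `active_time` and the timestamps are hypothetical and are not data from the study.

```python
from datetime import datetime, timedelta


def active_time(events: list[datetime], idle_timeout: timedelta) -> timedelta:
    """Sum gaps between consecutive events, skipping gaps longer than
    idle_timeout (treated as the user being away from the note)."""
    ordered = sorted(events)
    total = timedelta(0)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= idle_timeout:
            total += gap
    return total


# Hypothetical event timestamps for one discharge summary.
start = datetime(2026, 1, 5, 14, 0, 0)
edits = [start + timedelta(seconds=s) for s in (0, 60, 120, 1000, 1030)]
print(active_time(edits, idle_timeout=timedelta(minutes=5)))  # 0:02:30
```

Dropping long gaps avoids counting time spent elsewhere, at the cost of missing reading time between events, so the timeout choice would itself need justification. Totals computed this way for notes written with and without the LLM could feed the controlled comparison described under "What would settle it".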

Circularity Check

0 steps flagged

No circularity: purely observational pilot with direct counts and surveys

Full rationale

This manuscript reports an empirical mixed-methods pilot study with no equations, derivations, fitted parameters, or first-principles claims. All headline results (58.5% copying rate, 29.1% traceable LLM content, 86.9% self-reported time reduction, 91.3% continuation intent) are direct tallies from the 379 generated summaries and from user survey responses. No load-bearing step reduces to a self-citation, ansatz, or renamed input; the paper explicitly flags the difficulty of objective time measurement as a limitation rather than deriving any prediction from it. The work is self-contained observational reporting with no circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters or invented entities. The study rests on the domain assumption that self-reported time and workload reductions are meaningful proxies for impact despite acknowledged measurement difficulties.

axioms (1)
  • domain assumption: self-reported survey responses on time reduction and workload accurately reflect actual changes in documentation effort
    The central adoption claims depend on these self-reports; the abstract explicitly notes the difficulty of objective measurement.

pith-pipeline@v0.9.0 · 5517 in / 1239 out tokens · 54550 ms · 2026-05-14T22:47:24.244368+00:00 · methodology

