pith. machine review for the scientific record.

arxiv: 2604.16364 · v1 · submitted 2026-03-21 · 💻 cs.CY · cs.AI · cs.CL

Recognition: 1 theorem link · Lean Theorem

Clinical Note Bloat Reduction for Efficient LLM Use

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 06:36 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL
keywords clinical notes · note bloat · LLM efficiency · EHR metadata · deduplication · preprocessing pipeline · information extraction · clinical prediction

The pith

TRACE removes 47.3 percent of clinical note text while preserving LLM performance on extraction and prediction tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRACE, a preprocessing pipeline that cuts duplicated and templated content from clinical notes, using electronic health record metadata to flag copied sections and falling back to frequency analysis when metadata are unavailable. This matters because modern notes are filled with repetitive text that inflates token counts and computational expense for large language models deployed in clinical decision support. The authors tested it on 5.3 million notes from liver transplantation, obstetrics, and inpatient cohorts. Blinded physician review and downstream modeling showed that information extraction and outcome prediction stayed intact after the reduction. The work frames this as a practical way to lower the cost of running LLMs at scale in health systems.

Core claim

TRACE is a scalable preprocessing pipeline that removes note bloat by leveraging EHR attribution metadata to identify templated and copied content and applying frequency-based deduplication when metadata are unavailable. Across four real-world clinical cohorts with 5.3 million notes, TRACE removed 47.3 percent of chart text while preserving performance for information extraction and clinical outcome prediction. At a large academic medical center this reduction corresponds to an estimated 9.5 million dollar annual decrease in LLM inference costs assuming one query per encounter.

What carries the argument

TRACE preprocessing pipeline, which identifies templated and copied content in clinical notes using EHR attribution metadata for targeted removal and frequency-based deduplication as backup.
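The frequency-based fallback can be sketched as a corpus-level exact-duplicate filter. This is a hypothetical illustration, not the paper's implementation: the paragraph-level splitting and the `min_block_chars` and `max_repeats` thresholds are invented here for the sketch (the paper's actual parameters are not reported in this review).

```python
from collections import Counter

def frequency_dedup(notes, min_block_chars=200, max_repeats=2):
    """Drop text blocks that recur across a note corpus.

    Hypothetical sketch of frequency-based deduplication: split each
    note into paragraph blocks, count exact duplicates corpus-wide,
    and remove any sufficiently long block seen more than
    `max_repeats` times (likely templated or copied boilerplate).
    All thresholds here are illustrative, not the paper's values.
    """
    counts = Counter()
    split_notes = []
    for note in notes:
        blocks = [b.strip() for b in note.split("\n\n") if b.strip()]
        split_notes.append(blocks)
        for b in blocks:
            if len(b) >= min_block_chars:  # ignore short blocks
                counts[b] += 1
    # Second pass: keep short blocks and blocks that are not high-frequency.
    return [
        "\n\n".join(b for b in blocks
                    if len(b) < min_block_chars or counts[b] <= max_repeats)
        for blocks in split_notes
    ]
```

At real scale one would hash blocks and allow near-duplicate matching rather than exact string equality, and the metadata-driven removal would run first, with a filter like this only as the fallback.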

If this is right

  • Reduces the volume of text fed to LLMs by nearly half, directly lowering inference compute and cost.
  • Keeps accuracy stable for clinical information extraction from notes.
  • Maintains predictive power for downstream clinical outcome models.
  • Delivers estimated annual savings of 9.5 million dollars in LLM costs at large medical centers.
  • Applies across diverse clinical areas including transplantation, obstetrics, and general inpatient care.
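The savings bullet is plain arithmetic once a token volume and a price are fixed. The sketch below is a back-of-envelope check, not the paper's cost model: only the 47.3 percent reduction comes from the paper, while the per-encounter token count, token price, and encounter volume are placeholders chosen so the result lands near the reported 9.5 million dollars.

```python
# Back-of-envelope check on the savings claim. Only REDUCTION (47.3%)
# comes from the paper; every other number is an illustrative
# placeholder, not a figure the authors report.
REDUCTION = 0.473                  # fraction of chart text removed by TRACE
tokens_per_encounter = 500_000     # assumed chart tokens per LLM query
encounters_per_year = 4_000_000    # assumed annual encounters at a large center
usd_per_million_tokens = 10.0      # assumed LLM input-token price

baseline_usd = (tokens_per_encounter * encounters_per_year / 1e6
                * usd_per_million_tokens)
savings_usd = baseline_usd * REDUCTION
print(f"baseline ${baseline_usd:,.0f}/yr, savings ${savings_usd:,.0f}/yr")
# → baseline $20,000,000/yr, savings $9,460,000/yr
```

Both lines scale linearly with query frequency, which is why the estimate is hedged with "assuming one query per encounter."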

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hospitals could run more frequent or complex LLM queries on patient records without proportional cost growth.
  • Similar metadata-driven deduplication might reduce repetitive text in other structured document domains such as legal or administrative records.
  • Lowering note volume could allow LLMs to handle longer patient histories within fixed context limits.
  • Wider adoption might encourage EHR vendors to improve metadata capture so that bloat reduction becomes automatic.

Load-bearing premise

Blinded physician review plus unchanged performance on information extraction and outcome prediction tasks is enough to confirm that no clinically important information is lost when bloat is removed.

What would settle it

A follow-up study in which physicians identify specific removed details that change their clinical decisions or miss key patient history elements in a meaningful fraction of cases.

Figures

Figures reproduced from arXiv: 2604.16364 by Asad Aali, Chloe Stanwyck, Emily Alsentzer, Emma Sun, Jordan L. Cahoon, Rachel Madding, Renumathy Dhanasekaran, Yixing Jiang.

Figure 1. TRACE Method. (A) TRACE identifies templated and copied text using Epic Clarity attribution metadata (Reference Module) and flags additional redundant text blocks using frequency-based de-duplication (Frequency Module). (B) Example TRACE-annotated note showing templated and copied spans with sources identified by the Reference and Frequency Modules.

Figure 2. TRACE downstream tasks. (A) TRACE utility is evaluated through two downstream tasks: Information Extraction (IE) and Clinical Outcome Prediction. In IE, a large language model performs inference over batched notes and updates the prediction as new notes are added. In clinical outcome prediction, the most recent 32K tokens are used as input to predict a clinical variable. Notes can either be used for zero-s…

Figure 3. TRACE facilitates the analysis of templated and copied text in 1,000 randomly sampled patients.

Figure 4. TRACE-processed notes achieve comparable performance in information extraction (IE) to original notes with less…

Figure 5. TRACE enables efficient clinical outcome prediction.
Original abstract

Health systems are rapidly deploying large language models (LLMs) that use clinical notes for clinical decision support applications. However, modern documentation practices rely heavily on templates, copy-paste shortcuts, and auto-populated fields, producing extensive duplicated text ("note bloat") that dilutes clinically meaningful signal and substantially increases the computational cost of LLM use. We introduce TRACE, a scalable preprocessing pipeline that removes note bloat by leveraging EHR attribution metadata to identify templated and copied content and applying frequency-based deduplication when metadata are unavailable. We evaluated TRACE across four real-world clinical cohorts spanning liver transplantation, obstetrics, and inpatient care (5.3 million notes) using blinded physician review and downstream modeling tasks. TRACE removed 47.3% of chart text while preserving performance for information extraction and clinical outcome prediction. At a large academic medical center, this reduction corresponds to an estimated $9.5 million annual decrease in LLM inference costs assuming one query per encounter. These findings show how underutilized EHR metadata can enable more scalable and cost-efficient deployment of LLM-based clinical systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TRACE, a scalable preprocessing pipeline that removes clinical note bloat by leveraging EHR attribution metadata to detect templated and copied content, with frequency-based deduplication as fallback. Evaluated on 5.3 million notes from four real-world cohorts (liver transplantation, obstetrics, inpatient care), it reports a 47.3% reduction in chart text while preserving performance on information extraction and clinical outcome prediction tasks, projecting $9.5M annual LLM inference cost savings at a large academic center.

Significance. If the claim of no clinically important information loss holds under broader scrutiny, the work has substantial practical significance for cost-efficient LLM deployment in healthcare. The large-scale, multi-cohort real-world evaluation with downstream task validation is a notable strength supporting scalability.

major comments (2)
  1. [§5.2] Evaluation: The central claim that TRACE removes only non-essential bloat while retaining all clinically important content rests on blinded physician review and unchanged performance on information extraction/outcome prediction; however, these checks may miss rare, narrative, or untested information losses (e.g., longitudinal reasoning), and no sample size, selection criteria, or inter-rater metrics are reported for the review.
  2. [§3.2] Methods: Exact deduplication thresholds, frequency cutoffs, and precise cohort inclusion/exclusion criteria are unspecified, undermining reproducibility of the 47.3% reduction figure and its generalizability across the four cohorts.
minor comments (2)
  1. [Abstract] The phrase 'note bloat' is used without a concise definition on first appearance; defining it there would improve accessibility for readers outside clinical informatics.
  2. [Table 2] Ensure all performance metrics include confidence intervals or statistical significance tests to strengthen the 'preserving performance' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for clarification and improvement. We address each major comment point by point below. Where the manuscript requires additional detail for transparency and reproducibility, we have revised accordingly.

read point-by-point responses
  1. Referee: [§5.2] Evaluation: The central claim that TRACE removes only non-essential bloat while retaining all clinically important content rests on blinded physician review and unchanged performance on information extraction/outcome prediction; however, these checks may miss rare, narrative, or untested information losses (e.g., longitudinal reasoning), and no sample size, selection criteria, or inter-rater metrics are reported for the review.

    Authors: We agree that the original manuscript did not provide sufficient detail on the blinded physician review process. In the revised version, we will add the sample size, selection criteria (stratified random sampling across the four cohorts), and inter-rater metrics to the Evaluation section. We also acknowledge that downstream tasks cannot fully rule out loss of rare or longitudinal narrative elements not captured in the tested extraction and prediction benchmarks. We will add an explicit limitations paragraph discussing this possibility and the scope of the validation performed. The combination of physician review and task performance preservation remains our primary evidence, but greater transparency is warranted. revision: yes

  2. Referee: [§3.2] Methods: Exact deduplication thresholds, frequency cutoffs, and precise cohort inclusion/exclusion criteria are unspecified, undermining reproducibility of the 47.3% reduction figure and its generalizability across the four cohorts.

    Authors: We agree that these parameters must be specified for reproducibility. The revised manuscript will include a new subsection in Methods that reports the exact deduplication thresholds and frequency cutoffs applied in both the metadata-driven and fallback frequency-based components, as well as the full inclusion and exclusion criteria for each of the four cohorts. These details were used in the original experiments but were omitted for brevity; their addition will allow direct replication of the reported 47.3% reduction and assessment of generalizability. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline evaluated on independent tasks

full rationale

The paper defines TRACE as an explicit preprocessing pipeline (metadata-based templated content removal plus frequency deduplication) and reports measured outcomes on external cohorts (5.3M notes). The 47.3% reduction figure and preserved downstream performance are direct empirical results, not quantities derived from the method definition or from fitted parameters renamed as predictions. No equations, no self-citation chains for uniqueness theorems, and no ansatz smuggled via prior work. Evaluation uses blinded review and separate information-extraction/outcome-prediction tasks, which are independent of the bloat-removal logic itself. This is the normal non-circular case for an applied methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced beyond standard preprocessing assumptions already present in clinical NLP literature.

pith-pipeline@v0.9.0 · 5516 in / 964 out tokens · 25897 ms · 2026-05-15T06:36:45.690966+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

