pith. machine review for the scientific record.

arxiv: 2604.16364 · v1 · submitted 2026-03-21 · 💻 cs.CY · cs.AI · cs.CL

Recognition: 1 theorem link · Lean Theorem

Clinical Note Bloat Reduction for Efficient LLM Use

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 06:36 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.CL
keywords clinical notes · note bloat · LLM efficiency · EHR metadata · deduplication · preprocessing pipeline · information extraction · clinical prediction

The pith

TRACE removes 47.3 percent of clinical note text while preserving LLM performance on extraction and prediction tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRACE, a preprocessing pipeline that cuts duplicated and templated content from clinical notes, using electronic health record metadata to flag copied sections and falling back to frequency analysis when metadata are unavailable. This matters because modern notes are filled with repetitive text that inflates token counts and computational expense for large language models deployed in clinical decision support. The authors tested it on 5.3 million notes from liver transplantation, obstetrics, and inpatient cohorts. Blinded physician review and downstream modeling showed that information extraction and outcome prediction stayed intact after the reduction. The work frames this as a practical way to lower the cost of running LLMs at scale in health systems.

Core claim

TRACE is a scalable preprocessing pipeline that removes note bloat by leveraging EHR attribution metadata to identify templated and copied content and applying frequency-based deduplication when metadata are unavailable. Across four real-world clinical cohorts with 5.3 million notes, TRACE removed 47.3 percent of chart text while preserving performance for information extraction and clinical outcome prediction. At a large academic medical center this reduction corresponds to an estimated 9.5 million dollar annual decrease in LLM inference costs assuming one query per encounter.

What carries the argument

TRACE preprocessing pipeline, which identifies templated and copied content in clinical notes using EHR attribution metadata for targeted removal and frequency-based deduplication as backup.
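The frequency-based fallback can be sketched as a corpus-level exact-duplicate filter. This is a hypothetical illustration, not the paper's implementation: the paragraph-level splitting and the `min_block_chars` and `max_repeats` thresholds are invented here for the sketch (the paper's actual parameters are not reported in this review).

```python
from collections import Counter

def frequency_dedup(notes, min_block_chars=200, max_repeats=2):
    """Drop text blocks that recur across a note corpus.

    Hypothetical sketch of frequency-based deduplication: split each
    note into paragraph blocks, count exact duplicates corpus-wide,
    and remove any sufficiently long block seen more than
    `max_repeats` times (likely templated or copied boilerplate).
    All thresholds here are illustrative, not the paper's values.
    """
    counts = Counter()
    split_notes = []
    for note in notes:
        blocks = [b.strip() for b in note.split("\n\n") if b.strip()]
        split_notes.append(blocks)
        for b in blocks:
            if len(b) >= min_block_chars:  # ignore short blocks
                counts[b] += 1
    # Second pass: keep short blocks and blocks that are not high-frequency.
    return [
        "\n\n".join(b for b in blocks
                    if len(b) < min_block_chars or counts[b] <= max_repeats)
        for blocks in split_notes
    ]
```

At real scale one would hash blocks and allow near-duplicate matching rather than exact string equality, and the metadata-driven removal would run first, with a filter like this only as the fallback.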

If this is right

  • Reduces the volume of text fed to LLMs by nearly half, directly lowering inference compute and cost.
  • Keeps accuracy stable for clinical information extraction from notes.
  • Maintains predictive power for downstream clinical outcome models.
  • Delivers estimated annual savings of 9.5 million dollars in LLM costs at large medical centers.
  • Applies across diverse clinical areas including transplantation, obstetrics, and general inpatient care.
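The savings bullet is plain arithmetic once a token volume and a price are fixed. The sketch below is a back-of-envelope check, not the paper's cost model: only the 47.3 percent reduction comes from the paper, while the per-encounter token count, token price, and encounter volume are placeholders chosen so the result lands near the reported 9.5 million dollars.

```python
# Back-of-envelope check on the savings claim. Only REDUCTION (47.3%)
# comes from the paper; every other number is an illustrative
# placeholder, not a figure the authors report.
REDUCTION = 0.473                  # fraction of chart text removed by TRACE
tokens_per_encounter = 500_000     # assumed chart tokens per LLM query
encounters_per_year = 4_000_000    # assumed annual encounters at a large center
usd_per_million_tokens = 10.0      # assumed LLM input-token price

baseline_usd = (tokens_per_encounter * encounters_per_year / 1e6
                * usd_per_million_tokens)
savings_usd = baseline_usd * REDUCTION
print(f"baseline ${baseline_usd:,.0f}/yr, savings ${savings_usd:,.0f}/yr")
# → baseline $20,000,000/yr, savings $9,460,000/yr
```

Both lines scale linearly with query frequency, which is why the estimate is hedged with "assuming one query per encounter."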

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hospitals could run more frequent or complex LLM queries on patient records without proportional cost growth.
  • Similar metadata-driven deduplication might reduce repetitive text in other structured document domains such as legal or administrative records.
  • Lowering note volume could allow LLMs to handle longer patient histories within fixed context limits.
  • Wider adoption might encourage EHR vendors to improve metadata capture so that bloat reduction becomes automatic.

Load-bearing premise

Blinded physician review plus unchanged performance on information extraction and outcome prediction tasks is enough to confirm that no clinically important information is lost when bloat is removed.

What would settle it

A follow-up study in which physicians identify specific removed details that change their clinical decisions or miss key patient history elements in a meaningful fraction of cases.

Figures

Figures reproduced from arXiv: 2604.16364 by Asad Aali, Chloe Stanwyck, Emily Alsentzer, Emma Sun, Jordan L. Cahoon, Rachel Madding, Renumathy Dhanasekaran, Yixing Jiang.

Figure 1. TRACE Method. (A) TRACE identifies templated and copied text using Epic Clarity attribution metadata (Reference Module) and flags additional redundant text blocks using frequency-based de-duplication (Frequency Module). (B) Example TRACE-annotated note showing templated and copied spans with sources identified by the Reference and Frequency Modules.

Figure 2. TRACE downstream tasks. (A) TRACE utility is evaluated through two downstream tasks: Information Extraction (IE) and Clinical Outcome Prediction. In IE, a large language model performs inference over batched notes and updates the prediction as new notes are added. In clinical outcome prediction, the most recent 32K tokens are used as input to predict a clinical variable. Notes can either be used for zero-s…

Figure 3. TRACE facilitates the analysis of templated and copied text in 1,000 randomly sampled patients.

Figure 4. TRACE-processed notes achieve comparable performance in information extraction (IE) to original notes with less…

Figure 5. TRACE enables efficient clinical outcome prediction.
Original abstract

Health systems are rapidly deploying large language models (LLMs) that use clinical notes for clinical decision support applications. However, modern documentation practices rely heavily on templates, copy-paste shortcuts, and auto-populated fields, producing extensive duplicated text ("note bloat") that dilutes clinically meaningful signal and substantially increases the computational cost of LLM use. We introduce TRACE, a scalable preprocessing pipeline that removes note bloat by leveraging EHR attribution metadata to identify templated and copied content and applying frequency-based deduplication when metadata are unavailable. We evaluated TRACE across four real-world clinical cohorts spanning liver transplantation, obstetrics, and inpatient care (5.3 million notes) using blinded physician review and downstream modeling tasks. TRACE removed 47.3% of chart text while preserving performance for information extraction and clinical outcome prediction. At a large academic medical center, this reduction corresponds to an estimated $9.5 million annual decrease in LLM inference costs assuming one query per encounter. These findings show how underutilized EHR metadata can enable more scalable and cost-efficient deployment of LLM-based clinical systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TRACE, a scalable preprocessing pipeline that removes clinical note bloat by leveraging EHR attribution metadata to detect templated and copied content, with frequency-based deduplication as fallback. Evaluated on 5.3 million notes from four real-world cohorts (liver transplantation, obstetrics, inpatient care), it reports a 47.3% reduction in chart text while preserving performance on information extraction and clinical outcome prediction tasks, projecting $9.5M annual LLM inference cost savings at a large academic center.

Significance. If the claim of no clinically important information loss holds under broader scrutiny, the work has substantial practical significance for cost-efficient LLM deployment in healthcare. The large-scale, multi-cohort real-world evaluation with downstream task validation is a notable strength supporting scalability.

major comments (2)
  1. [§5.2] Evaluation: The central claim that TRACE removes only non-essential bloat while retaining all clinically important content rests on blinded physician review and unchanged performance on information extraction/outcome prediction; however, these checks may miss rare, narrative, or untested information losses (e.g., longitudinal reasoning), and no sample size, selection criteria, or inter-rater metrics are reported for the review.
  2. [§3.2] Methods: Exact deduplication thresholds, frequency cutoffs, and precise cohort inclusion/exclusion criteria are unspecified, undermining reproducibility of the 47.3% reduction figure and its generalizability across the four cohorts.
minor comments (2)
  1. [Abstract] The phrase 'note bloat' is used without a concise definition on first appearance; defining it there would improve accessibility for readers outside clinical informatics.
  2. [Table 2] Ensure all performance metrics include confidence intervals or statistical significance tests to strengthen the 'preserving performance' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas for clarification and improvement. We address each major comment point by point below. Where the manuscript requires additional detail for transparency and reproducibility, we have revised accordingly.

read point-by-point responses
  1. Referee: [§5.2] Evaluation: The central claim that TRACE removes only non-essential bloat while retaining all clinically important content rests on blinded physician review and unchanged performance on information extraction/outcome prediction; however, these checks may miss rare, narrative, or untested information losses (e.g., longitudinal reasoning), and no sample size, selection criteria, or inter-rater metrics are reported for the review.

    Authors: We agree that the original manuscript did not provide sufficient detail on the blinded physician review process. In the revised version, we will add the sample size, selection criteria (stratified random sampling across the four cohorts), and inter-rater metrics to the Evaluation section. We also acknowledge that downstream tasks cannot fully rule out loss of rare or longitudinal narrative elements not captured in the tested extraction and prediction benchmarks. We will add an explicit limitations paragraph discussing this possibility and the scope of the validation performed. The combination of physician review and task performance preservation remains our primary evidence, but greater transparency is warranted. revision: yes

  2. Referee: [§3.2] Methods: Exact deduplication thresholds, frequency cutoffs, and precise cohort inclusion/exclusion criteria are unspecified, undermining reproducibility of the 47.3% reduction figure and its generalizability across the four cohorts.

    Authors: We agree that these parameters must be specified for reproducibility. The revised manuscript will include a new subsection in Methods that reports the exact deduplication thresholds and frequency cutoffs applied in both the metadata-driven and fallback frequency-based components, as well as the full inclusion and exclusion criteria for each of the four cohorts. These details were used in the original experiments but were omitted for brevity; their addition will allow direct replication of the reported 47.3% reduction and assessment of generalizability. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline evaluated on independent tasks

full rationale

The paper defines TRACE as an explicit preprocessing pipeline (metadata-based templated content removal plus frequency deduplication) and reports measured outcomes on external cohorts (5.3M notes). The 47.3% reduction figure and preserved downstream performance are direct empirical results, not quantities derived from the method definition or from fitted parameters renamed as predictions. No equations, no self-citation chains for uniqueness theorems, and no ansatz smuggled via prior work. Evaluation uses blinded review and separate information-extraction/outcome-prediction tasks, which are independent of the bloat-removal logic itself. This is the normal non-circular case for an applied methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced beyond standard preprocessing assumptions already present in clinical NLP literature.

pith-pipeline@v0.9.0 · 5516 in / 964 out tokens · 25897 ms · 2026-05-15T06:36:45.690966+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

