CLIC: Contextual Language-Informed Cardiac Pathology Classification

Andre Guarnier De Mitri; Diego Furtado Silva; Giovani D. Lucafo; João Lucas Luz Lima Sarcinelli; Rafael da Costa Silva

arxiv: 2605.19132 · v1 · pith:3A22NA6Hnew · submitted 2026-05-18 · 💻 cs.LG

Pith reviewed 2026-05-20 12:03 UTC · model grok-4.3

classification 💻 cs.LG

keywords: ECG classification, multimodal learning, cardiac pathology, contextual information, natural language, deep learning, template-based text, large language models

The pith

Encoding patient demographics and acquisition details as natural language text improves the accuracy of deep learning models for classifying cardiac pathologies from ECG signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard ECG classifiers process only the raw signal and overlook patient details and acquisition context that clinicians routinely use. The CLIC framework converts these variables into descriptive text, which serves as an anchor to help disambiguate complex patterns in the physiological data. This multimodal approach yields better diagnostic precision, and template-based text descriptions deliver more consistent gains than those generated by large language models. The work shows how integrating such context bridges automated systems with clinical reality.

Core claim

By translating patient-level contextual data into descriptive text, the model receives an informative anchor that helps disambiguate complex physiological patterns observed in ECG signals. Template-based contextual clinical text leads to consistent improvements in classification performance over both signal-only baselines and LLM-generated descriptions.

What carries the argument

The CLIC multimodal framework, which encodes contextual variables such as demographics and acquisition metadata through natural language descriptions to augment raw ECG signal inputs for pathology classification.
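As an illustration of the template-based encoding described above, the following is a minimal sketch of how contextual variables might be rendered as text. The field names (`age`, `sex`, `device`, `sampling_rate_hz`), the example device name, and the sentence wording are all hypothetical; the paper's actual templates are not reproduced here.

```python
from typing import Mapping, Optional


def render_context(record: Mapping[str, Optional[object]]) -> str:
    """Render hypothetical contextual fields as short descriptive sentences.

    Missing fields are skipped so the text never contains placeholder values.
    """
    parts = []

    age = record.get("age")
    sex = record.get("sex")
    if age is not None and sex is not None:
        parts.append(f"The patient is a {age}-year-old {sex}.")
    elif age is not None:
        parts.append(f"The patient is {age} years old.")
    elif sex is not None:
        parts.append(f"The patient is {sex}.")

    device = record.get("device")
    if device is not None:
        parts.append(f"The ECG was recorded with the {device} device.")

    rate = record.get("sampling_rate_hz")
    if rate is not None:
        parts.append(f"The signal was sampled at {rate} Hz.")

    if not parts:
        return "No contextual information is available."
    return " ".join(parts)


# Hypothetical record; "CS-12" is an illustrative device name only.
example = {"age": 63, "sex": "female", "device": "CS-12", "sampling_rate_hz": 500}
text = render_context(example)
print(text)
```

Because the template is deterministic, identical records always produce identical text, which is one plausible reason a controlled template could behave more consistently than free-form generated descriptions.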

Load-bearing premise

The selected contextual variables can be converted into text that provides disambiguating information not already present in the raw ECG signal or standard clinical features.

What would settle it

Running the classification models with and without the contextual text inputs and finding no significant difference or a decrease in performance metrics such as accuracy or F1 score.
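The comparison described above can be sketched as a macro-F1 difference between two sets of predictions on the same test labels. This is an assumption-laden toy: it treats the task as single-label, uses made-up labels and predictions, and does not perform the significance test a real evaluation would need.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_gain(y_true, pred_signal_only, pred_with_context) -> float:
    """Positive values mean the context-augmented model scored higher."""
    return macro_f1(y_true, pred_with_context) - macro_f1(y_true, pred_signal_only)


# Toy labels and predictions, invented purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
pred_signal_only = [0, 1, 1, 1, 2, 0]
pred_with_context = [0, 0, 1, 1, 2, 0]

gain = f1_gain(y_true, pred_signal_only, pred_with_context)
print(f"macro-F1 gain from context: {gain:.4f}")
```

A gain near zero or below zero on held-out data, repeated across seeds or folds, would count against the core claim; a resampling test over the test set would be needed before calling any difference significant.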

Figures

Figures reproduced from arXiv: 2605.19132 by Andre Guarnier De Mitri, Diego Furtado Silva, Giovani D. Lucafo, João Lucas Luz Lima Sarcinelli, Rafael da Costa Silva.

**Figure 1.** Illustration of the Contextual Language-Informed Cardiac pathology classification (CLIC) framework. The framework consists of two input data workflows: a ResNet18 that receives a 12-lead ECG as input, and a ClinicalBERT that receives a contextual clinical text generated via a template-based strategy called Data-to-text (1) and a Prompt-guided strategy that uses Llama (2). Unlike many multimodal approaches …

**Figure 2.** UMAP visualization of embedding distributions across models.

Original abstract

The electrocardiogram (ECG) is the gold standard for non-invasive diagnosis of cardiac pathologies and is a fundamental pillar of cardiovascular medicine. Recent progress in deep learning has led to the development of robust automated classifiers that achieve high performance by processing raw physiological signals. However, in clinical practice, diagnosis is rarely based solely on the signal. Cardiologists commonly support their interpretation with the patient's characteristics and the specific data-acquisition context. Despite this, most current algorithms remain restricted to signal-only analysis, failing to integrate technical metadata and demographic variables. This paper proposes Contextual Language-Informed Cardiac pathology classification (CLIC), a multimodal framework that significantly enhances diagnostic precision by encoding these variables through natural language. We demonstrate that translating patient-level contextual data into descriptive text provides an informative anchor that helps the model disambiguate complex physiological patterns. We further investigate the use of Large Language Models to synthesize richer clinical descriptions and observe that, while these generated texts remain competitive, controlled template-based contextual clinical text leads to consistent improvements in downstream classification performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CLIC gets gains on ECG classification by turning demographics and metadata into text, with templates beating LLM descriptions, but the language step itself is not isolated from just adding the extra variables.

The main thing to know is that this paper shows adding patient context as natural language text improves pathology detection on ECG signals, and their tests find template-based descriptions more reliable than ones generated by LLMs. It is a direct attempt to close the gap between signal-only models and how cardiologists actually read traces with extra details in mind. The template versus LLM comparison is a useful empirical note that could help others decide how to inject side information without overcomplicating the pipeline. The work is incremental on the multimodal side but applies the idea cleanly to cardiac data where most prior work stays signal-only. It earns credit for grounding the motivation in real clinical practice rather than chasing abstract benchmarks.

The soft spot is exactly the one flagged in the stress test. Without a control that feeds the same demographics and acquisition variables as structured numerical or categorical inputs, any lift could come from having more patient information at all rather than from the text encoding step. That makes the claim about text acting as a special disambiguating anchor harder to pin down.

The paper is aimed at researchers doing applied multimodal work on physiological signals. A reader who needs practical ways to bring context into ECG classifiers would find the experiments worth looking at, even if the controls are tightened later. I would send it for peer review. The clinical motivation is solid and the template-LLM angle is concrete enough to justify referee time, though revisions on the ablation design would be expected.

Referee Report

1 major / 2 minor

Summary. The paper proposes CLIC, a multimodal framework for ECG-based cardiac pathology classification that encodes patient demographics and acquisition metadata as natural language text (via templates or LLMs) and fuses it with the raw signal to improve diagnostic accuracy. It claims that template-based contextual text provides a consistent informative anchor for disambiguating physiological patterns and outperforms both signal-only baselines and LLM-generated descriptions.

Significance. If the central empirical claims hold after addressing the ablation gap, the work would provide evidence that language-mediated clinical context can enhance ECG models beyond standard multimodal fusion, aligning automated systems more closely with cardiologist practice and offering a practical route to incorporate readily available metadata.

major comments (1)

[Results] Results section (and associated tables/figures): the reported gains from adding contextual text are not accompanied by an ablation that injects the identical demographic and acquisition variables as structured numerical/categorical features (e.g., via an auxiliary MLP branch concatenated to the ECG encoder). Without this control, it remains unclear whether improvements arise specifically from the language-informed encoding or simply from the presence of extra patient information, directly undermining the claim that 'translating patient-level contextual data into descriptive text provides an informative anchor'.

minor comments (2)

[Abstract] Abstract: quantitative performance metrics, dataset sizes, and validation protocol details are absent, making it difficult for readers to gauge the magnitude and reliability of the claimed improvements.
[Methods] Methods: the exact fusion mechanism between the language encoder and the ECG backbone (e.g., cross-attention, concatenation, or late fusion) should be specified with an equation or diagram for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights an important control experiment. We address the major comment below and commit to revisions that directly respond to the concern.

Referee: [Results] Results section (and associated tables/figures): the reported gains from adding contextual text are not accompanied by an ablation that injects the identical demographic and acquisition variables as structured numerical/categorical features (e.g., via an auxiliary MLP branch concatenated to the ECG encoder). Without this control, it remains unclear whether improvements arise specifically from the language-informed encoding or simply from the presence of extra patient information, directly undermining the claim that 'translating patient-level contextual data into descriptive text provides an informative anchor'.

Authors: We agree that the absence of this ablation leaves open the possibility that gains stem from additional patient information rather than its language-mediated encoding. In the revised manuscript we will add the requested control: the same demographic and acquisition variables will be encoded as numerical/categorical features, passed through an auxiliary MLP, and concatenated to the ECG encoder output before the final classifier. We will report the resulting performance alongside the template-based and LLM-based CLIC variants. This addition will allow readers to assess whether the natural-language format supplies a distinct disambiguation benefit beyond the raw variables themselves.
Revision: yes
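The structured-feature control discussed in this exchange could look roughly like the sketch below: the same variables are encoded numerically, passed through a small MLP layer, and concatenated to an ECG embedding. Everything here is an assumption for illustration, including the field names, the scaling constants, the random untrained weights, and the stand-in ECG embedding; it is not the authors' implementation.

```python
import random
from typing import List, Mapping, Sequence

SEX_CATEGORIES = ["female", "male"]  # hypothetical category list


def encode_structured(record: Mapping[str, object]) -> List[float]:
    """Scale numeric fields and one-hot encode the categorical field."""
    age_scaled = float(record["age"]) / 100.0
    sex_one_hot = [1.0 if record["sex"] == c else 0.0 for c in SEX_CATEGORIES]
    rate_scaled = float(record["sampling_rate_hz"]) / 1000.0
    return [age_scaled, *sex_one_hot, rate_scaled]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> List[List[float]]:
    """Random weights standing in for a trained layer (one row per output unit)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def mlp_forward(x: Sequence[float], weights: List[List[float]]) -> List[float]:
    """One linear layer without bias, followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def fuse(ecg_embedding: Sequence[float], context_embedding: Sequence[float]) -> List[float]:
    """Concatenate the two embeddings before a final classifier."""
    return list(ecg_embedding) + list(context_embedding)


rng = random.Random(0)
record = {"age": 63, "sex": "female", "sampling_rate_hz": 500}  # hypothetical record
features = encode_structured(record)
weights = make_layer(len(features), 3, rng)
context_embedding = mlp_forward(features, weights)
ecg_embedding = [0.1, -0.2, 0.3, 0.0]  # stand-in for an ECG encoder output
fused = fuse(ecg_embedding, context_embedding)
print(len(fused), fused)
```

Comparing a classifier trained on `fused` vectors of this kind against the text-based variants is what would separate the effect of the extra variables from the effect of the language encoding.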

Circularity Check

0 steps flagged

No significant circularity in empirical multimodal framework

The paper describes an empirical multimodal deep learning approach for ECG classification that incorporates contextual data via natural language text. No mathematical derivations, equations, or first-principles results are presented that could reduce to inputs by construction. Performance claims rest on experimental comparisons rather than self-definitional fits or self-citation chains. The central mechanism (text encoding of demographics and metadata) is evaluated through ablation-style experiments on downstream classification accuracy, remaining independent of any fitted parameter renamed as prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested premise that language-encoded context supplies independent disambiguating signal; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: Natural language descriptions of demographics and acquisition context contain information orthogonal to the raw ECG waveform for pathology discrimination.
This premise is required for the multimodal fusion to improve performance and is stated in the abstract's description of the informative anchor.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: translating patient-level contextual data into descriptive text provides an informative anchor... template-based text leading to consistent improvements

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: CLIC-DtT achieves the best overall performance... CLIC-LLM... does not consistently surpass the structured attribute baseline

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] Winnie Chow, Lauren E. Gardiner, Haraldur T Hallgrimsson, Maxwell A Xu, Shirley You Ren. Towards Time-Series Reasoning with. 2024.
[2] How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook. 2025.
[3] Yushan Jiang, Kanghui Ning, Zijie Pan, Xuyang Shen, Jingchao Ni, Wenchao Yu, Anderson Schneider, Haifeng Chen, Yuriy Nevmyvaka, Dongjin Song. 2025. doi:10.1145/3711896.3736567
[4] Towards Time Series Reasoning with LLMs. 2024.
[5] Can LLMs Understand Time Series Anomalies? 2025.
[6] ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. 2020.
[7] Multimodal Alignment and Fusion: A Survey. 2025.
[8] Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Wojciech Samek, Tobias Schaeffter. 2020. doi:10.13026/x4td-x982
[9] Deep Residual Learning for Image Recognition. 2015.
[10] LLaMA: Open and Efficient Foundation Language Models. 2023.
[11] Universal Time-Series Representation Learning: A Survey. 2024.
[12] Frozen Language Model Helps ECG Zero-Shot Learning. 2023.
[13] Multimodal Electronic Health Record Foundation Models with Electrocardiogram for Cardiovascular Disease Prediction. medRxiv, 2025. doi:10.1101/2025.11.10.25339886
[14] Sravan Kumar Lalam, Hari Krishna Kunderu, Shayan Ghosh, Kumar A. Harish, Ashim Prasad, Francisco Lopez-Jimenez, Samir Awasthi, Zachi I. Attia, Samuel J. Asirvatham, Paul Andrew Friedman, Rakesh Barve, Melwin Babu. ECG Representation Learning with Multi-Modal EHR Data. Transactions on Machine Learning Research, 2023.
[15] Time Series Representation Learning with Supervised Contrastive Temporal Transformer. 2024.
[16] How to Leverage Multimodal EHR Data for Better Medical Predictions? 2021.
[17] A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction. 2023.
[18] Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series. 2023.
[19] UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 2020.
[20] Xingyue Liu, Feizhong Zhou, Hanguang Xiao, Zhipeng Li, Shuai Liu, Lingling Qian. A survey On large language models for medical time series. 2026. doi:10.1016/j.eswa.2026.131364
[21] Frequency of Electrocardiogram-Defined Cardiac Conduction Disorders in a Multi-Institutional Primary Care Cohort. JACC: Advances, 2024.
[22] Prevalence of Intraventricular Conduction Disturbances in a Large French Population. Annals of Noninvasive Electrocardiology, 2016.
