Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification

Hau-San Wong; Xiaoyan Yuan; Xiping Hu; Yang Wu

arxiv: 2605.17308 · v1 · pith:IF6IM3HSnew · submitted 2026-05-17 · 💻 cs.AI

Pith reviewed 2026-05-20 13:31 UTC · model grok-4.3

classification 💻 cs.AI

keywords: ECG classification · structured reasoning · multimodal large language model · interpretable AI · diagnostic stages · SSPO · clinical alignment

The pith

CardioThink improves ECG classification accuracy by explicitly modeling diagnostic reasoning through four interpretable stages: rhythm, conduction, morphology, and impression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that ECG diagnosis benefits from explicit structured reasoning modeled after physicians rather than direct label prediction from signals. CardioThink generates human-readable intermediate outputs covering cardiac rhythm, conduction properties, waveform morphology, and overall impression before arriving at the final classification. It uses Structured Set Policy Optimization to train this behavior by jointly rewarding format compliance and set accuracy without any manually labeled reasoning traces. This matters because opaque AI decisions hinder clinical adoption, while this method offers both higher accuracy and traceable logic that aligns with medical practice. Tests across ECG benchmarks confirm gains in both performance and the validity of the produced rationales.

Core claim

CardioThink is a physician-inspired multimodal large language model that derives ECG classifications by first producing structured reasoning in four stages—rhythm, conduction, morphology, and impression—optimized through Structured Set Policy Optimization that enforces adherence to the format and accuracy of variable-size diagnostic outputs without requiring annotated reasoning traces.
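To make the four-stage format concrete, the sketch below parses a staged reading and rejects any output that skips a stage or emits stages out of order. It is a minimal illustration rather than the paper's implementation: the XML-style tag names, the `<answer>` block, the comma-separated label list, and the names `StagedReading` and `parse_reading` are all assumptions introduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stage order taken from the paper's description; the tag syntax is hypothetical.
STAGES = ("rhythm", "conduction", "morphology", "impression")


@dataclass
class StagedReading:
    rhythm: str
    conduction: str
    morphology: str
    impression: str
    labels: frozenset  # final variable-size diagnostic set


def parse_reading(text: str) -> Optional[StagedReading]:
    """Return the parsed stages, or None if a stage is missing or out of order."""
    fields = {}
    cursor = 0
    for tag in (*STAGES, "answer"):
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        # Searching from the end of the previous match enforces stage order.
        match = pattern.search(text, cursor)
        if match is None:
            return None
        fields[tag] = match.group(1).strip()
        cursor = match.end()
    # Assumed convention: the answer block holds comma-separated label codes.
    raw_labels = fields.pop("answer").split(",")
    labels = frozenset(label.strip() for label in raw_labels if label.strip())
    return StagedReading(labels=labels, **fields)
```

Returning `None` for malformed output keeps the format check separate from the accuracy check, so a caller can score the two independently.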

What carries the argument

CardioThink framework using Structured Set Policy Optimization (SSPO) to generate and optimize through the four-stage clinical reasoning sequence.

If this is right

Models that follow explicit clinical reasoning stages achieve higher diagnostic accuracy than direct prediction methods.
The approach provides interpretable clinical reasoning that aligns with how physicians diagnose ECGs.
SSPO enables effective training of structured outputs without the need for manually annotated intermediate reasoning.
Reasoning quality improves substantially, leading to more clinically valid rationales.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This structured decomposition might apply to other medical AI domains requiring sequential diagnostic logic, such as imaging or lab interpretation.
By avoiding the need for annotated reasoning traces, the method could scale more easily to new ECG classification tasks.
Clinicians might review and intervene at specific stages like morphology assessment to correct potential errors.

Load-bearing premise

That the specific four-stage breakdown of rhythm, conduction, morphology, and impression sufficiently represents the reasoning process required for accurate and interpretable ECG classification.

What would settle it

A controlled experiment where a direct-prediction baseline model matches or exceeds CardioThink's accuracy on an ECG benchmark featuring cases that do not fit neatly into the four stages would falsify the superiority claim.

Figures

Figures reproduced from arXiv: 2605.17308 by Hau-San Wong, Xiaoyan Yuan, Xiping Hu, Yang Wu.

**Figure 2.** The pipeline consists of two stages. First, following previous works [3, 17], we leveraged the PTB-XL, CPSC, and CSN datasets and employed “Expert Role-Playing” prompts to guide ECG-Chat-13B [37] in simulating cardiologist diagnostics. This process yielded a comprehensive collection of ECG analyses. To ensure data reliability at scale, we developed a semi-automated cleaning pipeline informed by the manua…

**Figure 3.** Impact of training data amount on performance under (a) supervised fine-tuning (CS) and …

**Figure 4.** Comparison of reasoning quality between the Cold-Start model and the SSPO-aligned …

Original abstract

Electrocardiogram (ECG) diagnosis in clinical practice relies on structured reasoning over multiple hierarchical aspects, including cardiac rhythm, conduction properties, waveform morphology, and overall diagnostic impression. However, most existing approaches predict labels directly from ECG signals without explicit clinical reasoning, resulting in opaque decisions that lack clinical alignment. To bridge this gap, we propose CardioThink, a physician-inspired multimodal large language model (MLLM) framework that explicitly models the diagnostic reasoning process through human-interpretable intermediate stages (rhythm, conduction, morphology, and impression) to derive final classification results. Furthermore, we introduce Structured Set Policy Optimization (SSPO) to jointly optimize adherence to this structured reasoning format and the accuracy of variable-size diagnostic sets, without requiring manually annotated reasoning traces. Extensive experiments on diverse ECG benchmarks demonstrate the significant superiority of our approach in diagnostic accuracy, while simultaneously providing interpretable clinical reasoning. Notably, reasoning quality evaluations confirm that SSPO substantially enhances the clinical validity of the generated rationales. These findings reveal that moving beyond direct label prediction toward structured reasoning offers a more clinically aligned direction for future ECG modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CardioThink adds four explicit clinical stages and SSPO to ECG diagnosis in an MLLM, but the lack of annotated reasoning traces and missing quantitative details in the write-up leave the performance and validity claims hard to evaluate.

The paper introduces a four-stage reasoning chain—rhythm, conduction, morphology, impression—inside a multimodal LLM for ECG classification, plus a new objective called SSPO that tries to enforce both format adherence and accurate variable-size diagnostic sets without any manually labeled reasoning traces. That combination is the actual novelty here; it extends prior structured-prediction work to a medical signal task in a way that has not appeared in the cited literature.

Referee Report

3 major / 2 minor

Summary. The paper introduces CardioThink, a multimodal large language model framework for ECG classification inspired by physician diagnostic reasoning. It structures the process into four stages—rhythm, conduction, morphology, and impression—to generate final classifications. The authors propose Structured Set Policy Optimization (SSPO) to train the model on this structured format and variable-size diagnostic sets without requiring manually annotated reasoning traces. The manuscript claims that this approach achieves significant superiority in diagnostic accuracy on diverse ECG benchmarks while providing interpretable clinical reasoning, with reasoning quality evaluations showing enhanced clinical validity.

Significance. Should the empirical results be substantiated, this work has the potential to advance the field of AI for medical signal processing by demonstrating that explicit modeling of clinical reasoning steps can improve both performance and interpretability in ECG diagnosis. The SSPO method, if effective without annotations, represents a practical advance for training structured outputs in LLMs for healthcare applications.

major comments (3)

[Abstract] The abstract asserts 'extensive experiments' and 'significant superiority in diagnostic accuracy' along with 'reasoning quality evaluations' confirming enhancements, but the available manuscript text provides no quantitative metrics, baseline comparisons, statistical tests, or specific implementation details for SSPO, which leaves the central performance and validity claims without verifiable support.
[Methods] The assumption that the four-stage decomposition (rhythm, conduction, morphology, impression) is sufficient to capture the reasoning needed for accurate ECG classification is not supported by any ablation studies or justification in the text; this decomposition is load-bearing for the claim of clinical alignment.
[Experiments] The central claim requires that SSPO produces clinically aligned reasoning and superior accuracy without annotated traces, but reasoning quality appears measured by internal proxies (format adherence, label consistency, or LLM-as-judge scores) rather than expert comparison, risking that any accuracy gain arises from the underlying MLLM rather than the explicit structure.

minor comments (2)

Clarify the exact architecture of the MLLM backbone and how the stages are integrated into the input/output pipeline.
Provide more details on the ECG benchmarks used, including dataset sizes and class distributions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Referee: [Abstract] The abstract asserts 'extensive experiments' and 'significant superiority in diagnostic accuracy' along with 'reasoning quality evaluations' confirming enhancements, but the available manuscript text provides no quantitative metrics, baseline comparisons, statistical tests, or specific implementation details for SSPO, which leaves the central performance and validity claims without verifiable support.

Authors: We agree that the abstract would benefit from greater specificity. The full manuscript reports quantitative results in Section 4 (Experiments), including accuracy, F1, and AUC metrics across multiple ECG benchmarks, direct comparisons to strong MLLM baselines, and statistical significance testing via paired t-tests with p-values. SSPO implementation details, including the structured policy objective, reward formulation, and training hyperparameters, appear in Section 3.2. To improve immediate verifiability, we will revise the abstract to include the key numerical improvements (e.g., absolute accuracy gains and p-values) while retaining its concise style.

revision: yes

Referee: [Methods] The assumption that the four-stage decomposition (rhythm, conduction, morphology, and impression) is sufficient to capture the reasoning needed for accurate ECG classification is not supported by any ablation studies or justification in the text; this decomposition is load-bearing for the claim of clinical alignment.

Authors: The four-stage structure follows standard clinical ECG interpretation protocols as described in major cardiology references (e.g., AHA/ACC guidelines). We selected these stages because they correspond to the sequential diagnostic steps physicians use when reading ECGs. We acknowledge that the current manuscript lacks explicit ablation experiments on alternative decompositions. In the revision we will add an ablation study that compares the full four-stage pipeline against (i) a two-stage variant, (ii) a direct-prediction baseline without intermediate stages, and (iii) an alternative three-stage decomposition, reporting both accuracy and clinical-alignment metrics to empirically support the chosen structure.

revision: yes

Referee: [Experiments] The central claim requires that SSPO produces clinically aligned reasoning and superior accuracy without annotated traces, but reasoning quality appears measured by internal proxies (format adherence, label consistency, or LLM-as-judge scores) rather than expert comparison, risking that any accuracy gain arises from the underlying MLLM rather than the explicit structure.

Authors: We recognize that expert review provides the strongest test of clinical validity. The current evaluation uses format adherence, label consistency, and an LLM-as-judge protocol whose prompts were derived from clinical criteria; however, we did not include cardiologist ratings in the submitted version. We will add a human evaluation in which a random subset of generated rationales is independently scored by two board-certified cardiologists for clinical plausibility, stage-wise alignment, and overall diagnostic utility. We will also report accuracy results against identical-base-MLLM baselines that lack both the structured format and SSPO training, thereby isolating the contribution of the explicit reasoning pipeline.

revision: yes
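The paired t-test mentioned in the first response reduces to simple arithmetic on matched score differences. The sketch below uses only the Python standard library and returns the t statistic with its degrees of freedom. The function name and the choice to pair scores per benchmark or per fold are assumptions made here, not details from the paper; turning t into a p-value needs a t-distribution CDF, which is left out.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for matched score pairs.

    scores_a[i] and scores_b[i] must come from the same benchmark or fold.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = spread / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1
```

Pairing matters because it removes benchmark-to-benchmark variation: only the spread of the differences enters the standard error, not the spread of the raw scores.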

Circularity Check

0 steps flagged

Central claim rests on empirical results from new training procedure rather than self-defined quantities or self-citation chains

The paper introduces CardioThink and SSPO as a modeling choice to decompose ECG diagnosis into four human-interpretable stages and optimize format adherence plus set accuracy without annotated traces. No equations, fitted parameters, or self-citations are presented that reduce the reported accuracy gains or reasoning validity to quantities defined by the authors' own prior work or by construction from the final labels. Superiority is instead shown via external benchmark experiments, making the derivation self-contained against independent evaluation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework depends on the domain assumption that ECG diagnosis decomposes cleanly into the four listed stages and introduces SSPO as a new optimization procedure whose effectiveness is demonstrated only through the reported experiments.

axioms (1)

domain assumption: ECG diagnosis can be decomposed into the four independent clinical stages of rhythm, conduction, morphology, and impression.
The entire structured-reasoning pipeline is built on this decomposition of clinical practice.

invented entities (1)

Structured Set Policy Optimization (SSPO) · no independent evidence
purpose: Jointly optimize adherence to the structured reasoning format and accuracy of variable-size diagnostic sets without manual reasoning annotations.
SSPO is presented as a novel training objective introduced in this work.

pith-pipeline@v0.9.0 · 5722 in / 1329 out tokens · 48126 ms · 2026-05-20T13:31:20.667219+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

CardioThink ... explicitly models the diagnostic reasoning process through human-interpretable intermediate stages (rhythm, conduction, morphology, and impression) ... Structured Set Policy Optimization (SSPO) to jointly optimize adherence to this structured reasoning format

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

$R_{\text{struct}}(o) = \frac{1}{N_{\text{rules}}}\Big(I_{\text{tags}} + \sum_{\tau} I_{\text{valid}}(\tau, o)\Big)$ ... $R_{\text{diag}}(o, Y) = \frac{2\,|Y \cap \hat{Y}(o_a)|}{|Y| + |\hat{Y}(o_a)|}$
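The two reward expressions quoted above can be read as plain counting and set arithmetic. The sketch below is one hedged reading, not the authors' code: it assumes N_rules equals one tag rule plus one validity rule per stage, and it returns 0 when both label sets are empty. Neither choice is confirmed by the quoted passage, and the function names are hypothetical.

```python
from typing import Mapping, Set


def structure_reward(tags_ok: bool, stage_valid: Mapping[str, bool]) -> float:
    """R_struct: fraction of format rules satisfied.

    tags_ok plays the role of I_tags; stage_valid maps each stage tau
    to I_valid(tau, o). Assumption: N_rules = 1 + number of stages.
    """
    n_rules = 1 + len(stage_valid)
    satisfied = int(tags_ok) + sum(1 for ok in stage_valid.values() if ok)
    return satisfied / n_rules


def diagnosis_reward(gold: Set[str], predicted: Set[str]) -> float:
    """R_diag: 2|Y ∩ Ŷ| / (|Y| + |Ŷ|) over gold and predicted label sets."""
    denominator = len(gold) + len(predicted)
    if denominator == 0:
        return 0.0  # assumption: the quoted formula leaves this case undefined
    return 2 * len(gold & predicted) / denominator
```

The second expression is the Dice coefficient between the gold and predicted sets, so it penalizes both missed and spurious labels and needs no fixed output size.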

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 7 internal anchors

[1] John E Brush, Jonathan Sherbino, and Geoffrey R Norman. Diagnostic reasoning in cardiovascular medicine. BMJ, 376, 2022. doi: 10.1136/bmj-2021-064389. URL https://www.bmj.com/content/376/bmj-2021-064389

[2] Mingsheng Cai, Jiuming Jiang, Wenhao Huang, Che Liu, and Rossella Arcucci. Supreme: A supervised pre-training framework for multimodal ecg representation learning. arXiv preprint arXiv:2502.19668, 2025

[3] Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang. Qoq-med: Building multimodal clinical foundation models with domain-aware GRPO training. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=ZwCVFBFUFb

[4] Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. Information Fusion, 118:102963, 2025. ISSN 1566-2535. doi: https://doi.org/10.1016/j.inffus.2025.102963. URL https://www.sciencedirect.com/science/arti...

[5] D Hendrycks. Gaussian Error Linear Units (GELUs). arXiv preprint arXiv:1606.08415, 2016

[6] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

[7] Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, and Bing Zhou. A multi-resolution mutual learning network for multi-label ecg classification. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 3303–3306. IEEE, 2024

[8] Manh Pham Hung, Aaqib Saeed, and Dong Ma. Boosting masked ecg-text auto-encoders as discriminative learners. In Forty-second International Conference on Machine Learning

[9] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o System Card. arXiv preprint arXiv:2410.21276, 2024

[10] Jiarui Jin, Haoyu Wang, Hongyan Li, Jun Li, Jiahui Pan, and Shenda Hong. Reading your heart: Learning ecg words and sentences via pre-training ecg language model. In The Thirteenth International Conference on Learning Representations
[11] Jiarui Jin, Haoyu Wang, Xiang Lan, Jun Li, Gaofeng Cheng, Hongyan Li, and Shenda Hong. Uniecg: Understanding and generating ecg in one unified model. arXiv preprint arXiv:2509.18588, 2025

[12] Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, et al. ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation. arXiv preprint arXiv:2602.04279, 2026

[13] LS Johnson, P Zadrozniak, G Jasina, A Grotek-Cuprjak, JG Andrade, E Svennberg, SZ Diederichsen, WF McIntyre, S Stavrakis, J Benezet-Mazuecos, et al. Artificial intelligence for direct-to-physician reporting of ambulatory electrocardiography. Nature Medicine, 31(3):925–931, 2025

[14] Xiang Lan, Feng Wu, Kai He, Qinghao Zhao, Shenda Hong, and Mengling Feng. Gem: Empowering mllm for grounded ecg understanding with time series and images. arXiv preprint arXiv:2503.06073, 2025

[15] Alexander C Li, Ananya Kumar, and Deepak Pathak. Generative classifiers avoid shortcut solutions. arXiv preprint arXiv:2512.25034, 2025

[16] Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, and Rossella Arcucci. Zero-shot ecg classification with multimodal learning and test-time clinical knowledge enhancement. In Forty-first International Conference on Machine Learning

[17] Chi Liu, Derek Li, Yan Shu, Robin Chen, Derek Duan, Teng Fang, and Bryan Dai. Fleming-r1: Toward expert-level medical reasoning via reinforcement learning. arXiv preprint arXiv:2509.15279, 2025

[18] Feifei Liu, Chengyu Liu, Lina Zhao, Xiangyu Zhang, Xiaoling Wu, Xiaoyan Xu, Yulin Liu, Caiyun Ma, Shoushui Wei, Zhiqiang He, et al. An open access database for evaluating the algorithms of electrocardiogram rhythm and morphology abnormality detection. Journal of Medical Imaging and Health Informatics, 8(7):1368–1373, 2018

[19] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 26296–26306, 2024

[20] Ruoqi Liu, Yuelin Bai, Xiang Yue, and Ping Zhang. Teach multimodal llms to comprehend electrocardiographic images. arXiv preprint arXiv:2410.19008, 2024
[21] Tan Pan, Yixuan Sun, Chen Jiang, Qiong Gao, Rui Sun, Xingmeng Zhang, Zhenqi Yang, Limei Han, Yixiu Liang, Yuan Cheng, et al. Tracing the heart’s pathways: Ecg representation learning from a cardiac conduction perspective. arXiv preprint arXiv:2512.24002, 2025

[22] Hung Manh Pham, Jialu Tang, Aaqib Saeed, and Dong Ma. Q-heart: Ecg question answering via knowledge-informed multimodal llms. arXiv preprint arXiv:2505.06296, 2025

[23] Antônio H Ribeiro, Manoel Horta Ribeiro, Gabriela MM Paixão, Derick M Oliveira, Paulo R Gomes, Jéssica A Canazart, Milton PS Ferreira, Carl R Andersson, Peter W Macfarlane, Wagner Meira Jr, et al. Automatic diagnosis of the 12-lead ecg using a deep neural network. Nature communications, 11(1):1760, 2020

[24] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347, 2017

[25] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300, 2024

[26] Konstantinos C Siontis, Peter A Noseworthy, Zachi I Attia, and Paul A Friedman. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nature Reviews Cardiology, 18(7):465–478, 2021

[27] Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I Lunze, Wojciech Samek, and Tobias Schaeffter. Ptb-xl, a large publicly available electrocardiography dataset. Scientific data, 7(1):1–15, 2020

[28] Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, and Mi Zhang. Meit: Multimodal electrocardiogram instruction tuning on large language models for report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14510–14527, 2025

[29] Fuying Wang, Jiacheng Xu, and Lequan Yu. From token to rhythm: A multi-scale approach for ecg-language pretraining. In Forty-second International Conference on Machine Learning

[30] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 Technical Report. arXiv preprint arXiv:2505.09388, 2025
[31] Kai Yang, Massimo Hong, Jiahuan Zhang, Yizhen Luo, Suyuan Zhao, Ou Zhang, Xiaomao Yu, Jiawen Zhou, Liuqing Yang, Ping Zhang, et al. Ecg-lm: Understanding electrocardiogram with a large language model. Health Data Science, 5:0221, 2025

[32] Shunxiang Yang, Cheng Lian, Zhigang Zeng, Bingrong Xu, Junbin Zang, and Zhidong Zhang. A multi-view multi-scale neural network for multi-label ecg classification. IEEE Transactions on Emerging Topics in Computational Intelligence, 7(3):648–660, 2023

[33] Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An Open-Source LLM Reinforcement Learning System at Scale. arXiv preprint arXiv:2503.14476, 2025

[34] Xiaoyan Yuan, Wei Wang, Junxin Chen, Kai Fang, Ali Kashif Bashir, Tapas Mondal, Xiping Hu, and M Jamal Deen. Enhancing multi-label ecg classification via task-guided lead correlations in internet of medical things. IEEE Internet of Things Journal, 2025

[35] Xiaoyan Yuan, Wei Wang, Junxin Chen, and Xiping Hu. Reading between the channels: Knowledge-augmented medical time series classification. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 8978–8987, 2025

[36] Xiaoyan Yuan, Wei Wang, Han Liu, Jian Chen, and Xiping Hu. Ecg2tok: Ecg pre-training with self-distillation semantic tokenizers. In 34th International Joint Conference on Artificial Intelligence, IJCAI 2025, pages 9990–9998. International Joint Conferences on Artificial Intelligence, 2025

[37] Yubao Zhao, Jiaju Kang, Tian Zhang, Puyu Han, and Tong Chen. Ecg-chat: A large ecg-language model for cardiac disease diagnosis. In 2025 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2025

[38] Jianwei Zheng, Huimin Chu, Daniele Struppa, Jianming Zhang, Sir Magdi Yacoub, Hesham El-Askary, Anthony Chang, Louis Ehwerhemuepha, Islam Abudayyeh, Alexander Barrett, et al. Optimal multi-stage arrhythmia classification approach. Scientific reports, 10(1):2898, 2020

[39] Jianwei Zheng, Hangyuan Guo, and Huimin Chu. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet, 2022. Available online: http://physionet.org/content/ecg-arrhythmia/1.0.0/ (accessed on, 23:7, 2022)

[40] Xiaoling Zhou, Wei Ye, Zhemg Lee, and Shikun Zhang. Robustness to spurious correlations via dynamic knowledge transfer. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, pages 7182–7190, 2025

[1] [1]

Diagnostic reasoning in car- diovascular medicine.BMJ, 376, 2022

John E Brush, Jonathan Sherbino, and Geoffrey R Norman. Diagnostic reasoning in car- diovascular medicine.BMJ, 376, 2022. doi: 10.1136/bmj-2021-064389. URL https: //www.bmj.com/content/376/bmj-2021-064389

work page doi:10.1136/bmj-2021-064389 2022

[2] [2]

Supreme: A supervised pre-training framework for multimodal ecg representation learning.arXiv preprint arXiv:2502.19668, 2025

Mingsheng Cai, Jiuming Jiang, Wenhao Huang, Che Liu, and Rossella Arcucci. Supreme: A supervised pre-training framework for multimodal ecg representation learning.arXiv preprint arXiv:2502.19668, 2025

work page arXiv 2025

[3] [3]

Qoq-med: Building multimodal clinical foundation models with domain-aware GRPO training

Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang. Qoq-med: Building multimodal clinical foundation models with domain-aware GRPO training. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview. net/forum?id=ZwCVFBFUFb

work page 2026

[4] Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. Information Fusion, 118:102963, 2025. ISSN 1566-2535. doi: 10.1016/j.inffus.2025.102963. URL https://www.sciencedirect.com/science/arti...

[5] D Hendrycks. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016.

[6] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

[7] Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, and Bing Zhou. A multi-resolution mutual learning network for multi-label ECG classification. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 3303–3306. IEEE, 2024.

[8] Manh Pham Hung, Aaqib Saeed, and Dong Ma. Boosting masked ECG-text auto-encoders as discriminative learners. In Forty-second International Conference on Machine Learning.

[9] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.

[10] Jiarui Jin, Haoyu Wang, Hongyan Li, Jun Li, Jiahui Pan, and Shenda Hong. Reading your heart: Learning ECG words and sentences via pre-training ECG language model. In The Thirteenth International Conference on Learning Representations.

[11] Jiarui Jin, Haoyu Wang, Xiang Lan, Jun Li, Gaofeng Cheng, Hongyan Li, and Shenda Hong. Uniecg: Understanding and generating ECG in one unified model. arXiv preprint arXiv:2509.18588, 2025.

[12] Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, et al. ECG-R1: Protocol-guided and modality-agnostic MLLM for reliable ECG interpretation. arXiv preprint arXiv:2602.04279, 2026.

[13] LS Johnson, P Zadrozniak, G Jasina, A Grotek-Cuprjak, JG Andrade, E Svennberg, SZ Diederichsen, WF McIntyre, S Stavrakis, J Benezet-Mazuecos, et al. Artificial intelligence for direct-to-physician reporting of ambulatory electrocardiography. Nature Medicine, 31(3):925–931, 2025.

[14] Xiang Lan, Feng Wu, Kai He, Qinghao Zhao, Shenda Hong, and Mengling Feng. Gem: Empowering MLLM for grounded ECG understanding with time series and images. arXiv preprint arXiv:2503.06073, 2025.

[15] Alexander C Li, Ananya Kumar, and Deepak Pathak. Generative classifiers avoid shortcut solutions. arXiv preprint arXiv:2512.25034, 2025.

[16] Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, and Rossella Arcucci. Zero-shot ECG classification with multimodal learning and test-time clinical knowledge enhancement. In Forty-first International Conference on Machine Learning.

[17] Chi Liu, Derek Li, Yan Shu, Robin Chen, Derek Duan, Teng Fang, and Bryan Dai. Fleming-R1: Toward expert-level medical reasoning via reinforcement learning. arXiv preprint arXiv:2509.15279, 2025.

[18] Feifei Liu, Chengyu Liu, Lina Zhao, Xiangyu Zhang, Xiaoling Wu, Xiaoyan Xu, Yulin Liu, Caiyun Ma, Shoushui Wei, Zhiqiang He, et al. An open access database for evaluating the algorithms of electrocardiogram rhythm and morphology abnormality detection. Journal of Medical Imaging and Health Informatics, 8(7):1368–1373, 2018.

[19] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26296–26306, 2024.

[20] Ruoqi Liu, Yuelin Bai, Xiang Yue, and Ping Zhang. Teach multimodal LLMs to comprehend electrocardiographic images. arXiv preprint arXiv:2410.19008, 2024.

[21] Tan Pan, Yixuan Sun, Chen Jiang, Qiong Gao, Rui Sun, Xingmeng Zhang, Zhenqi Yang, Limei Han, Yixiu Liang, Yuan Cheng, et al. Tracing the heart’s pathways: ECG representation learning from a cardiac conduction perspective. arXiv preprint arXiv:2512.24002, 2025.

[22] Hung Manh Pham, Jialu Tang, Aaqib Saeed, and Dong Ma. Q-heart: ECG question answering via knowledge-informed multimodal LLMs. arXiv preprint arXiv:2505.06296, 2025.

[23] Antônio H Ribeiro, Manoel Horta Ribeiro, Gabriela MM Paixão, Derick M Oliveira, Paulo R Gomes, Jéssica A Canazart, Milton PS Ferreira, Carl R Andersson, Peter W Macfarlane, Wagner Meira Jr, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nature Communications, 11(1):1760, 2020.

[24] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[25] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[26] Konstantinos C Siontis, Peter A Noseworthy, Zachi I Attia, and Paul A Friedman. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nature Reviews Cardiology, 18(7):465–478, 2021.

[27] Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I Lunze, Wojciech Samek, and Tobias Schaeffter. PTB-XL, a large publicly available electrocardiography dataset. Scientific Data, 7(1):1–15, 2020.

[28] Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, and Mi Zhang. Meit: Multimodal electrocardiogram instruction tuning on large language models for report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14510–14527, 2025.

[29] Fuying Wang, Jiacheng Xu, and Lequan Yu. From token to rhythm: A multi-scale approach for ECG-language pretraining. In Forty-second International Conference on Machine Learning.

[30] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[31] Kai Yang, Massimo Hong, Jiahuan Zhang, Yizhen Luo, Suyuan Zhao, Ou Zhang, Xiaomao Yu, Jiawen Zhou, Liuqing Yang, Ping Zhang, et al. ECG-LM: Understanding electrocardiogram with a large language model. Health Data Science, 5:0221, 2025.

[32] Shunxiang Yang, Cheng Lian, Zhigang Zeng, Bingrong Xu, Junbin Zang, and Zhidong Zhang. A multi-view multi-scale neural network for multi-label ECG classification. IEEE Transactions on Emerging Topics in Computational Intelligence, 7(3):648–660, 2023.

[33] Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025.

[34] Xiaoyan Yuan, Wei Wang, Junxin Chen, Kai Fang, Ali Kashif Bashir, Tapas Mondal, Xiping Hu, and M Jamal Deen. Enhancing multi-label ECG classification via task-guided lead correlations in Internet of Medical Things. IEEE Internet of Things Journal, 2025.

[35] Xiaoyan Yuan, Wei Wang, Junxin Chen, and Xiping Hu. Reading between the channels: Knowledge-augmented medical time series classification. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 8978–8987, 2025.

[36] Xiaoyan Yuan, Wei Wang, Han Liu, Jian Chen, and Xiping Hu. Ecg2tok: ECG pre-training with self-distillation semantic tokenizers. In 34th International Joint Conference on Artificial Intelligence, IJCAI 2025, pages 9990–9998. International Joint Conferences on Artificial Intelligence, 2025.

[37] Yubao Zhao, Jiaju Kang, Tian Zhang, Puyu Han, and Tong Chen. ECG-Chat: A large ECG-language model for cardiac disease diagnosis. In 2025 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2025.

[38] Jianwei Zheng, Huimin Chu, Daniele Struppa, Jianming Zhang, Sir Magdi Yacoub, Hesham El-Askary, Anthony Chang, Louis Ehwerhemuepha, Islam Abudayyeh, Alexander Barrett, et al. Optimal multi-stage arrhythmia classification approach. Scientific Reports, 10(1):2898, 2020.

[39] Jianwei Zheng, Hangyuan Guo, and Huimin Chu. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet, 2022. Available online: http://physionet.org/content/ecg-arrhythmia/1.0.0/

[40] Xiaoling Zhou, Wei Ye, Zhemg Lee, and Shikun Zhang. Robustness to spurious correlations via dynamic knowledge transfer. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, pages 7182–7190, 2025.