arxiv: 2605.17308 · v1 · pith:IF6IM3HSnew · submitted 2026-05-17 · 💻 cs.AI

Reasoning Before Diagnosis: Physician-Inspired Structured Thinking for ECG Classification

Pith reviewed 2026-05-20 13:31 UTC · model grok-4.3

classification 💻 cs.AI
keywords ECG classification · structured reasoning · multimodal large language model · interpretable AI · diagnostic stages · SSPO · clinical alignment

The pith

CardioThink improves ECG classification accuracy by explicitly modeling diagnostic reasoning through four interpretable stages: rhythm, conduction, morphology, and impression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that ECG diagnosis benefits from explicit structured reasoning modeled after physicians rather than direct label prediction from signals. CardioThink generates human-readable intermediate outputs covering cardiac rhythm, conduction properties, waveform morphology, and overall impression before arriving at the final classification. It is trained with Structured Set Policy Optimization (SSPO), which jointly rewards format compliance and the accuracy of the predicted diagnostic set without any manually labeled reasoning traces. This matters because opaque AI decisions hinder clinical adoption, whereas this method promises both higher accuracy and traceable logic that aligns with medical practice. The authors report that experiments across ECG benchmarks show gains in both diagnostic performance and the clinical validity of the generated rationales.

Core claim

CardioThink is a physician-inspired multimodal large language model that derives ECG classifications by first producing structured reasoning in four stages—rhythm, conduction, morphology, and impression—optimized through Structured Set Policy Optimization that enforces adherence to the format and accuracy of variable-size diagnostic outputs without requiring annotated reasoning traces.
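
To make the four-stage format concrete, here is a minimal Python sketch of a format check of the kind such a pipeline might apply to a model response. The tag names (`<rhythm>`, `<conduction>`, `<morphology>`, `<impression>`) and the function `parse_stages` are hypothetical; the paper's actual output template is not shown on this page.

```python
import re
from typing import Dict, Optional

# Hypothetical tag names: the real CardioThink template is not given here.
STAGES = ("rhythm", "conduction", "morphology", "impression")


def parse_stages(text: str) -> Optional[Dict[str, str]]:
    """Return the four stage texts if each appears once, in order; else None."""
    # One tagged block per stage, in the fixed clinical order.
    pattern = "".join(rf"\s*<{s}>(.*?)</{s}>" for s in STAGES) + r"\s*"
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    stages = {s: body.strip() for s, body in zip(STAGES, match.groups())}
    for body in stages.values():
        # Reject empty stages and stage tags nested inside another stage.
        if not body:
            return None
        if any(f"<{t}>" in body or f"</{t}>" in body for t in STAGES):
            return None
    return stages
```

A response that skips a stage, reorders the stages, or repeats one returns `None`, which a training loop could treat as a format violation.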

What carries the argument

The CardioThink framework, which uses Structured Set Policy Optimization (SSPO) to train the model to generate the four-stage clinical reasoning sequence before it classifies.
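
As an illustration only, a reward that combines format compliance with set accuracy might look like the sketch below. The exact SSPO reward is not given on this page. The use of set-level F1, the gating on format, the `format_weight` value, and the function names are all assumptions made for this example.

```python
from typing import Iterable


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 overlap between a predicted and a reference label set."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # both empty: treated as perfect agreement
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def structured_set_reward(
    format_ok: bool,
    predicted: Iterable[str],
    gold: Iterable[str],
    format_weight: float = 0.2,  # assumed weighting, not from the paper
) -> float:
    """Assumed reward: zero for malformed output, else format bonus plus set F1."""
    if not format_ok:
        return 0.0
    return format_weight + (1.0 - format_weight) * set_f1(predicted, gold)
```

Scoring the predicted labels as a set means the reward does not depend on label order and can handle a variable number of diagnoses per ECG, which is the property the page attributes to SSPO.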

If this is right

  • Models that follow explicit clinical reasoning stages achieve higher diagnostic accuracy than direct prediction methods.
  • The approach provides interpretable clinical reasoning that aligns with how physicians diagnose ECGs.
  • SSPO enables effective training of structured outputs without the need for manually annotated intermediate reasoning.
  • Reasoning quality improves substantially, leading to more clinically valid rationales.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This structured decomposition might apply to other medical AI domains requiring sequential diagnostic logic, such as imaging or lab interpretation.
  • By avoiding the need for annotated reasoning traces, the method could scale more easily to new ECG classification tasks.
  • Clinicians might review and intervene at specific stages like morphology assessment to correct potential errors.

Load-bearing premise

That the specific four-stage breakdown of rhythm, conduction, morphology, and impression sufficiently represents the reasoning process required for accurate and interpretable ECG classification.

What would settle it

A controlled experiment where a direct-prediction baseline model matches or exceeds CardioThink's accuracy on an ECG benchmark featuring cases that do not fit neatly into the four stages would falsify the superiority claim.

Figures

Figures reproduced from arXiv: 2605.17308 by Hau-San Wong, Xiaoyan Yuan, Xiping Hu, Yang Wu.

Figure 1: Overall pipeline of CardioThink. Our model is trained in two stages: supervised instruction …
Figure 2: … the pipeline consists of two stages. First, following previous works [3, 17], we leveraged the PTB-XL, CPSC, and CSN datasets and employed “Expert Role-Playing” prompts to guide ECG-Chat-13B [37] in simulating cardiologist diagnostics. This process yielded a comprehensive collection of ECG analyses. To ensure data reliability at scale, we developed a semi-automated cleaning pipeline informed by the manua…
Figure 3: Impact of training data amount on performance under (a) supervised fine-tuning (CS) and …
Figure 4: Comparison of reasoning quality between the Cold-Start model and the SSPO-aligned …

Original abstract

Electrocardiogram (ECG) diagnosis in clinical practice relies on structured reasoning over multiple hierarchical aspects, including cardiac rhythm, conduction properties, waveform morphology, and overall diagnostic impression. However, most existing approaches predict labels directly from ECG signals without explicit clinical reasoning, resulting in opaque decisions that lack clinical alignment. To bridge this gap, we propose CardioThink, a physician-inspired multimodal large language model (MLLM) framework that explicitly models the diagnostic reasoning process through human-interpretable intermediate stages (rhythm, conduction, morphology, and impression) to derive final classification results. Furthermore, we introduce Structured Set Policy Optimization (SSPO) to jointly optimize adherence to this structured reasoning format and the accuracy of variable-size diagnostic sets, without requiring manually annotated reasoning traces. Extensive experiments on diverse ECG benchmarks demonstrate the significant superiority of our approach in diagnostic accuracy, while simultaneously providing interpretable clinical reasoning. Notably, reasoning quality evaluations confirm that SSPO substantially enhances the clinical validity of the generated rationales. These findings reveal that moving beyond direct label prediction toward structured reasoning offers a more clinically aligned direction for future ECG modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CardioThink, a multimodal large language model framework for ECG classification inspired by physician diagnostic reasoning. It structures the process into four stages—rhythm, conduction, morphology, and impression—to generate final classifications. The authors propose Structured Set Policy Optimization (SSPO) to train the model on this structured format and variable-size diagnostic sets without requiring manually annotated reasoning traces. The manuscript claims that this approach achieves significant superiority in diagnostic accuracy on diverse ECG benchmarks while providing interpretable clinical reasoning, with reasoning quality evaluations showing enhanced clinical validity.

Significance. Should the empirical results be substantiated, this work has the potential to advance the field of AI for medical signal processing by demonstrating that explicit modeling of clinical reasoning steps can improve both performance and interpretability in ECG diagnosis. The SSPO method, if effective without annotations, represents a practical advance for training structured outputs in LLMs for healthcare applications.

major comments (3)
  1. [Abstract] The abstract asserts 'extensive experiments' and 'significant superiority in diagnostic accuracy' along with 'reasoning quality evaluations' confirming enhancements, but the available manuscript text provides no quantitative metrics, baseline comparisons, statistical tests, or specific implementation details for SSPO, which leaves the central performance and validity claims without verifiable support.
  2. [Methods] The assumption that the four-stage decomposition (rhythm, conduction, morphology, impression) is sufficient to capture the reasoning needed for accurate ECG classification is not supported by any ablation studies or justification in the text; this decomposition is load-bearing for the claim of clinical alignment.
  3. [Experiments] The central claim requires that SSPO produces clinically aligned reasoning and superior accuracy without annotated traces, but reasoning quality appears measured by internal proxies (format adherence, label consistency, or LLM-as-judge scores) rather than expert comparison, risking that any accuracy gain arises from the underlying MLLM rather than the explicit structure.
minor comments (2)
  1. Clarify the exact architecture of the MLLM backbone and how the stages are integrated into the input/output pipeline.
  2. Provide more details on the ECG benchmarks used, including dataset sizes and class distributions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.

  1. Referee: [Abstract] The abstract asserts 'extensive experiments' and 'significant superiority in diagnostic accuracy' along with 'reasoning quality evaluations' confirming enhancements, but the available manuscript text provides no quantitative metrics, baseline comparisons, statistical tests, or specific implementation details for SSPO, which leaves the central performance and validity claims without verifiable support.

    Authors: We agree that the abstract would benefit from greater specificity. The full manuscript reports quantitative results in Section 4 (Experiments), including accuracy, F1, and AUC metrics across multiple ECG benchmarks, direct comparisons to strong MLLM baselines, and statistical significance testing via paired t-tests with p-values. SSPO implementation details, including the structured policy objective, reward formulation, and training hyperparameters, appear in Section 3.2. To improve immediate verifiability, we will revise the abstract to include the key numerical improvements (e.g., absolute accuracy gains and p-values) while retaining its concise style. revision: yes

  2. Referee: [Methods] The assumption that the four-stage decomposition (rhythm, conduction, morphology, and impression) is sufficient to capture the reasoning needed for accurate ECG classification is not supported by any ablation studies or justification in the text; this decomposition is load-bearing for the claim of clinical alignment.

    Authors: The four-stage structure follows standard clinical ECG interpretation protocols as described in major cardiology references (e.g., AHA/ACC guidelines). We selected these stages because they correspond to the sequential diagnostic steps physicians use when reading ECGs. We acknowledge that the current manuscript lacks explicit ablation experiments on alternative decompositions. In the revision we will add an ablation study that compares the full four-stage pipeline against (i) a two-stage variant, (ii) a direct-prediction baseline without intermediate stages, and (iii) an alternative three-stage decomposition, reporting both accuracy and clinical-alignment metrics to empirically support the chosen structure. revision: yes

  3. Referee: [Experiments] The central claim requires that SSPO produces clinically aligned reasoning and superior accuracy without annotated traces, but reasoning quality appears measured by internal proxies (format adherence, label consistency, or LLM-as-judge scores) rather than expert comparison, risking that any accuracy gain arises from the underlying MLLM rather than the explicit structure.

    Authors: We recognize that expert review provides the strongest test of clinical validity. The current evaluation uses format adherence, label consistency, and an LLM-as-judge protocol whose prompts were derived from clinical criteria; however, we did not include cardiologist ratings in the submitted version. We will add a human evaluation in which a random subset of generated rationales is independently scored by two board-certified cardiologists for clinical plausibility, stage-wise alignment, and overall diagnostic utility. We will also report accuracy results against identical-base-MLLM baselines that lack both the structured format and SSPO training, thereby isolating the contribution of the explicit reasoning pipeline. revision: yes
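
The paired t-test mentioned in the first response can be sketched as follows. This is a minimal illustration, assuming the two models are scored on the same folds or benchmarks so that the scores pair up. The function name and the example numbers are hypothetical and are not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")
    # The test works on per-pair differences, not on the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    t_stat = mean(diffs) / (spread / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical per-fold F1 scores for two models on the same folds.
t_stat, dof = paired_t_statistic([0.82, 0.79, 0.85, 0.81], [0.78, 0.77, 0.80, 0.79])
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

Turning the statistic into a p-value requires the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_rel` computes both in one call.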

Circularity Check

0 steps flagged

The central claim rests on empirical results from a new training procedure rather than on self-defined quantities or self-citation chains.

The paper introduces CardioThink and SSPO as a modeling choice to decompose ECG diagnosis into four human-interpretable stages and optimize format adherence plus set accuracy without annotated traces. No equations, fitted parameters, or self-citations are presented that reduce the reported accuracy gains or reasoning validity to quantities defined by the authors' own prior work or by construction from the final labels. Superiority is instead shown via external benchmark experiments, making the derivation self-contained against independent evaluation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework depends on the domain assumption that ECG diagnosis decomposes cleanly into the four listed stages and introduces SSPO as a new optimization procedure whose effectiveness is demonstrated only through the reported experiments.

axioms (1)
  • domain assumption: ECG diagnosis can be decomposed into the four independent clinical stages of rhythm, conduction, morphology, and impression.
    The entire structured-reasoning pipeline is built on this decomposition of clinical practice.
invented entities (1)
  • Structured Set Policy Optimization (SSPO): no independent evidence
    purpose: Jointly optimize adherence to the structured reasoning format and accuracy of variable-size diagnostic sets without manual reasoning annotations.
    SSPO is presented as a novel training objective introduced in this work.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 7 internal anchors

  [1] John E Brush, Jonathan Sherbino, and Geoffrey R Norman. Diagnostic reasoning in cardiovascular medicine. BMJ, 376, 2022. doi: 10.1136/bmj-2021-064389. URL https://www.bmj.com/content/376/bmj-2021-064389
  [2] Mingsheng Cai, Jiuming Jiang, Wenhao Huang, Che Liu, and Rossella Arcucci. Supreme: A supervised pre-training framework for multimodal ECG representation learning. arXiv preprint arXiv:2502.19668, 2025.
  [3] Wei Dai, Peilin Chen, Chanakya Ekbote, and Paul Pu Liang. Qoq-med: Building multimodal clinical foundation models with domain-aware GRPO training. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=ZwCVFBFUFb
  [4] Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. Information Fusion, 118:102963, 2025. ISSN 1566-2535. doi: https://doi.org/10.1016/j.inffus.2025.102963. URL https://www.sciencedirect.com/science/arti...
  [5] D Hendrycks. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016.
  [6] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9
  [7] Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, and Bing Zhou. A multi-resolution mutual learning network for multi-label ECG classification. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 3303–3306. IEEE, 2024.
  [8] Manh Pham Hung, Aaqib Saeed, and Dong Ma. Boosting masked ECG-text auto-encoders as discriminative learners. In Forty-second International Conference on Machine Learning.
  [9] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.
  [10] Jiarui Jin, Haoyu Wang, Hongyan Li, Jun Li, Jiahui Pan, and Shenda Hong. Reading your heart: Learning ECG words and sentences via pre-training ECG language model. In The Thirteenth International Conference on Learning Representations.
  [11] Jiarui Jin, Haoyu Wang, Xiang Lan, Jun Li, Gaofeng Cheng, Hongyan Li, and Shenda Hong. Uniecg: Understanding and generating ECG in one unified model. arXiv preprint arXiv:2509.18588, 2025.
  [12] Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, et al. ECG-R1: Protocol-guided and modality-agnostic MLLM for reliable ECG interpretation. arXiv preprint arXiv:2602.04279, 2026.
  [13] LS Johnson, P Zadrozniak, G Jasina, A Grotek-Cuprjak, JG Andrade, E Svennberg, SZ Diederichsen, WF McIntyre, S Stavrakis, J Benezet-Mazuecos, et al. Artificial intelligence for direct-to-physician reporting of ambulatory electrocardiography. Nature Medicine, 31(3):925–931, 2025.
  [14] Xiang Lan, Feng Wu, Kai He, Qinghao Zhao, Shenda Hong, and Mengling Feng. Gem: Empowering MLLM for grounded ECG understanding with time series and images. arXiv preprint arXiv:2503.06073, 2025.
  [15] Alexander C Li, Ananya Kumar, and Deepak Pathak. Generative classifiers avoid shortcut solutions. arXiv preprint arXiv:2512.25034, 2025.
  [16] Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, and Rossella Arcucci. Zero-shot ECG classification with multimodal learning and test-time clinical knowledge enhancement. In Forty-first International Conference on Machine Learning.
  [17] Chi Liu, Derek Li, Yan Shu, Robin Chen, Derek Duan, Teng Fang, and Bryan Dai. Fleming-R1: Toward expert-level medical reasoning via reinforcement learning. arXiv preprint arXiv:2509.15279, 2025.
  [18] Feifei Liu, Chengyu Liu, Lina Zhao, Xiangyu Zhang, Xiaoling Wu, Xiaoyan Xu, Yulin Liu, Caiyun Ma, Shoushui Wei, Zhiqiang He, et al. An open access database for evaluating the algorithms of electrocardiogram rhythm and morphology abnormality detection. Journal of Medical Imaging and Health Informatics, 8(7):1368–1373, 2018.
  [19] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26296–26306, 2024.
  [20] Ruoqi Liu, Yuelin Bai, Xiang Yue, and Ping Zhang. Teach multimodal LLMs to comprehend electrocardiographic images. arXiv preprint arXiv:2410.19008, 2024.
  [21] Tan Pan, Yixuan Sun, Chen Jiang, Qiong Gao, Rui Sun, Xingmeng Zhang, Zhenqi Yang, Limei Han, Yixiu Liang, Yuan Cheng, et al. Tracing the heart’s pathways: ECG representation learning from a cardiac conduction perspective. arXiv preprint arXiv:2512.24002, 2025.
  [22] Hung Manh Pham, Jialu Tang, Aaqib Saeed, and Dong Ma. Q-heart: ECG question answering via knowledge-informed multimodal LLMs. arXiv preprint arXiv:2505.06296, 2025.
  [23] Antônio H Ribeiro, Manoel Horta Ribeiro, Gabriela MM Paixão, Derick M Oliveira, Paulo R Gomes, Jéssica A Canazart, Milton PS Ferreira, Carl R Andersson, Peter W Macfarlane, Wagner Meira Jr, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nature Communications, 11(1):1760, 2020.
  [24] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  [25] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
  [26] Konstantinos C Siontis, Peter A Noseworthy, Zachi I Attia, and Paul A Friedman. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nature Reviews Cardiology, 18(7):465–478, 2021.
  [27] Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I Lunze, Wojciech Samek, and Tobias Schaeffter. PTB-XL, a large publicly available electrocardiography dataset. Scientific Data, 7(1):1–15, 2020.
  [28] Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Jing Xiong, Rossella Arcucci, Huaxiu Yao, and Mi Zhang. Meit: Multimodal electrocardiogram instruction tuning on large language models for report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14510–14527, 2025.
  [29] Fuying Wang, Jiacheng Xu, and Lequan Yu. From token to rhythm: A multi-scale approach for ECG-language pretraining. In Forty-second International Conference on Machine Learning.
  [30] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
  [31] Kai Yang, Massimo Hong, Jiahuan Zhang, Yizhen Luo, Suyuan Zhao, Ou Zhang, Xiaomao Yu, Jiawen Zhou, Liuqing Yang, Ping Zhang, et al. ECG-LM: Understanding electrocardiogram with a large language model. Health Data Science, 5:0221, 2025.
  [32] Shunxiang Yang, Cheng Lian, Zhigang Zeng, Bingrong Xu, Junbin Zang, and Zhidong Zhang. A multi-view multi-scale neural network for multi-label ECG classification. IEEE Transactions on Emerging Topics in Computational Intelligence, 7(3):648–660, 2023.
  [33] Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025.
  [34] Xiaoyan Yuan, Wei Wang, Junxin Chen, Kai Fang, Ali Kashif Bashir, Tapas Mondal, Xiping Hu, and M Jamal Deen. Enhancing multi-label ECG classification via task-guided lead correlations in internet of medical things. IEEE Internet of Things Journal, 2025.
  [35] Xiaoyan Yuan, Wei Wang, Junxin Chen, and Xiping Hu. Reading between the channels: Knowledge-augmented medical time series classification. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 8978–8987, 2025.
  [36] Xiaoyan Yuan, Wei Wang, Han Liu, Jian Chen, and Xiping Hu. Ecg2tok: ECG pre-training with self-distillation semantic tokenizers. In 34th International Joint Conference on Artificial Intelligence, IJCAI 2025, pages 9990–9998. International Joint Conferences on Artificial Intelligence, 2025.
  [37] Yubao Zhao, Jiaju Kang, Tian Zhang, Puyu Han, and Tong Chen. ECG-Chat: A large ECG-language model for cardiac disease diagnosis. In 2025 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2025.
  [38] Jianwei Zheng, Huimin Chu, Daniele Struppa, Jianming Zhang, Sir Magdi Yacoub, Hesham El-Askary, Anthony Chang, Louis Ehwerhemuepha, Islam Abudayyeh, Alexander Barrett, et al. Optimal multi-stage arrhythmia classification approach. Scientific Reports, 10(1):2898, 2020.
  [39] Jianwei Zheng, Hangyuan Guo, and Huimin Chu. A large scale 12-lead electrocardiogram database for arrhythmia study (version 1.0.0). PhysioNet, 2022. Available online: http://physionet.org/content/ecg-arrhythmia/1.0.0/, 23:7, 2022.
  [40] Xiaoling Zhou, Wei Ye, Zhemg Lee, and Shikun Zhang. Robustness to spurious correlations via dynamic knowledge transfer. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, pages 7182–7190, 2025.