Physiology-Aware CNN and Zero-Shot Multimodal LLMs for ECG Image Classification: A Comparative Study

Derek Abbott; Khalil Ahammad; Mohsen Dorraki

arxiv: 2606.22889 · v1 · pith:Q2AI5YYZnew · submitted 2026-06-22 · 💻 cs.LG

Pith reviewed 2026-06-26 09:25 UTC · model grok-4.3

Keywords: ECG image classification · multimodal LLMs · zero-shot learning · physiology-aware CNN · lead-group aggregation · PTB-XL dataset · ROC-AUC evaluation

The pith

Zero-shot multimodal LLMs achieve only chance-level discrimination (ROC-AUC around 0.5) on normal/abnormal ECG image classification, while CNN-based models, including a physiology-aware design, reach an internal ROC-AUC of 0.92-0.94.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether zero-shot multimodal LLMs can classify 12-lead ECG images as normal or abnormal. It finds that three leading models (GPT-5.2, GPT-4.1, Gemini-2.5 Pro) perform near random chance. In the same setting, a custom CNN that groups leads by anatomical region achieves stable, high discrimination on both internal and external data. The authors conclude that clinically framed, domain-specific architectures remain necessary for reliable ECG interpretation.

Core claim

Standard 12-lead ECG images were presented to GPT-5.2, GPT-4.1 and Gemini-2.5 Pro under a fixed zero-shot prompt; all three models produced ROC-AUC values around 0.5. In parallel, CNN-based models reached an average internal ROC-AUC of 0.92-0.94 and an external ROC-AUC of 0.85-0.86 on PTB-XL, and remained stable across random seeds. Among them, a physiology-aware CNN (LeadGroupECG) that aggregates features from predefined anatomical lead groups significantly outperformed its backbone internally without compromising external generalization, remained competitive with ResNet18, DenseNet121 and VGG16, and consistently highlighted anatomical lead-group contributions.
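One detail worth checking when reading "ROC-AUC around 0.5" for the LLMs: if a model returns only a hard normal/abnormal label rather than a graded confidence, its ROC curve has a single operating point and the ROC-AUC collapses to balanced accuracy, the mean of sensitivity and specificity. The summary does not say how LLM outputs were converted into scores, so this is an assumption. A minimal pure-Python sketch with toy data (not the paper's):

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random abnormal case (label 1)
    scores higher than a random normal case (label 0); ties count 0.5.
    This is the Mann-Whitney formulation, O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one normal and one abnormal case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity for hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy data only (not from the paper): 1 = abnormal, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A model that calls most ECGs "abnormal" regardless of content.
hard_preds = [1, 1, 1, 0, 1, 1, 1, 0]
print(roc_auc(labels, hard_preds))            # 0.5
print(balanced_accuracy(labels, hard_preds))  # 0.5
```

Here the toy model has 75% sensitivity but only 25% specificity, so a headline sensitivity figure alone could look respectable while discrimination is at chance.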

What carries the argument

The LeadGroupECG model, which aggregates convolutional features from predefined anatomical lead groups before final classification.
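The lead-group mechanism can be sketched in a few lines. This is a framework-free illustration under stated assumptions, not the authors' implementation: the five groups below (inferior, lateral, septal, anterior, aVR) are a common textbook grouping assumed for illustration; each lead is assumed to have already been encoded into a feature vector by a shared backbone; and mean pooling plus a linear head (`group_contributions`, a hypothetical name) stand in for whatever aggregation and attribution the paper actually uses.

```python
from statistics import fmean

# Standard 12 leads in conventional display order.
ALL_LEADS = ("I", "II", "III", "aVR", "aVL", "aVF",
             "V1", "V2", "V3", "V4", "V5", "V6")

# ASSUMPTION: a common textbook anatomical grouping; the paper's
# predefined lead groups may differ.
LEAD_GROUPS = {
    "inferior": ("II", "III", "aVF"),
    "lateral": ("I", "aVL", "V5", "V6"),
    "septal": ("V1", "V2"),
    "anterior": ("V3", "V4"),
    "aVR": ("aVR",),
}


def aggregate_lead_groups(lead_features, groups=LEAD_GROUPS):
    """Mean-pool per-lead feature vectors within each anatomical group.

    lead_features maps a lead name to an equal-length list of floats
    (e.g. a shared CNN backbone's pooled output for that lead's crop).
    Returns a dict mapping each group name to its pooled vector.
    """
    pooled = {}
    for name, leads in groups.items():
        vectors = [lead_features[lead] for lead in leads]
        # zip(*vectors) walks feature dimensions; average across leads.
        pooled[name] = [fmean(dim) for dim in zip(*vectors)]
    return pooled


def concat_groups(pooled, order=tuple(LEAD_GROUPS)):
    """Concatenate group vectors in a fixed order for a classifier head."""
    return [x for name in order for x in pooled[name]]


def group_contributions(pooled, weights):
    """Hypothetical linear head split by group: contribution = w_g . h_g.

    Summing the returned values gives the head's logit (bias omitted).
    """
    return {g: sum(w * x for w, x in zip(weights[g], pooled[g])) for g in pooled}


# Toy usage: every lead gets a 2-dimensional feature vector.
features = {lead: [1.0, 2.0] for lead in ALL_LEADS}
features["II"] = [3.0, 0.0]
pooled = aggregate_lead_groups(features)
print(len(concat_groups(pooled)))  # 10 (5 groups x 2 features)
```

Because each group's term is additive under a linear head, the per-group scores are a simple analogue of the lead-group contributions the paper reports; with a nonlinear head, a gradient- or ablation-based attribution would be needed instead.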

If this is right

CNN discrimination remains stable across random seeds and holds up on external data (ROC-AUC 0.85-0.86 versus 0.92-0.94 internally), while LLM performance stays near chance.
Anatomical lead-group aggregation improves internal discrimination without harming external generalization.
Grid-based calibration backgrounds yield modest PR-AUC gains over grid-free images for the LLMs.
Multimodal LLMs can still produce narrative descriptions even when their binary discrimination fails.
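A "modest PR-AUC gain" is hard to judge without the class balance, because the PR-AUC of an uninformative classifier equals the prevalence of the positive (abnormal) class rather than 0.5. The sketch below uses step-wise average precision, one common PR-AUC estimator; the paper's exact estimator is not stated here, so treat that choice as an assumption.

```python
def average_precision(labels, scores):
    """Step-wise PR-AUC (average precision): sum over score thresholds of
    (recall gain) * precision, with tied scores processed together.
    Labels: 1 = abnormal (positive class), 0 = normal."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one abnormal case")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before scoring it.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy data only: 2 abnormal cases out of 10 (prevalence 0.2).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
uninformative = [0.5] * 10  # every ECG gets the same score
print(average_precision(labels, uninformative))  # 0.2, i.e. the prevalence
```

Under this baseline, the same absolute PR-AUC can mean very different things on a balanced internal set and on a differently balanced external set, which is why prevalence belongs next to any PR-AUC comparison.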

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the chance-level LLM result holds under varied prompts, hospitals would need separate domain-specific models rather than relying on general-purpose LLMs for ECG triage.
The lead-group mechanism could be tested on finer-grained tasks such as arrhythmia subtyping to check whether anatomical grouping scales beyond binary normal-abnormal decisions.

Load-bearing premise

The internal test set and PTB-XL are representative of real clinical ECG distributions, and the single fixed zero-shot prompt fairly tests LLM capability without hidden artifacts.

What would settle it

A replication that applies varied prompts, multi-page ECG layouts or a larger external ECG corpus and measures whether any LLM configuration exceeds 0.6 ROC-AUC on the same normal-abnormal task.
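This falsification test amounts to a small grid search over evaluation conditions. A minimal sketch, assuming each configuration yields per-image labels and graded scores on the same normal/abnormal task; the configuration keys are hypothetical placeholders, and the 0.6 cut-off is the one proposed above.

```python
def roc_auc(labels, scores):
    """Mann-Whitney ROC-AUC (1 = abnormal, 0 = normal); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def configurations_above(results, threshold=0.6):
    """Rank configurations whose ROC-AUC exceeds `threshold`, best first.

    `results` maps a hypothetical configuration key such as
    (prompt_variant, layout) to (labels, scores) for that run.
    """
    aucs = {cfg: roc_auc(y, s) for cfg, (y, s) in results.items()}
    passing = [(cfg, auc) for cfg, auc in aucs.items() if auc > threshold]
    return sorted(passing, key=lambda item: item[1], reverse=True)


# Toy scores only (not from the paper).
results = {
    ("prompt_A", "single_page"): ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
    ("prompt_B", "multi_page"): ([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]),
}
print(configurations_above(results))  # [(('prompt_A', 'single_page'), 1.0)]
```

Scanning many prompt and layout combinations raises the chance that one crosses 0.6 by luck, so a configuration found this way would ideally be confirmed on held-out cases before being read as evidence against the chance-level result.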

Original abstract

Multimodal large language models (LLMs) are increasingly adopted to interpret 12-lead ECG images, although their interpretations often lack validation. ECG image understanding differs significantly from general image understanding, as it depends on precise waveform morphology, lead relationships, and accurate interval measurements. This study investigated whether zero-shot multimodal LLMs can reliably distinguish normal and abnormal ECG images and, in parallel, evaluated CNN-based models as clinically grounded references. Standard 12-lead ECG recordings were rendered as single-page images for a binary normal-abnormal classification task. Three prominent LLMs (GPT-5.2, GPT-4.1, and Gemini-2.5 Pro) were tested using a fixed zero-shot prompt across multiple runs. In parallel, a physiology-aware CNN-based model was developed to aggregate features from predefined anatomical lead groups. The model was compared with ResNet18, DenseNet121, and VGG16 baselines, and all models were evaluated on an internal test set and the external PTB-XL dataset. Across seeds, CNN-based models demonstrated stable discrimination, with an average internal ROC-AUC of 0.92-0.94 and an external ROC-AUC of 0.85-0.86. The proposed LeadGroupECG model significantly improved over its backbone internally without compromising external generalization. It remained competitive with the other baselines while consistently highlighting anatomical lead-group contributions. In contrast, zero-shot LLM discrimination remained near chance (ROC-AUC around 0.5). PR-AUC improved slightly when ECGs used a grid-based calibration background compared with grid-free ECGs. Although multimodal LLMs can generate reasonable ECG narratives, their zero-shot diagnostic discrimination remains limited. Therefore, clinically framed, domain-specific architectures remain essential for AI-based ECG interpretation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CNNs reach 0.85-0.94 ROC-AUC on ECG images while zero-shot LLMs stay near 0.5, but the LLM result may partly reflect the fixed prompt and single-page rendering rather than pure capability limits.

The main point is that standard CNNs deliver usable discrimination on rendered 12-lead ECG images for normal/abnormal classification, while the three tested zero-shot multimodal LLMs do not. The paper also introduces LeadGroupECG, which aggregates features by anatomical lead groups and improves internal AUC over its backbone without losing external performance on PTB-XL.

The CNN results look solid: they are stable across seeds, the lead-group variant remains competitive with ResNet18/DenseNet121/VGG16, and it adds a modest but interpretable gain over its backbone. Testing on both an internal set and PTB-XL is the right move for this kind of work.

The LLM side is the weaker part. The near-chance ROC-AUC is reported clearly, yet the abstract gives no dataset sizes, class balance, exact prompt wording, or image rendering details. The stress-test concern therefore stands: a single fixed prompt plus a one-page layout could easily suppress vision-encoder performance on waveform details and interval measurements. The paper notes that LLMs can still produce reasonable narratives, which suggests the limitation is narrower than the headline claim implies.

This is a straightforward empirical comparison rather than a new method or theoretical result. Readers working on medical imaging pipelines or deciding between general LLMs and domain models will find the numbers useful. The work is coherent on its own terms and deserves referee time; the LLM evaluation simply needs tighter controls on prompting and input format before the broader conclusion can be taken as settled.

Referee Report

3 major / 1 minor

Summary. The paper compares a proposed physiology-aware CNN (LeadGroupECG) that aggregates features from predefined anatomical lead groups against standard CNN baselines (ResNet18, DenseNet121, VGG16) and three zero-shot multimodal LLMs (GPT-5.2, GPT-4.1, Gemini-2.5 Pro) on binary normal/abnormal classification of single-page rendered 12-lead ECG images. CNN models achieve stable, high performance (internal ROC-AUC 0.92-0.94; external PTB-XL 0.85-0.86), and the proposed model improves internally without harming generalization, while the LLMs remain near chance (ROC-AUC ~0.5). The authors conclude that domain-specific architectures remain essential.

Significance. If the empirical results hold after supplying missing details, the work provides concrete evidence that zero-shot multimodal LLMs currently lack reliable discrimination on ECG waveform morphology and interval tasks despite narrative generation ability, reinforcing the value of clinically framed CNNs with lead-group structure for medical signal interpretation and offering both internal and external validation.

major comments (3)

[Abstract] Abstract and Methods: The reported ROC-AUC (0.92-0.94 internal, 0.85-0.86 external) and PR-AUC values for both CNNs and LLMs are supplied without dataset sizes, class balance, total sample counts, or any statistical tests, which are load-bearing for assessing whether the performance gap and the claim of LLM near-chance discrimination are reliable.
[Methods] Methods (LLM evaluation paragraph): The fixed zero-shot prompt text is not reproduced and image rendering details (resolution, single-page layout parameters, grid vs. grid-free specifics) are omitted; this directly affects evaluation of the skeptic concern that the ~0.5 ROC-AUC may partly reflect prompt or rendering artifacts rather than inherent limitation, undermining the generalization to 'clinically framed, domain-specific architectures remain essential'.
[Results] Results: The statement that LeadGroupECG 'significantly improved over its backbone internally' lacks reported p-values, confidence intervals, or effect sizes, and the external generalization claim cannot be fully evaluated without sample sizes or per-class metrics.
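A confidence interval for the LeadGroupECG-versus-backbone gain can be obtained in several ways; DeLong's test is common for correlated ROC curves, and a paired bootstrap over test cases is a simple alternative. The sketch below implements the paired percentile bootstrap, assuming per-image scores from both models on the same test set are available. Function and argument names are illustrative, not taken from the paper.

```python
import random


def roc_auc(labels, scores):
    """Mann-Whitney ROC-AUC (1 = abnormal, 0 = normal); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(labels, scores_a, scores_b,
                              n_boot=2000, alpha=0.05, seed=0):
    """Observed AUC(a) - AUC(b) and a percentile bootstrap CI.

    Both models are scored on the same resampled cases (a paired
    bootstrap), so the CI accounts for their shared test set.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined without both classes; redraw
        a = roc_auc(y, [scores_a[i] for i in idx])
        b = roc_auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = roc_auc(labels, scores_a) - roc_auc(labels, scores_b)
    return observed, (lo, hi)


# Toy scores only: model A separates the classes, model B is uninformative.
labels = [1] * 10 + [0] * 10
scores_a = [0.9] * 10 + [0.1] * 10
scores_b = [0.5] * 20
print(paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=200))
# (0.5, (0.5, 0.5))
```

An interval that excludes zero supports the "significant improvement" wording; reporting the interval itself also conveys the effect size the referee asks for.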

minor comments (1)

[Abstract] Clarify the exact versions or access dates for the cited LLMs (GPT-5.2, GPT-4.1), as these labels are non-standard.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the missing details limit full evaluation of the results and have prepared revisions to supply them. Point-by-point responses are provided below.

Referee: [Abstract] Abstract and Methods: The reported ROC-AUC (0.92-0.94 internal, 0.85-0.86 external) and PR-AUC values for both CNNs and LLMs are supplied without dataset sizes, class balance, total sample counts, or any statistical tests, which are load-bearing for assessing whether the performance gap and the claim of LLM near-chance discrimination are reliable.

Authors: We agree these details are necessary. The revised manuscript will report internal and external dataset sizes, class balances, total sample counts, and statistical tests (including p-values and 95% confidence intervals) for all AUC and PR-AUC values to support the performance gap claims. revision: yes

Referee: [Methods] Methods (LLM evaluation paragraph): The fixed zero-shot prompt text is not reproduced and image rendering details (resolution, single-page layout parameters, grid vs. grid-free specifics) are omitted; this directly affects evaluation of the skeptic concern that the ~0.5 ROC-AUC may partly reflect prompt or rendering artifacts rather than inherent limitation, undermining the generalization to 'clinically framed, domain-specific architectures remain essential'.

Authors: The exact zero-shot prompt will be reproduced in the Methods. Image rendering parameters (resolution, single-page layout, grid vs. grid-free details) will be added to allow assessment of potential artifacts and strengthen the generalization argument. revision: yes

Referee: [Results] Results: The statement that LeadGroupECG 'significantly improved over its backbone internally' lacks reported p-values, confidence intervals, or effect sizes, and the external generalization claim cannot be fully evaluated without sample sizes or per-class metrics.

Authors: We will add p-values, confidence intervals, and effect sizes for the internal improvement. External PTB-XL results will include sample sizes and per-class metrics (e.g., sensitivity, specificity) to fully support the generalization claims. revision: yes
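The per-class metrics promised for PTB-XL (sensitivity, specificity) follow from the confusion matrix at one operating threshold. A minimal sketch; the 0.5 threshold and the 1 = abnormal encoding are assumptions, because the summary does not say how thresholds were chosen.

```python
def threshold_scores(scores, threshold=0.5):
    """Turn continuous scores into hard labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def per_class_metrics(labels, preds):
    """Confusion-matrix metrics for a binary normal (0) / abnormal (1) task."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    def ratio(num, den):
        # Report None rather than dividing by zero (e.g. no abnormal cases).
        return num / den if den else None

    return {
        "n": len(labels),
        "prevalence": ratio(tp + fn, len(labels)),
        "sensitivity": ratio(tp, tp + fn),  # recall on abnormal ECGs
        "specificity": ratio(tn, tn + fp),  # recall on normal ECGs
        "ppv": ratio(tp, tp + fp),          # precision for "abnormal"
        "npv": ratio(tn, tn + fn),          # precision for "normal"
    }


# Toy data only (not from the paper).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.2]
metrics = per_class_metrics(labels, threshold_scores(scores))
print(metrics["sensitivity"], metrics["specificity"])  # 0.6666666666666666 0.8
```

Reporting `n` and `prevalence` alongside the rates addresses the referee's sample-size point in the same table.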

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with measured metrics

The paper reports experimental results from training and evaluating CNN models (including a proposed LeadGroupECG variant) and testing zero-shot multimodal LLMs on binary ECG classification, using internal held-out data and external PTB-XL validation. All claims rest on observed ROC-AUC, PR-AUC and related performance numbers rather than any derivation, equation, fitted parameter renamed as prediction, or self-citation chain. No load-bearing step reduces to its own inputs by construction; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard machine learning evaluation assumptions and the representativeness of the chosen datasets; no free parameters, invented entities, or non-standard axioms are introduced in the abstract.

axioms (1)

Domain assumption: ROC-AUC and PR-AUC are suitable metrics for assessing binary ECG image classification performance.
Common practice in medical image classification tasks.


Reference graph

Works this paper leans on

42 extracted references · 37 canonical work pages · 3 internal anchors

[1]

Artificial intelligence -enhanced electrocardiography in cardiovascular disease management

Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence -enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18:465 –78. https://doi.org/10.1038/s41569-020-00503-2

work page doi:10.1038/s41569-020-00503-2 2021
[2]

Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I

Strodthoff N, Wagner P, Schaeffter T, Samek W. Deep Learning for ECG Analysis: Benchmarks and Insights from PTB -XL. IEEE J Biomed Health Inform. 2021;25:1519 –28. https://doi.org/10.1109/JBHI.2020.3022989

work page doi:10.1109/jbhi.2020.3022989 2021
[3]

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

Hannun AY , Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019;25:65–9. https://doi.org/10.1038/s41591-018-0268-3

work page doi:10.1038/s41591-018-0268-3 2019
[4]

Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Rajpurkar P, Hannun AY , Haghpanahi M, Bourn C, Ng AY . Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks. 2017. https://doi.org/10.48550/arXiv.1707.01836

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.01836 2017
[5]

A lightweight 1D convolutional neural network model for arrhythmia diagnosis from electrocardiogram signal

Saha Tchinda B, Tchiotsop D. A lightweight 1D convolutional neural network model for arrhythmia diagnosis from electrocardiogram signal. Phys Eng Sci Med. 2025;48:577 –89. https://doi.org/10.1007/s13246-025-01525-1

work page doi:10.1007/s13246-025-01525-1 2025
[6]

Deep learning for ECG classification: A comparative study of 1D and 2D representations and multimodal fusion approaches

Narotamo H, Dias M, Santos R, Carreiro A V , Gamboa H, Silveira M. Deep learning for ECG classification: A comparative study of 1D and 2D representations and multimodal fusion approaches. Biomed Signal Process Control. 2024;93:106141. https://doi.org/10.1016/j.bspc.2024.106141

work page doi:10.1016/j.bspc.2024.106141 2024
[7]

Automatic diagnosis of the 12 -lead ECG using a deep neural network

Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12 -lead ECG using a deep neural network. Nat Commun. 2020;11:1760. https://doi.org/10.1038/s41467-020-15432-4. K. Ahammad et al. 21

work page doi:10.1038/s41467-020-15432-4 2020
[8]

Deep learning approach for active classification of electrocardiogram signals

Rahhal MMA, Bazi Y , AlHichri H, Alajlan N, Melgani F, Yager RR. Deep learning approach for active classification of electrocardiogram signals. Inf Sci. 2016;345:340 –54. https://doi.org/10.1016/j.ins.2016.01.082

work page doi:10.1016/j.ins.2016.01.082 2016
[9]

ECG -based multi-class arrhythmia detection using spatio -temporal attention -based convolutional recurrent neural network

Zhang J, Liu A, Gao M, Chen X, Zhang X, Chen X. ECG -based multi-class arrhythmia detection using spatio -temporal attention -based convolutional recurrent neural network. Artif Intell Med. 2020;106:101856. https://doi.org/10.1016/j.artmed.2020.101856

work page doi:10.1016/j.artmed.2020.101856 2020
[10]

12-Lead ECG signal classification for detecting ECG arrhythmia via an information bottleneck -based multi -scale network

Zhang S, Lian C, Xu B, Su Y , Alhudhaif A. 12-Lead ECG signal classification for detecting ECG arrhythmia via an information bottleneck -based multi -scale network. Inf Sci. 2024;662:120239. https://doi.org/10.1016/j.ins.2024.120239

work page doi:10.1016/j.ins.2024.120239 2024
[11]

ECG signal classification based on deep CNN and BiLSTM

Cheng J, Zou Q, Zhao Y . ECG signal classification based on deep CNN and BiLSTM. BMC Med Inform Decis Mak. 2021;21:365. https://doi.org/10.1186/s12911-021-01736-y

work page doi:10.1186/s12911-021-01736-y 2021
[12]

A Deep-Learning Approach to ECG Classification Based on Adversarial Domain Adaptation

Niu L, Chen C, Liu H, Zhou S, Shu M. A Deep-Learning Approach to ECG Classification Based on Adversarial Domain Adaptation. Healthcare. 2020;8:437. https://doi.org/10.3390/healthcare8040437

work page doi:10.3390/healthcare8040437 2020
[13]

Clinically meaningful interpretability of an AI model for ECG classification

Gliner V , Levy I, Tsutsui K, Acha MR, Schliamser J, Schuster A, et al. Clinically meaningful interpretability of an AI model for ECG classification. Npj Digit Med. 2025;8:109. https://doi.org/10.1038/s41746-025-01467-8

work page doi:10.1038/s41746-025-01467-8 2025
[14]

Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction

Bellfield RAA, Ortega -Martorell S, Lip GYH, Oxborough D, Olier I. Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction. J Electrocardiol. 2024;84:17–26. https://doi.org/10.1016/j.jelectrocard.2024.03.005

work page doi:10.1016/j.jelectrocard.2024.03.005 2024
[15]

FM-ECG: A fine-grained multi-label framework for ECG image classification

Du N, Cao Q, Yu L, Liu N, Zhong E, Liu Z, et al. FM-ECG: A fine-grained multi-label framework for ECG image classification. Inf Sci. 2021;549:164–77. https://doi.org/10.1016/j.ins.2020.10.014

work page doi:10.1016/j.ins.2020.10.014 2021
[16]

Electrocardiogram image classification for six classes of heart diseases

Oke OA, Cavus N. Electrocardiogram image classification for six classes of heart diseases. Iran J Comput Sci. 2025;8:419–39. https://doi.org/10.1007/s42044-025-00227-x

work page doi:10.1007/s42044-025-00227-x 2025
[17]

Enhancing cardiac diagnostics: a deep learning ensemble approach for precise ECG image classification

Alsayat A, Mahmoud AA, Alanazi S, Mostafa AM, Alshammari N, Alrowaily MA, et al. Enhancing cardiac diagnostics: a deep learning ensemble approach for precise ECG image classification. J Big Data. 2025;12:7. https://doi.org/10.1186/s40537-025-01070-4

work page doi:10.1186/s40537-025-01070-4 2025
[18]

Image based deep learning in 12 -lead ECG diagnosis

Ao R, He G. Image based deep learning in 12 -lead ECG diagnosis. Front Artif Intell. 2023;5:1087370. https://doi.org/10.3389/frai.2022.1087370

work page doi:10.3389/frai.2022.1087370 2023
[19]

Leads, Axis, and Acquisition of the 12 -Lead ECG

Dingler A, Vandeventer S, Borkosky J, Henrichs B, McConachie A, Muthersbaugh HC, et al. Leads, Axis, and Acquisition of the 12 -Lead ECG. In: The 12 -Lead ECG in Acute Coronary Syndromes. Elsevier; 2019

2019
[20]

Recommendations for the Standardization and Interpretation of the Electrocardiogram

Kligfield P, Gettes LS, Bailey JJ, Childers R, Deal BJ, Hancock EW, et al. Recommendations for the Standardization and Interpretation of the Electrocardiogram. J Am Coll Cardiol. 2007;49:1109–27. https://doi.org/10.1016/j.jacc.2007.01.024

work page doi:10.1016/j.jacc.2007.01.024 2007
[21]

Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions

Rabbani SA, El-Tanani M, Sharma S, Rabbani SS, El-Tanani Y , Kumar R, et al. Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions. BioMedInformatics. 2025;5:37. https://doi.org/10.3390/biomedinformatics5030037

work page doi:10.3390/biomedinformatics5030037 2025
[22]

Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study

Engelstein H, Ramon-Gonen R, Sabbag A, Klang E, Sudri K, Cohen-Shelly M, et al. Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study. JMIR AI. 2025;4:e74426–e74426. https://doi.org/10.2196/74426

work page doi:10.2196/74426 2025
[23]

ECG-Doctor: An Interpretable Multimodal ECG Diagnosis Framework Based on Large Language Models

Tian D, Jiang J, Zhang K, Liu C, Yuan Y , Gao M, et al. ECG-Doctor: An Interpretable Multimodal ECG Diagnosis Framework Based on Large Language Models. In: Proceedings of the 34th ACM International Conference on Information and Knowledge Management. Seo ul Republic of Korea: ACM; 2025. p. 2863–73. https://doi.org/10.1145/3746252.3761082

work page doi:10.1145/3746252.3761082 2025
[24]

ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text

Yu H, Guo P, Sano A. ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text. 2024. https://doi.org/10.48550/arXiv.2405.19366. Physiology-Aware CNN and Zero-Shot Multimodal LLMs for ECG Image Classification: A Comparative Study 22

work page doi:10.48550/arxiv.2405.19366 2024
[25]

Zero-shot ecg classification with multimodal learning and test-time clinical knowledge enhancement

Liu C, Wan Z, Ouyang C, Shah A, Bai W, Arcucci R. Zero-Shot ECG Classification with Multimodal Learning and Test -time Clinical Knowledge Enhancement. 2024. https://doi.org/10.48550/arXiv.2403.06659

work page doi:10.48550/arxiv.2403.06659 2024
[26]

Zero -Shot ECG Diagnosis with Large Language Models and Retrieval - Augmented Generation

Yu H, Guo P, Sano A. Zero -Shot ECG Diagnosis with Large Language Models and Retrieval - Augmented Generation
[27]

URL https://spj

Yang K, Hong M, Zhang J, Luo Y , Zhao S, Zhang O, et al. ECG -LM: Understanding Electrocardiogram with a Large Language Model. Health Data Sci. 2025;5:0221. https://doi.org/10.34133/hds.0221

work page doi:10.34133/hds.0221 2025
[28]

https://doi.org/10.1007/978-1-4842-4470-8_7

Ekaba B. Google Colaboratory. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Berkeley, CA: Apress. https://doi.org/10.1007/978-1-4842-4470-8_7

work page doi:10.1007/978-1-4842-4470-8_7
[29]

A large -scale multi -label 12 -lead electrocardiogram database with standardized diagnostic statements

Liu H, Chen D, Chen D, Zhang X, Li H, Bian L, et al. A large -scale multi -label 12 -lead electrocardiogram database with standardized diagnostic statements. Sci Data. 2022;9:272. https://doi.org/10.1038/s41597-022-01403-5

work page doi:10.1038/s41597-022-01403-5 2022
[30]

PTB -XL, a large publicly available electrocardiography dataset

Wagner P, Strodthoff N, Bousseljot R-D, Kreiseler D, Lunze FI, Samek W, et al. PTB -XL, a large publicly available electrocardiography dataset. Sci Data. 2020;7:154. https://doi.org/10.1038/s41597 - 020-0495-6

work page doi:10.1038/s41597 2020
[31]

CNN -FWS: A Model for the Diagnosis of Normal and Abnormal ECG with Feature Adaptive

Zhu J, Lv J, Kong D. CNN -FWS: A Model for the Diagnosis of Normal and Abnormal ECG with Feature Adaptive. Entropy. 2022;24:471. https://doi.org/10.3390/e24040471

work page doi:10.3390/e24040471 2022
[32]

Deep Residual Learning for Image Recognition

He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2015. https://doi.org/10.48550/arXiv.1512.03385

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1512.03385 2015
[33]

Densely Connected Convolutional Networks

Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE

2017
[34]

p. 2261–9. https://doi.org/10.1109/CVPR.2017.243

work page doi:10.1109/cvpr.2017.243 2017
[35]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition
[36]

https://doi.org/10.48550/arXiv.1409.1556

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1409.1556
[37]

Analysis of an adaptive lead weighted ResNet for multiclass classification of 12 -lead ECGs

Zhao Z, Murphy D, Gifford H, Williams S, Darlington A, Relton SD, et al. Analysis of an adaptive lead weighted ResNet for multiclass classification of 12 -lead ECGs. Physiol Meas. 2022;43:034001. https://doi.org/10.1088/1361-6579/ac5b4a

work page doi:10.1088/1361-6579/ac5b4a 2022
[38]

ECG Heartbeat Classification Based on an Improved ResNet-18 Model

Jing E, Zhang H, Li Z, Liu Y , Ji Z, Ganchev I. ECG Heartbeat Classification Based on an Improved ResNet-18 Model. Comput Math Methods Med. 2021;2021:1 –13. https://doi.org/10.1155/2021/6649970

work page doi:10.1155/2021/6649970 2021
[39]

Automatic varied -length ECG classification using a lightweight DenseNet model

Bui TH, Hoang VM, Pham MT. Automatic varied -length ECG classification using a lightweight DenseNet model. Biomed Signal Process Control. 2023;82:104529. https://doi.org/10.1016/j.bspc.2022.104529

work page doi:10.1016/j.bspc.2022.104529 2023
[40]

Deep convolutional neural networks based ECG beats classification to diagnose cardiovascular conditions

Rashed -Al-Mahfuz Md, Moni MA, Lio’ P, Islam SMS, Berkovsky S, Khushi M, et al. Deep convolutional neural networks based ECG beats classification to diagnose cardiovascular conditions. Biomed Eng Lett. 2021;11:147–62. https://doi.org/10.1007/s13534-021-00185-w

work page doi:10.1007/s13534-021-00185-w 2021
[41]

ImageNet: A large-scale hierarchical image database

Deng J, Dong W, Socher R, Li L-J, Kai Li, Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE

2009
[42]

p. 248–55. https://doi.org/10.1109/CVPR.2009.5206848

work page doi:10.1109/cvpr.2009.5206848 2009

[1] [1]

Artificial intelligence -enhanced electrocardiography in cardiovascular disease management

Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence -enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18:465 –78. https://doi.org/10.1038/s41569-020-00503-2

work page doi:10.1038/s41569-020-00503-2 2021

[2] [2]

Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I

Strodthoff N, Wagner P, Schaeffter T, Samek W. Deep Learning for ECG Analysis: Benchmarks and Insights from PTB -XL. IEEE J Biomed Health Inform. 2021;25:1519 –28. https://doi.org/10.1109/JBHI.2020.3022989

work page doi:10.1109/jbhi.2020.3022989 2021

[3] [3]

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

Hannun AY , Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019;25:65–9. https://doi.org/10.1038/s41591-018-0268-3

work page doi:10.1038/s41591-018-0268-3 2019

[4] [4]

Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Rajpurkar P, Hannun AY , Haghpanahi M, Bourn C, Ng AY . Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks. 2017. https://doi.org/10.48550/arXiv.1707.01836

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.01836 2017

[5] [5]

A lightweight 1D convolutional neural network model for arrhythmia diagnosis from electrocardiogram signal

Saha Tchinda B, Tchiotsop D. A lightweight 1D convolutional neural network model for arrhythmia diagnosis from electrocardiogram signal. Phys Eng Sci Med. 2025;48:577 –89. https://doi.org/10.1007/s13246-025-01525-1

work page doi:10.1007/s13246-025-01525-1 2025

[6] [6]

Deep learning for ECG classification: A comparative study of 1D and 2D representations and multimodal fusion approaches

Narotamo H, Dias M, Santos R, Carreiro A V , Gamboa H, Silveira M. Deep learning for ECG classification: A comparative study of 1D and 2D representations and multimodal fusion approaches. Biomed Signal Process Control. 2024;93:106141. https://doi.org/10.1016/j.bspc.2024.106141

work page doi:10.1016/j.bspc.2024.106141 2024

[7] [7]

Automatic diagnosis of the 12 -lead ECG using a deep neural network

Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12 -lead ECG using a deep neural network. Nat Commun. 2020;11:1760. https://doi.org/10.1038/s41467-020-15432-4. K. Ahammad et al. 21

work page doi:10.1038/s41467-020-15432-4 2020

[8] [8]

Deep learning approach for active classification of electrocardiogram signals

Rahhal MMA, Bazi Y , AlHichri H, Alajlan N, Melgani F, Yager RR. Deep learning approach for active classification of electrocardiogram signals. Inf Sci. 2016;345:340 –54. https://doi.org/10.1016/j.ins.2016.01.082

work page doi:10.1016/j.ins.2016.01.082 2016

[9] [9]

ECG -based multi-class arrhythmia detection using spatio -temporal attention -based convolutional recurrent neural network

Zhang J, Liu A, Gao M, Chen X, Zhang X, Chen X. ECG -based multi-class arrhythmia detection using spatio -temporal attention -based convolutional recurrent neural network. Artif Intell Med. 2020;106:101856. https://doi.org/10.1016/j.artmed.2020.101856

work page doi:10.1016/j.artmed.2020.101856 2020

[10] [10]

12-Lead ECG signal classification for detecting ECG arrhythmia via an information bottleneck -based multi -scale network

Zhang S, Lian C, Xu B, Su Y , Alhudhaif A. 12-Lead ECG signal classification for detecting ECG arrhythmia via an information bottleneck -based multi -scale network. Inf Sci. 2024;662:120239. https://doi.org/10.1016/j.ins.2024.120239

work page doi:10.1016/j.ins.2024.120239 2024

[11] [11]

ECG signal classification based on deep CNN and BiLSTM

Cheng J, Zou Q, Zhao Y . ECG signal classification based on deep CNN and BiLSTM. BMC Med Inform Decis Mak. 2021;21:365. https://doi.org/10.1186/s12911-021-01736-y

work page doi:10.1186/s12911-021-01736-y 2021

[12] [12]

A Deep-Learning Approach to ECG Classification Based on Adversarial Domain Adaptation

Niu L, Chen C, Liu H, Zhou S, Shu M. A Deep-Learning Approach to ECG Classification Based on Adversarial Domain Adaptation. Healthcare. 2020;8:437. https://doi.org/10.3390/healthcare8040437

work page doi:10.3390/healthcare8040437 2020

[13] Gliner V, Levy I, Tsutsui K, Acha MR, Schliamser J, Schuster A, et al. Clinically meaningful interpretability of an AI model for ECG classification. Npj Digit Med. 2025;8:109. https://doi.org/10.1038/s41746-025-01467-8

[14] Bellfield RAA, Ortega-Martorell S, Lip GYH, Oxborough D, Olier I. Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction. J Electrocardiol. 2024;84:17–26. https://doi.org/10.1016/j.jelectrocard.2024.03.005

[15] Du N, Cao Q, Yu L, Liu N, Zhong E, Liu Z, et al. FM-ECG: A fine-grained multi-label framework for ECG image classification. Inf Sci. 2021;549:164–77. https://doi.org/10.1016/j.ins.2020.10.014

[16] Oke OA, Cavus N. Electrocardiogram image classification for six classes of heart diseases. Iran J Comput Sci. 2025;8:419–39. https://doi.org/10.1007/s42044-025-00227-x

[17] Alsayat A, Mahmoud AA, Alanazi S, Mostafa AM, Alshammari N, Alrowaily MA, et al. Enhancing cardiac diagnostics: a deep learning ensemble approach for precise ECG image classification. J Big Data. 2025;12:7. https://doi.org/10.1186/s40537-025-01070-4

[18] Ao R, He G. Image based deep learning in 12-lead ECG diagnosis. Front Artif Intell. 2023;5:1087370. https://doi.org/10.3389/frai.2022.1087370

[19] Dingler A, Vandeventer S, Borkosky J, Henrichs B, McConachie A, Muthersbaugh HC, et al. Leads, Axis, and Acquisition of the 12-Lead ECG. In: The 12-Lead ECG in Acute Coronary Syndromes. Elsevier; 2019.

[20] Kligfield P, Gettes LS, Bailey JJ, Childers R, Deal BJ, Hancock EW, et al. Recommendations for the Standardization and Interpretation of the Electrocardiogram. J Am Coll Cardiol. 2007;49:1109–27. https://doi.org/10.1016/j.jacc.2007.01.024

[21] Rabbani SA, El-Tanani M, Sharma S, Rabbani SS, El-Tanani Y, Kumar R, et al. Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions. BioMedInformatics. 2025;5:37. https://doi.org/10.3390/biomedinformatics5030037

[22] Engelstein H, Ramon-Gonen R, Sabbag A, Klang E, Sudri K, Cohen-Shelly M, et al. Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study. JMIR AI. 2025;4:e74426. https://doi.org/10.2196/74426

[23] Tian D, Jiang J, Zhang K, Liu C, Yuan Y, Gao M, et al. ECG-Doctor: An Interpretable Multimodal ECG Diagnosis Framework Based on Large Language Models. In: Proceedings of the 34th ACM International Conference on Information and Knowledge Management. Seoul, Republic of Korea: ACM; 2025. p. 2863–73. https://doi.org/10.1145/3746252.3761082

[24] Yu H, Guo P, Sano A. ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text. 2024. https://doi.org/10.48550/arXiv.2405.19366

[25] Liu C, Wan Z, Ouyang C, Shah A, Bai W, Arcucci R. Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement. 2024. https://doi.org/10.48550/arXiv.2403.06659

[26] Yu H, Guo P, Sano A. Zero-Shot ECG Diagnosis with Large Language Models and Retrieval-Augmented Generation.

[27] Yang K, Hong M, Zhang J, Luo Y, Zhao S, Zhang O, et al. ECG-LM: Understanding Electrocardiogram with a Large Language Model. Health Data Sci. 2025;5:0221. https://doi.org/10.34133/hds.0221

[28] Ekaba B. Google Colaboratory. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform. Berkeley, CA: Apress. https://doi.org/10.1007/978-1-4842-4470-8_7

[29] Liu H, Chen D, Chen D, Zhang X, Li H, Bian L, et al. A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements. Sci Data. 2022;9:272. https://doi.org/10.1038/s41597-022-01403-5

[30] Wagner P, Strodthoff N, Bousseljot R-D, Kreiseler D, Lunze FI, Samek W, et al. PTB-XL, a large publicly available electrocardiography dataset. Sci Data. 2020;7:154. https://doi.org/10.1038/s41597-020-0495-6

[31] Zhu J, Lv J, Kong D. CNN-FWS: A Model for the Diagnosis of Normal and Abnormal ECG with Feature Adaptive. Entropy. 2022;24:471. https://doi.org/10.3390/e24040471

[32] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2015. https://doi.org/10.48550/arXiv.1512.03385

[33] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE; 2017. p. 2261–9. https://doi.org/10.1109/CVPR.2017.243

[34] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. https://doi.org/10.48550/arXiv.1409.1556

[35] Zhao Z, Murphy D, Gifford H, Williams S, Darlington A, Relton SD, et al. Analysis of an adaptive lead weighted ResNet for multiclass classification of 12-lead ECGs. Physiol Meas. 2022;43:034001. https://doi.org/10.1088/1361-6579/ac5b4a

[36] Jing E, Zhang H, Li Z, Liu Y, Ji Z, Ganchev I. ECG Heartbeat Classification Based on an Improved ResNet-18 Model. Comput Math Methods Med. 2021;2021:1–13. https://doi.org/10.1155/2021/6649970

[37] Bui TH, Hoang VM, Pham MT. Automatic varied-length ECG classification using a lightweight DenseNet model. Biomed Signal Process Control. 2023;82:104529. https://doi.org/10.1016/j.bspc.2022.104529

[38] Rashed-Al-Mahfuz Md, Moni MA, Lio’ P, Islam SMS, Berkovsky S, Khushi M, et al. Deep convolutional neural networks based ECG beats classification to diagnose cardiovascular conditions. Biomed Eng Lett. 2021;11:147–62. https://doi.org/10.1007/s13534-021-00185-w

[39] Deng J, Dong W, Socher R, Li L-J, Kai Li, Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE; 2009. p. 248–55. https://doi.org/10.1109/CVPR.2009.5206848