arXiv: 2604.11359 · v1 · submitted 2026-04-13 · 💻 cs.AI · cs.LG


CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy

Zehao Qin, Xiaojian Lin, Ping Zhang, Hongliang Wu, Xinkang Wang, Guangling Liu, Bo Chen, Wenming Yang, Guijin Wang


Pith reviewed 2026-05-10 15:33 UTC · model grok-4.3


keywords: self-supervised learning · ECG · contrastive learning · reconstructive learning · 12-lead ECG · representation learning · pretraining · data augmentation


The pith

CoRe-ECG unifies contrastive and reconstructive pretraining to learn stronger representations from unlabeled 12-lead ECG signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CoRe-ECG, a pretraining method that links contrastive learning for instance-level discrimination with reconstructive learning for waveform detail recovery. It adds frequency-based adaptive perturbations and dual masking across space and time to avoid easy shortcuts from lead correlations or non-physiological changes. This matters because labeled ECG data is scarce and costly to obtain, so stronger unsupervised representations can support more accurate analysis on tasks such as arrhythmia detection and other clinical classifications. The design produces representations that outperform separate contrastive or reconstructive baselines on multiple downstream datasets.

Core claim

CoRe-ECG is a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning for 12-lead ECG signals. It aligns global representations during reconstruction so that instance-level discriminative signals guide local waveform recovery. Frequency Dynamic Augmentation adaptively perturbs signals according to frequency-domain importance, while Spatio-Temporal Dual Masking breaks linear dependencies across leads to raise reconstructive task difficulty. The resulting framework achieves state-of-the-art performance across multiple downstream ECG datasets.
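To make the idea of one objective with a contrastive term and a reconstructive term concrete, the sketch below pairs an InfoNCE loss over global embeddings with a mean-squared error computed only on masked samples. The function names, the temperature `tau`, and the weight `lam` are assumptions chosen for illustration; the abstract does not state the paper's actual loss, so this should be read as a generic stand-in rather than the authors' formulation.

```python
import numpy as np


def info_nce(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE loss between two batches of embeddings, shape (batch, dim).

    Row i of z_a and row i of z_b form a positive pair; every other row
    of z_b acts as a negative for row i of z_a.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                            # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def masked_mse(x: np.ndarray, x_hat: np.ndarray, mask: np.ndarray) -> float:
    """Mean-squared error over masked positions only (True marks a hidden sample)."""
    diff = (x - x_hat) ** 2
    return float(diff[mask].mean())


def joint_loss(z_a, z_b, x, x_hat, mask, lam: float = 1.0) -> float:
    """Hypothetical combined objective: contrastive term + lam * reconstructive term."""
    return info_nce(z_a, z_b) + lam * masked_mse(x, x_hat, mask)
```

In this sketch the two terms only share a weighted sum; the global-local alignment the paper describes would additionally couple the embeddings used in the contrastive term to the reconstruction branch, which is not modelled here.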

What carries the argument

The alignment of global representations during local waveform reconstruction, enabled by Frequency Dynamic Augmentation and Spatio-Temporal Dual Masking.

Load-bearing premise

That the interaction between global contrastive signals and local reconstruction, plus the new augmentations and masking, produces representations that are physiologically faithful and more useful than isolated contrastive or reconstructive methods without creating new shortcuts.

What would settle it

A fair comparison in which a contrastive-only or reconstructive-only baseline, trained on the same unlabeled ECG data with matched compute, reaches equal or higher accuracy on the same set of downstream classification and regression tasks.

Figures

Figures reproduced from arXiv: 2604.11359 by Bo Chen, Guangling Liu, Guijin Wang, Hongliang Wu, Ping Zhang, Wenming Yang, Xiaojian Lin, Xinkang Wang, Zehao Qin.

**Figure 2.** Overview of the CoRe-ECG architecture, which consists of two parallel branches: the Contrastive …

**Figure 3.** The structure of Spatio-Temporal Dual Masking (STDM).

**Figure 4.** Illustration of Frequency Dynamic Augmentation (FDA). By modulating frequency components …

**Figure 5.** Schematic illustration of the downstream fine-tuning phase.

**Figure 6.** Sensitivity analysis of the Time Full Mask Rate ( …

**Figure 7.** Visual comparison of reconstruction fidelity between ST-MEM (a) and our method (b). Red lines …

**Figure 8.** t-SNE visualization of the learned representations fine-tuned on two datasets. ICBEB2018 with 9 …

Original abstract

Accurate interpretation of electrocardiogram (ECG) remains challenging due to the scarcity of labeled data and the high cost of expert annotation. Self-supervised learning (SSL) offers a promising solution by enabling models to learn expressive representations from unlabeled signals. Existing ECG SSL methods typically rely on either contrastive learning or reconstructive learning. However, each approach in isolation provides limited supervisory signals and suffers from additional limitations, including non-physiological distortions introduced by naive augmentations and trivial correlations across multiple leads that models may exploit as shortcuts. In this work, we propose CoRe-ECG, a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning. CoRe-ECG aligns global representations during reconstruction, enabling instance-level discriminative signals to guide local waveform recovery. To further enhance pretraining, we introduce Frequency Dynamic Augmentation (FDA) to adaptively perturb ECG signals based on their frequency-domain importance, and Spatio-Temporal Dual Masking (STDM) to break linear dependencies across leads, increasing the difficulty of reconstructive tasks. Our method achieves state-of-the-art performance across multiple downstream ECG datasets. Ablation studies further demonstrate the necessity and complementarity of each component. This approach provides a robust and physiologically meaningful representation learning framework for ECG analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes CoRe-ECG, a unified self-supervised pretraining framework for 12-lead ECG that combines contrastive and reconstructive learning to create synergistic global semantic and local structural signals. It introduces Frequency Dynamic Augmentation (FDA) for frequency-aware perturbations and Spatio-Temporal Dual Masking (STDM) to disrupt lead-wise correlations, claiming these address shortcuts in prior SSL methods and yield state-of-the-art performance on downstream ECG tasks, with ablations confirming component necessity and complementarity.

Significance. If the SOTA results and ablation controls hold under fair, reproducible conditions, the work would advance SSL for physiological signals by showing how hybrid objectives plus targeted augmentations can produce more discriminative yet faithful representations than isolated contrastive or reconstructive baselines, with direct relevance to label-scarce medical domains.

major comments (2)

[§4] §4 (Experiments): The abstract asserts SOTA performance across multiple downstream datasets, yet no quantitative metrics, baseline comparisons, or statistical tests appear to support this; explicit tables with effect sizes and controls for hyperparameter fairness are required to substantiate the central claim.
[§4.3] §4.3 (Ablations): The necessity and complementarity of FDA, STDM, and the global-local alignment are asserted, but the studies must include controls (e.g., equivalent-strength random masking or non-adaptive augmentation) to isolate whether gains derive from the claimed synergistic interaction rather than increased task difficulty alone.
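The masking control requested above can be sketched as follows. The first function is a hypothetical space-time mask in the spirit of STDM: some time patches are hidden on every lead, and the remaining cells are hidden at random. The second builds a uniform random mask that hides exactly the same number of cells, so an ablation can compare the two at equal strength. The grid layout, the parameter names `time_full_rate` and `lead_rate`, and the sampling scheme are all assumptions; the paper's actual STDM procedure is not specified in the abstract.

```python
import numpy as np


def dual_mask(n_leads, n_patches, time_full_rate, lead_rate, rng):
    """Hypothetical space-time mask over a (leads, patches) grid.

    A fraction `time_full_rate` of time patches is hidden on every lead,
    so those positions cannot be copied from another lead. Each cell is
    additionally hidden independently with probability `lead_rate`.
    """
    mask = np.zeros((n_leads, n_patches), dtype=bool)
    n_full = int(round(time_full_rate * n_patches))
    full_cols = rng.choice(n_patches, size=n_full, replace=False)
    mask[:, full_cols] = True
    mask |= rng.random((n_leads, n_patches)) < lead_rate
    return mask


def matched_random_mask(reference, rng):
    """Control: uniform random mask hiding exactly as many cells as `reference`."""
    flat = np.zeros(reference.size, dtype=bool)
    hidden = rng.choice(reference.size, size=int(reference.sum()), replace=False)
    flat[hidden] = True
    return flat.reshape(reference.shape)
```

Because both masks hide the same number of cells, any downstream gap between them would point to the structure of the mask rather than to the amount of signal removed.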

minor comments (2)

[Abstract] Abstract: The specific downstream ECG datasets used for SOTA evaluation should be named to provide immediate context.
[§3.2] §3.2: The mathematical formulation of FDA (frequency-domain importance weighting) would benefit from an explicit equation to clarify the adaptive perturbation process.
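As a hedged illustration of the kind of explicit formulation this comment asks for, the sketch below shows one way a frequency-aware perturbation could be written. It assumes that a bin's importance is its magnitude relative to the largest magnitude on the same lead, and that less important bins receive larger random gain changes. Both assumptions, along with the function name `frequency_perturb` and the parameter `strength`, are illustrative; they are not taken from the paper and may differ from the actual FDA.

```python
import numpy as np


def frequency_perturb(x, strength=0.2, rng=None):
    """Hypothetical frequency-aware augmentation for one ECG, shape (leads, samples).

    Assumed importance: per-lead spectral magnitude normalised to [0, 1].
    Bins with low importance get larger random gain changes; the most
    prominent bin on each lead is left essentially untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x, axis=-1)
    mag = np.abs(spec)
    importance = mag / (mag.max(axis=-1, keepdims=True) + 1e-12)
    noise = rng.uniform(-1.0, 1.0, size=mag.shape)
    gain = 1.0 + strength * (1.0 - importance) * noise   # real-valued gain per bin
    return np.fft.irfft(spec * gain, n=x.shape[-1], axis=-1)
```

Scaling each bin by a real gain changes amplitudes but leaves phases intact, which is one simple way to keep the perturbed waveform close to the original morphology; whether that matches the paper's notion of physiological fidelity is an open question here.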

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions. We will revise the manuscript to address the concerns raised in the major comments.

point-by-point responses

Referee: [§4] §4 (Experiments): The abstract asserts SOTA performance across multiple downstream datasets, yet no quantitative metrics, baseline comparisons, or statistical tests appear to support this; explicit tables with effect sizes and controls for hyperparameter fairness are required to substantiate the central claim.

Authors: Thank you for pointing this out. We will revise §4 to include explicit tables with quantitative metrics, baseline comparisons, effect sizes, and statistical tests. We will also add a discussion on hyperparameter fairness to substantiate the SOTA claims. Revision: yes.

Referee: [§4.3] §4.3 (Ablations): The necessity and complementarity of FDA, STDM, and the global-local alignment are asserted, but the studies must include controls (e.g., equivalent-strength random masking or non-adaptive augmentation) to isolate whether gains derive from the claimed synergistic interaction rather than increased task difficulty alone.

Authors: We will enhance the ablation studies in §4.3 by including the suggested controls, such as equivalent-strength random masking and non-adaptive augmentations, to better isolate the contributions of our proposed components. Revision: yes.

Circularity Check

0 steps flagged

No circularity: empirical method with external validation

full rationale

The paper introduces a new SSL framework (CoRe-ECG) with FDA and STDM components for ECG representation learning. Its central claims rest on experimental SOTA results and ablation studies across downstream datasets, which serve as independent external benchmarks rather than self-referential definitions or predictions. No mathematical derivation chain exists that reduces any result to its own inputs by construction; there are no equations defining quantities in terms of themselves, no fitted parameters renamed as predictions, and no load-bearing self-citations that close a loop. The synergy between contrastive and reconstructive elements is presented as a design choice validated empirically, not derived tautologically. This is a standard empirical ML paper whose validity hinges on reproducibility of experiments, not on internal circular logic.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Abstract-only review; central claim rests on domain assumptions about SSL benefits and the value of the proposed synergy and augmentations, with two new techniques introduced without independent evidence.

axioms (2)

[domain assumption] Existing ECG SSL methods provide limited supervisory signals and suffer from non-physiological distortions or lead-correlation shortcuts.
Explicitly stated as motivation in the abstract.
[ad hoc to paper] Aligning global representations during reconstruction creates synergistic instance-level discriminative signals for local recovery.
Core hypothesis of the CoRe-ECG paradigm.

invented entities (2)

Frequency Dynamic Augmentation (FDA) · no independent evidence
Purpose: Adaptively perturb ECG signals based on frequency-domain importance to enhance pretraining.
New augmentation technique introduced to address limitations of naive augmentations.
Spatio-Temporal Dual Masking (STDM) · no independent evidence
Purpose: Break linear dependencies across leads to increase reconstructive task difficulty.
New masking strategy proposed to prevent shortcut learning.

pith-pipeline@v0.9.0 · 5565 in / 1444 out tokens · 77206 ms · 2026-05-10T15:33:26.238575+00:00 · methodology

