CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy
Pith reviewed 2026-05-10 15:33 UTC · model grok-4.3
The pith
CoRe-ECG unifies contrastive and reconstructive pretraining to learn stronger representations from unlabeled 12-lead ECG signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoRe-ECG is a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning for 12-lead ECG signals. It aligns global representations during reconstruction so that instance-level discriminative signals guide local waveform recovery. Frequency Dynamic Augmentation adaptively perturbs signals according to frequency-domain importance, while Spatio-Temporal Dual Masking breaks linear dependencies across leads to raise reconstructive task difficulty. The resulting framework achieves state-of-the-art performance across multiple downstream ECG datasets.
What carries the argument
The alignment of global representations during local waveform reconstruction, enabled by Frequency Dynamic Augmentation and Spatio-Temporal Dual Masking.
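As a rough illustration of how a global contrastive term and a local reconstruction term can share one objective, the sketch below combines an InfoNCE-style loss on global embeddings with a mean squared error restricted to masked positions. This summary does not state the paper's actual loss functions or their weighting, so `info_nce`, `masked_mse`, `joint_loss`, the temperature, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss for two batches of global embeddings, shape (B, D).

    Row i of z_a and row i of z_b are treated as a positive pair; every
    other row in the batch acts as a negative.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def masked_mse(x, x_hat, mask):
    """Mean squared error computed only where mask is True."""
    return float(np.sum(((x - x_hat) ** 2) * mask) / max(mask.sum(), 1))

def joint_loss(z_a, z_b, x, x_hat, mask, lam=1.0):
    """Hypothetical combined objective: contrastive term plus weighted reconstruction term."""
    return info_nce(z_a, z_b) + lam * masked_mse(x, x_hat, mask)
```

In this sketch the two terms are simply summed; the coupling described above, where the global representation is aligned during reconstruction, would additionally require the embeddings `z_a` and `z_b` to come from the same encoder pass that produces `x_hat`.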
Load-bearing premise
That the interaction between global contrastive signals and local reconstruction, plus the new augmentations and masking, produces representations that are physiologically faithful and more useful than isolated contrastive or reconstructive methods without creating new shortcuts.
What would settle it
A fair comparison in which a contrastive-only or reconstructive-only baseline, trained on the same unlabeled ECG data with matched compute, reaches equal or higher accuracy on the same set of downstream classification and regression tasks.
Original abstract
Accurate interpretation of electrocardiogram (ECG) remains challenging due to the scarcity of labeled data and the high cost of expert annotation. Self-supervised learning (SSL) offers a promising solution by enabling models to learn expressive representations from unlabeled signals. Existing ECG SSL methods typically rely on either contrastive learning or reconstructive learning. However, each approach in isolation provides limited supervisory signals and suffers from additional limitations, including non-physiological distortions introduced by naive augmentations and trivial correlations across multiple leads that models may exploit as shortcuts. In this work, we propose CoRe-ECG, a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning. CoRe-ECG aligns global representations during reconstruction, enabling instance-level discriminative signals to guide local waveform recovery. To further enhance pretraining, we introduce Frequency Dynamic Augmentation (FDA) to adaptively perturb ECG signals based on their frequency-domain importance, and Spatio-Temporal Dual Masking (STDM) to break linear dependencies across leads, increasing the difficulty of reconstructive tasks. Our method achieves state-of-the-art performance across multiple downstream ECG datasets. Ablation studies further demonstrate the necessity and complementarity of each component. This approach provides a robust and physiologically meaningful representation learning framework for ECG analysis.
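In the standard 12-lead system the limb leads are linearly related (for example, lead III equals lead II minus lead I), so a value hidden in only one lead can often be recovered from the other leads at the same time step. The sketch below shows one hypothetical way to build a mask that removes this shortcut: it hides the same time patches in every lead and also hides a few whole leads. The function name `dual_mask`, the patch layout, and both ratios are assumptions; the paper's STDM may be defined differently.

```python
import numpy as np

def dual_mask(n_leads=12, n_patches=50, time_ratio=0.3, lead_ratio=0.25, rng=None):
    """Return a boolean mask of shape (n_leads, n_patches); True means hidden."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((n_leads, n_patches), dtype=bool)

    # Temporal masking: hide the same time patches in every lead, so no
    # other lead is visible at those time steps.
    n_time = int(round(time_ratio * n_patches))
    t_idx = rng.choice(n_patches, size=n_time, replace=False)
    mask[:, t_idx] = True

    # Spatial masking: hide a few leads over the whole recording.
    n_lead = int(round(lead_ratio * n_leads))
    l_idx = rng.choice(n_leads, size=n_lead, replace=False)
    mask[l_idx, :] = True
    return mask
```

With the default arguments, 15 of the 50 time patches are hidden in all 12 leads and 3 leads are hidden entirely.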
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CoRe-ECG, a unified self-supervised pretraining framework for 12-lead ECG that combines contrastive and reconstructive learning to create synergistic global semantic and local structural signals. It introduces Frequency Dynamic Augmentation (FDA) for frequency-aware perturbations and Spatio-Temporal Dual Masking (STDM) to disrupt lead-wise correlations, claiming these address shortcuts in prior SSL methods and yield state-of-the-art performance on downstream ECG tasks, with ablations confirming component necessity and complementarity.
Significance. If the SOTA results and ablation controls hold under fair, reproducible conditions, the work would advance SSL for physiological signals by showing how hybrid objectives plus targeted augmentations can produce more discriminative yet faithful representations than isolated contrastive or reconstructive baselines, with direct relevance to label-scarce medical domains.
major comments (2)
- [§4] §4 (Experiments): The abstract asserts SOTA performance across multiple downstream datasets, yet no quantitative metrics, baseline comparisons, or statistical tests appear to support this; explicit tables with effect sizes and controls for hyperparameter fairness are required to substantiate the central claim.
- [§4.3] §4.3 (Ablations): The necessity and complementarity of FDA, STDM, and the global-local alignment are asserted, but the studies must include controls (e.g., equivalent-strength random masking or non-adaptive augmentation) to isolate whether gains derive from the claimed synergistic interaction rather than increased task difficulty alone.
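The equivalent-strength random masking control requested above can be sketched as follows: given any structured mask, draw an unstructured mask that hides exactly the same number of entries, so that the two conditions differ in structure but not in mask ratio. The helper name `random_mask_like` is hypothetical and does not come from the paper.

```python
import numpy as np

def random_mask_like(structured_mask, rng=None):
    """Unstructured boolean mask with the same shape and the same number
    of hidden (True) entries as structured_mask."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(structured_mask.sum())                  # number of hidden entries to match
    flat = np.zeros(structured_mask.size, dtype=bool)
    flat[rng.choice(structured_mask.size, size=k, replace=False)] = True
    return flat.reshape(structured_mask.shape)
```

If a model pretrained with this control matches one pretrained with the structured mask, the gain is more plausibly explained by mask ratio than by the structure of the masking.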
minor comments (2)
- [Abstract] Abstract: The specific downstream ECG datasets used for SOTA evaluation should be named to provide immediate context.
- [§3.2] §3.2: The mathematical formulation of FDA (frequency-domain importance weighting) would benefit from an explicit equation to clarify the adaptive perturbation process.
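Since this summary does not reproduce the paper's FDA formulation, the following is only one possible reading of "adaptively perturb signals according to frequency-domain importance". It assumes that importance is a frequency bin's share of the lead's peak spectral power, and that low-importance bins receive larger random amplitude changes. The function `freq_dynamic_augment`, the importance definition, and the `strength` parameter are all hypothetical.

```python
import numpy as np

def freq_dynamic_augment(x, strength=0.5, rng=None):
    """Perturb an ECG array x of shape (n_leads, n_samples) in the frequency domain.

    Assumption: bins carrying little power are treated as less important
    and receive larger random amplitude scaling.
    """
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x, axis=-1)
    power = np.abs(spec) ** 2
    # Hypothetical importance in [0, 1]: power relative to the lead's strongest bin.
    importance = power / (power.max(axis=-1, keepdims=True) + 1e-12)
    noise = rng.uniform(-1.0, 1.0, size=spec.shape)
    # Real-valued per-bin gain; equals 1 where importance is 1.
    scale = 1.0 + strength * (1.0 - importance) * noise
    return np.fft.irfft(spec * scale, n=x.shape[-1], axis=-1)
```

Under this assumption a pure sinusoid passes through almost unchanged, because nearly all of its power sits in one bin, while broadband content is perturbed more heavily.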
Simulated Author's Rebuttal
We are grateful to the referee for the thorough review and valuable suggestions. We will revise the manuscript to address the concerns raised in the major comments.
Point-by-point responses
- Referee: [§4] §4 (Experiments): The abstract asserts SOTA performance across multiple downstream datasets, yet no quantitative metrics, baseline comparisons, or statistical tests appear to support this; explicit tables with effect sizes and controls for hyperparameter fairness are required to substantiate the central claim.
  Authors: Thank you for pointing this out. We will revise §4 to include explicit tables with quantitative metrics, baseline comparisons, effect sizes, and statistical tests. We will also add a discussion on hyperparameter fairness to substantiate the SOTA claims. Revision: yes.
- Referee: [§4.3] §4.3 (Ablations): The necessity and complementarity of FDA, STDM, and the global-local alignment are asserted, but the studies must include controls (e.g., equivalent-strength random masking or non-adaptive augmentation) to isolate whether gains derive from the claimed synergistic interaction rather than increased task difficulty alone.
  Authors: We will enhance the ablation studies in §4.3 by including the suggested controls, such as equivalent-strength random masking and non-adaptive augmentations, to better isolate the contributions of our proposed components. Revision: yes.
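The effect sizes and statistical tests promised for §4 could be reported along the following lines. This is a minimal sketch that assumes each method is evaluated on the same seeds or folds, so that scores can be paired. The function name `paired_effect` is hypothetical, and the numbers in the usage lines are made up for illustration; they are not results from the paper.

```python
import numpy as np

def paired_effect(scores_a, scores_b, n_boot=10000, seed=0):
    """Mean paired difference, paired Cohen's d, and a bootstrap 95% interval.

    scores_a and scores_b hold one score per seed or fold, in the same order.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    d = diff.mean() / diff.std(ddof=1)              # effect size on paired differences
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)             # resample pairs with replacement
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return float(diff.mean()), float(d), (float(lo), float(hi))

# Illustrative, made-up scores for five seeds (not taken from the paper).
proposed = [0.91, 0.92, 0.90, 0.93, 0.92]
baseline = [0.89, 0.90, 0.89, 0.90, 0.90]
mean_diff, cohen_d, ci = paired_effect(proposed, baseline)
```

An interval that excludes zero suggests that the difference is not explained by seed-to-seed variation alone, although it says nothing about whether hyperparameters were tuned fairly for both methods.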
Circularity Check
No circularity: empirical method with external validation
The paper introduces a new SSL framework (CoRe-ECG) with FDA and STDM components for ECG representation learning. Its central claims rest on experimental SOTA results and ablation studies across downstream datasets, which serve as independent external benchmarks rather than self-referential definitions or predictions. No mathematical derivation chain exists that reduces any result to its own inputs by construction; there are no equations defining quantities in terms of themselves, no fitted parameters renamed as predictions, and no load-bearing self-citations that close a loop. The synergy between contrastive and reconstructive elements is presented as a design choice validated empirically, not derived tautologically. This is a standard empirical ML paper whose validity hinges on reproducibility of experiments, not on internal circular logic.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Existing ECG SSL methods provide limited supervisory signals and suffer from non-physiological distortions or lead-correlation shortcuts.
- Ad hoc to paper: Aligning global representations during reconstruction creates synergistic instance-level discriminative signals for local recovery.
invented entities (2)
- Frequency Dynamic Augmentation (FDA): no independent evidence
- Spatio-Temporal Dual Masking (STDM): no independent evidence