From Reports to Ontologies: Ontology-Guided Representation Learning for 12-Lead ECG
Pith reviewed 2026-06-29 19:27 UTC · model grok-4.3
The pith
MAR-ECG aligns 12-lead ECG encoders to a 40-node SNOMED-CT cardiac graph for pretraining without paired reports.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAR-ECG is a masked autoregressive framework that pretrains an ECG encoder by aligning rhythm-pooled features to a curated 40-node SNOMED-CT cardiac graph. Graph-smoothed contrastive learning relaxes hard negatives according to ontology distance, while multi-scale physiological supervision adds patch-level targets derived directly from the input signal. When pretrained on roughly 40,000 public 12-lead recordings and evaluated by frozen linear probing, the resulting representations outperform a strong signal-only baseline and reach accuracy competitive with state-of-the-art ECG-text multimodal models.
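The evaluation protocol named here, frozen linear probing, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `frozen_encoder` is a fixed random projection standing in for a pretrained ECG encoder, the data are synthetic, and the probe is a plain logistic regression trained by gradient descent. It is not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained encoder: a fixed projection, never updated.
W_enc = rng.normal(size=(12 * 50, 16))
W_enc_snapshot = W_enc.copy()

def frozen_encoder(x):
    """Map (n, 12, 50) toy 'ECGs' to (n, 16) features; weights stay frozen."""
    return np.tanh(x.reshape(len(x), -1) @ W_enc / np.sqrt(W_enc.shape[0]))

def train_linear_probe(feats, labels, lr=0.5, steps=500):
    """Fit a logistic-regression probe on top of fixed features."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
        grad = p - labels                            # dLoss/dlogit
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Synthetic binary task: class 1 has a shifted mean on lead 0.
n = 400
y = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, 12, 50))
x[:, 0, :] += 1.5 * y[:, None]

feats = frozen_encoder(x)                       # computed once, encoder untouched
w, b = train_linear_probe(feats[:300], y[:300].astype(float))
pred = (feats[300:] @ w + b) > 0
accuracy = (pred == y[300:]).mean()
```

Only `w` and `b` are trained, so any difference in `accuracy` between two encoders reflects the quality of their fixed features rather than fine-tuning capacity.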
What carries the argument
Graph alignment to the 40-node SNOMED-CT cardiac graph via graph-smoothed contrastive learning (GSCL) that modulates supervision strength by ontology distance.
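One way to picture distance-modulated supervision is the sketch below. The smoothing form (a row-wise softmax over negative hop distance), the temperatures `tau` and `temp`, the single-label setup, and the 4-node toy graph are all assumptions chosen for illustration; the review text does not give the paper's exact GSCL loss.

```python
import numpy as np

def soft_targets(dist, tau=1.0):
    """Row-wise softmax over negative ontology distance (assumed smoothing form)."""
    logits = -dist / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def graph_smoothed_loss(ecg_emb, node_emb, labels, dist, tau=1.0, temp=0.1):
    """Cross-entropy between ECG-to-node similarities and distance-softened targets."""
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    nodes = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = ecg @ nodes.T / temp
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = soft_targets(dist, tau)[labels]   # one soft row per ECG
    return float(-(targets * log_prob).sum(axis=1).mean())

# Toy 4-node path graph 0-1-2-3 with hop-count distances (hypothetical).
dist = np.abs(np.arange(4)[:, None] - np.arange(4)[None, :]).astype(float)
rng = np.random.default_rng(0)
node_emb = rng.normal(size=(4, 8))
ecg_emb = rng.normal(size=(5, 8))
labels = np.array([0, 1, 2, 3, 0])

targets = soft_targets(dist, tau=1.0)
loss = graph_smoothed_loss(ecg_emb, node_emb, labels, dist)
```

In this sketch, shrinking `tau` toward zero recovers one-hot targets, i.e. ordinary contrastive supervision in which every other node is a hard negative; larger `tau` spreads target mass onto nearby nodes.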
If this is right
- ECG datasets that carry only diagnostic codes become usable for large-scale pretraining.
- Frozen linear probes reach higher accuracy in the low-label regime on rhythm and morphology tasks.
- Performance matches multimodal ECG-text methods despite the complete absence of paired reports.
- Signal-derived rhythm statistics can serve as free auxiliary targets at multiple timescales.
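As a rough illustration of what "free" signal-derived rhythm targets could look like, the sketch below computes heart rate and RR-interval variability from a single lead, both over the whole record and over shorter windows. The naive threshold peak detector, the function names, and the synthetic spike train are assumptions for illustration only; the paper's actual statistics and extraction method are not described in this review.

```python
import numpy as np

def detect_peaks(sig, threshold):
    """Indices of local maxima above threshold (naive R-peak proxy)."""
    mid = sig[1:-1]
    is_peak = (mid > sig[:-2]) & (mid >= sig[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1

def rhythm_stats(sig, fs, threshold=0.5):
    """Mean heart rate (bpm) and RR standard deviation (s) for one segment."""
    peaks = detect_peaks(sig, threshold)
    rr = np.diff(peaks) / fs  # RR intervals in seconds
    if len(rr) == 0:
        return {"heart_rate_bpm": float("nan"), "rr_std_s": float("nan")}
    return {"heart_rate_bpm": 60.0 / rr.mean(), "rr_std_s": rr.std()}

def windowed_stats(sig, fs, window_s):
    """Same statistics over non-overlapping windows, i.e. a shorter timescale."""
    step = int(window_s * fs)
    return [rhythm_stats(sig[i:i + step], fs)
            for i in range(0, len(sig) - step + 1, step)]

# Synthetic 10 s lead at 100 Hz with one unit spike every 0.8 s (about 75 bpm).
fs = 100
sig = np.zeros(10 * fs)
sig[50::80] = 1.0

record_level = rhythm_stats(sig, fs)
window_level = windowed_stats(sig, fs, window_s=5)
```

Real ECGs would need a proper QRS detector and noise handling; the point is only that such targets come from the input itself and need no annotation.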
Where Pith is reading between the lines
- The same graph-alignment recipe could be applied to other time-series biosignals that carry coded annotations.
- A larger or hierarchically deeper ontology might yield finer-grained feature organization if more nodes can be reliably populated.
- The aligned embedding space could support nearest-neighbor retrieval of ECGs sharing clinical concepts without additional training.
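The retrieval idea in the last bullet can be sketched as a cosine-similarity lookup over stored embeddings. The embedding bank here is random data and `retrieve` is a hypothetical helper; whether the paper's aligned space actually supports this use is the speculation stated above, not a reported result.

```python
import numpy as np

def retrieve(query, bank, k=3):
    """Return indices and similarities of the k bank rows most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 32))               # hypothetical embeddings of stored ECGs
query = bank[7] + 0.01 * rng.normal(size=32)    # near-duplicate of stored item 7
idx, sims = retrieve(query, bank, k=3)
```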
Load-bearing premise
Ontology distances in the 40-node SNOMED-CT graph correspond to diagnostically relevant similarities among ECG signals.
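For concreteness, an ontology distance of the kind this premise refers to can be computed as a shortest-path hop count. The sketch below uses breadth-first search on a tiny hand-made graph; the node names and edges are hypothetical and are not the paper's 40-node SNOMED-CT graph, which may also define distance differently.

```python
from collections import deque

def hop_distances(adj, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical toy ontology stored as an undirected adjacency list.
adj = {
    "cardiac finding": ["arrhythmia", "infarction"],
    "arrhythmia": ["cardiac finding", "atrial fibrillation", "atrial flutter"],
    "infarction": ["cardiac finding", "anterior MI"],
    "atrial fibrillation": ["arrhythmia"],
    "atrial flutter": ["arrhythmia"],
    "anterior MI": ["infarction"],
}
d = hop_distances(adj, "atrial fibrillation")
```

In this toy graph the two atrial arrhythmias are siblings at distance 2, while atrial fibrillation and anterior MI sit in different branches at distance 4. The premise is that such gaps track how different the corresponding ECG signals are.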
What would settle it
Randomly rewire the 40-node graph while preserving node count and degree sequence, then check whether the performance advantage over the masked-autoregressive baseline disappears on the five downstream benchmarks.
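A degree-preserving randomization of this kind is commonly done with double-edge swaps. The sketch below is one minimal way to do it in pure Python; the function names, the swap count, and the 8-node ring are assumptions for illustration, and the control experiment itself is a proposal of this review rather than something the paper reports.

```python
import random

def rewire(edges, n_swaps, seed=0, max_tries=10_000):
    """Double-edge swaps: (a,b),(c,d) -> (a,d),(c,b). Every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Hypothetical stand-in graph: a ring of 8 nodes.
original = [(i, (i + 1) % 8) for i in range(8)]
rewired = rewire(original, n_swaps=5)
```

Note that this sketch does not guarantee the rewired graph stays connected, which would matter if distances between all node pairs must remain finite.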
Original abstract
The 12-lead electrocardiogram (ECG) is a quasi-periodic, multi-channel signal with diagnostic content spanning timescales from millisecond waveform morphology to multi-second rhythm dynamics. Existing ECG representation learning relies on signal-only self-supervision or ECG-text multimodal alignment, neither of which exploits the structured diagnostic codes attached to every clinical recording. We present MAR-ECG, an ontology-guided masked autoregressive framework that supervises the encoder with a curated 40-node SNOMED-CT cardiac graph through graph alignment, eliminating the need for paired clinical reports. MAR-ECG combines two complementary objectives. First, graph-smoothed contrastive learning (GSCL) anchors the encoder's rhythm-pooled features to the SNOMED graph, softening supervision targets by ontology distance so that clinically related concepts reinforce one another rather than function as hard negatives. Second, multi-scale physiological supervision complements GSCL with signal-derived patch auxiliaries that target rhythm-physiology statistics extracted automatically from the input, extending supervision beyond the patch tier at no annotation cost. Pretrained on ~40K publicly available 12-lead ECGs with SNOMED-CT codes and evaluated by frozen linear probing on five downstream classification benchmarks, MAR-ECG consistently outperforms a strong masked-autoregressive baseline, with mean gains in the low-label regime. Despite the absence of paired clinical text, MAR-ECG achieves performance competitive with state-of-the-art multimodal ECG-text methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MAR-ECG, an ontology-guided masked autoregressive framework for learning representations from 12-lead ECG signals. Supervision comes from a curated 40-node SNOMED-CT cardiac graph via graph alignment and graph-smoothed contrastive learning (GSCL), which softens targets by ontology distance, supplemented by multi-scale physiological supervision from signal statistics. The model is pretrained on approximately 40,000 publicly available ECGs and evaluated using frozen linear probing on five downstream classification benchmarks, claiming consistent outperformance over a strong masked-autoregressive baseline (especially in low-label regimes) and competitiveness with state-of-the-art multimodal ECG-text methods without using paired text.
Significance. If the results hold, the work demonstrates that diagnostic ontologies can provide effective supervision for ECG representation learning in the absence of paired clinical text, achieving gains in low-data regimes and matching multimodal performance. The GSCL approach is a creative way to use ontology structure to avoid hard negatives. The reliance on public data and the frozen linear probing evaluation protocol are strengths that support reproducibility and fair comparison.
major comments (1)
- [Abstract] The abstract claims consistent outperformance over the masked-autoregressive baseline and competitiveness with multimodal baselines, yet supplies no numerical results, error bars, dataset splits, or ablation details. This absence means the support for the central empirical claim cannot be assessed from the manuscript as presented.
minor comments (1)
- [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., mean AUROC gains) to allow readers to gauge the magnitude of the reported improvements.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the significance and reproducibility aspects of our work. We address the single major comment below.
- Referee: [Abstract] The abstract claims consistent outperformance over the masked-autoregressive baseline and competitiveness with multimodal baselines, yet supplies no numerical results, error bars, dataset splits, or ablation details. This absence means the support for the central empirical claim cannot be assessed from the manuscript as presented.
  Authors: We agree that the abstract would benefit from including key quantitative results to make the central empirical claims more immediately assessable. While the full manuscript reports these details (including dataset splits, linear-probing protocol, and ablation studies) in Sections 4 and 5, we will revise the abstract to incorporate specific metrics such as mean performance gains over the masked-autoregressive baseline in low-label regimes, along with a brief mention of the five benchmarks and the public pretraining corpus. This revision will be kept within standard abstract length limits while providing the requested numerical support.
  Revision: yes
Circularity Check
No significant circularity
The paper's core supervision derives from an externally curated 40-node SNOMED-CT graph (via GSCL and graph alignment) plus signal-derived rhythm-physiology statistics extracted directly from the input ECGs. These targets are independent of the downstream classification benchmarks used for linear probing evaluation. No equation or claim reduces a prediction to a fitted parameter on the evaluation labels, and no load-bearing step relies on self-citation chains or imported uniqueness theorems. The reported gains are presented as empirical results rather than derived by construction from the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: distances in the curated 40-node SNOMED-CT cardiac graph can be used to soften contrastive targets so that clinically related concepts reinforce rather than repel each other.