Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons
Pith reviewed 2026-05-19 20:59 UTC · model grok-4.3
The pith
A lightweight plug-in module guided by a frozen 10-second ECG model can process longer and variable-length recordings without retraining the backbone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing a lightweight plug-in module that receives guidance from a frozen pretrained 10-second ECG foundation model, the approach achieves both structurally compatible long-sequence processing and semantically informed temporal modeling, enabling effective handling of variable-length ECG inputs without any retraining of the original backbone.
What carries the argument
A lightweight plug-in module, guided by a frozen pretrained 10-second model, that performs temporal aggregation over longer recordings.
If this is right
- The same plug-in works across multiple long-horizon ECG tasks and datasets without changing the frozen backbone.
- It consistently beats sliding-window and pooling baselines while adding only a small number of parameters.
- Variable-length recordings can be processed at inference time once the lightweight module is attached.
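For orientation, the sliding-window and pooling baselines named in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: `toy_encoder` is a hypothetical stand-in for a frozen 10-second encoder, and the window and stride values are placeholders chosen by the caller.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_windows(signal: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Cut a 1-D recording into fixed-length windows, zero-padding the last one."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(signal) == 0:
        raise ValueError("signal must not be empty")
    windows = []
    start = 0
    while True:
        chunk = list(signal[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # pad a short final window
        windows.append(chunk)
        if start + window >= len(signal):
            break
        start += stride
    return windows


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Average the per-window embeddings dimension by dimension."""
    return [sum(col) / len(embeddings) for col in zip(*embeddings)]


def max_pool(embeddings: List[Vector]) -> Vector:
    """Take the per-dimension maximum over windows."""
    return [max(col) for col in zip(*embeddings)]


def toy_encoder(segment: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a frozen encoder: [mean, peak absolute amplitude]."""
    return [sum(segment) / len(segment), max(abs(x) for x in segment)]


def sliding_window_baseline(
    signal: Sequence[float],
    encoder: Callable[[Sequence[float]], Vector],
    window: int,
    stride: int,
    pool: Callable[[List[Vector]], Vector] = mean_pool,
) -> Vector:
    """Encode each window independently, then pool; no cross-window modeling."""
    return pool([encoder(w) for w in split_windows(signal, window, stride)])
```

The baseline treats every window independently and discards their order, which is the limitation the plug-in is claimed to address.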
Where Pith is reading between the lines
- Similar guided plug-ins could be tested on other medical time-series models that start from short fixed-length pretraining.
- Continuous patient monitoring systems might use this pattern to avoid periodic full retraining when signal lengths change.
- The approach leaves open whether the semantic guidance remains effective for rare events that appear only in very extended recordings.
Load-bearing premise
A small plug-in module can aggregate information over long ECG sequences in a semantically meaningful way when guided only by the frozen short-segment model, without losing clinically relevant details learned during the original 10-second pretraining.
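One way to picture aggregation over per-segment embeddings is attention-style pooling, where each segment is weighted by its similarity to a query vector. The sketch below is an assumption made purely for illustration: the source does not specify the plug-in's architecture, and `attention_pool` and `query` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Weight segment embeddings by scaled dot-product similarity to `query`.

    Returns the pooled vector and the per-segment weights.
    """
    d = len(query)
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(d)
        for emb in embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(d)
    ]
    return pooled, weights
```

With a zero query the weights are uniform and the result reduces to mean pooling, which makes the comparison against a pooling baseline easy to state.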
What would settle it
If experiments on a long-horizon ECG dataset show that the plug-in method performs no better than a simple sliding-window or pooling baseline, or if it misses key clinical events preserved by the original 10-second model, the central claim would not hold.
Original abstract
Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments have demonstrated strong transferability across a range of clinical applications. However, many real-world applications produce recordings that are typically longer and that vary in duration at inference time. These 10-second models have no built-in way to combine information across time. Extending them to longer horizons introduces two challenges: structural incompatibilities arising from input-length disparities, and semantic challenges that limit meaningful temporal aggregation. We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling. Experiments on multiple long-horizon ECG tasks, datasets, and foundation model backbones demonstrate that our method enables robust long-horizon extension from pretrained snapshot models, consistently outperforming sliding-window and pooling-based baselines with strong parameter efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a parameter-efficient framework to extend pretrained 10-second ECG foundation models to longer and variable-length recordings without retraining the backbone. A lightweight plug-in module, guided by the frozen 10-second model, is introduced to address structural input-length incompatibilities and enable semantically informed temporal aggregation. Experiments across multiple long-horizon ECG tasks, datasets, and backbones are reported to show consistent outperformance over sliding-window and pooling baselines with strong parameter efficiency.
Significance. If the results hold under scrutiny, the work offers a practical route to deploy existing short-segment ECG foundation models on real-world extended recordings (e.g., Holter or telemetry) while preserving the original pretraining investment. The emphasis on parameter efficiency and the dual structural-semantic design are clear strengths that could reduce the need for costly long-sequence retraining.
major comments (2)
- [Method] The central claim that the plug-in performs semantically informed temporal aggregation rests on the frozen 10-second backbone supplying relevant long-range features. Because pretraining occurs exclusively on fixed 10 s segments, it is unclear whether representations encode evolving patterns such as intermittent arrhythmias or ST changes over minutes; this assumption is load-bearing for the semantic-guidance component and requires explicit justification or ablation (e.g., comparison against a non-semantic adapter).
- [Experiments] The experimental section reports consistent outperformance but, consistent with the abstract, supplies no quantitative metrics, confidence intervals, or statistical tests in the summary description. Without these details it is difficult to assess whether the gains are robust or merely reflect baseline weaknesses on the chosen long-horizon tasks.
minor comments (2)
- [Method] Notation for the plug-in module and its interface with the frozen backbone could be clarified with a diagram or pseudocode to aid reproducibility.
- [Abstract] The abstract would benefit from at least one concrete performance delta or efficiency number to convey the magnitude of improvement.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and indicate the revisions made to the manuscript.
- Referee: [Method] The central claim that the plug-in performs semantically informed temporal aggregation rests on the frozen 10-second backbone supplying relevant long-range features. Because pretraining occurs exclusively on fixed 10 s segments, it is unclear whether representations encode evolving patterns such as intermittent arrhythmias or ST changes over minutes; this assumption is load-bearing for the semantic-guidance component and requires explicit justification or ablation (e.g., comparison against a non-semantic adapter).
  Authors: We agree that the semantic-guidance design requires explicit support. The 10-second backbone produces per-segment embeddings that encode clinically relevant features (e.g., morphology and rhythm descriptors) shown to transfer across tasks in prior work; these embeddings are then used by the plug-in to weight and aggregate information across variable-length sequences. To directly test the contribution of semantic guidance, we have added an ablation that replaces the backbone-guided module with a non-semantic adapter (simple linear projection plus temporal pooling). The new results (Section 4.3, Table 5) show consistent degradation when semantic guidance is removed, confirming that the frozen backbone supplies useful long-range cues even though it was pretrained on fixed 10 s inputs. We have also expanded the method section with a short discussion of this transferability assumption.
  Revision: yes
- Referee: [Experiments] The experimental section reports consistent outperformance but, consistent with the abstract, supplies no quantitative metrics, confidence intervals, or statistical tests in the summary description. Without these details it is difficult to assess whether the gains are robust or merely reflect baseline weaknesses on the chosen long-horizon tasks.
  Authors: The detailed per-task metrics, standard deviations, confidence intervals, and statistical tests (paired t-tests with reported p-values) already appear in Tables 2–4 and the supplementary material. To improve readability of the high-level summary, we have revised the abstract and the first paragraph of the experiments section to include key quantitative highlights (average AUC/F1 gains and confirmation of statistical significance across backbones and datasets). These additions allow readers to gauge robustness without immediately consulting the full tables.
  Revision: yes
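The paired t-test mentioned in the rebuttal compares two methods evaluated on the same tasks or folds. As a rough sketch of the statistic itself (not the authors' analysis code), the function below computes t = mean(d) / (sd(d) / sqrt(n)) over paired differences d; the name `paired_t_statistic` is hypothetical. Turning t into a p-value needs the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_rel` is one existing option.

```python
import math
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

A large positive t indicates that the first method scores higher than the second consistently across pairs, rather than on average only.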
Circularity Check
The new architectural plug-in module is an independent extension; it does not reduce to fitted inputs or to self-citations.
The paper's core contribution is the proposal of a lightweight plug-in module that structurally and semantically extends frozen 10-second pretrained ECG foundation models to variable-length inputs. This is presented as an architectural design choice guided by the backbone, with performance validated through experiments across tasks, datasets, and backbones. No equations or derivations are shown that define outputs in terms of themselves, rename fitted parameters as predictions, or rely on load-bearing self-citations whose uniqueness is imported without external verification. The method is self-contained as an empirical engineering extension rather than a closed mathematical chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pretrained 10-second ECG foundation models capture transferable features that can guide extension to longer sequences
invented entities (1)
- lightweight plug-in module (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling.
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: $E_{L}[t] = E_{10s}[t \bmod N_{10s}] + E_{\mathrm{global}}[\lfloor t / N_{10s} \rfloor]$
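The quoted formula reads like a recipe for building position embeddings for a long sequence from a short pretrained table. The sketch below works through it under assumed meanings that the page itself does not define: `e_10s` is taken to be a table of N_10s per-position embeddings from the 10-second model, and `e_global` a table with one embedding per 10-second block. The function name is hypothetical.

```python
from typing import List

Vector = List[float]


def extend_positional_embeddings(
    e_10s: List[Vector], e_global: List[Vector], length: int
) -> List[Vector]:
    """Compute E_L[t] = E_10s[t mod N_10s] + E_global[floor(t / N_10s)] for t < length."""
    n_10s = len(e_10s)
    if n_10s == 0:
        raise ValueError("e_10s must contain at least one position")
    blocks_needed = (length - 1) // n_10s + 1 if length > 0 else 0
    if len(e_global) < blocks_needed:
        raise ValueError("e_global has too few block embeddings for this length")
    extended = []
    for t in range(length):
        local = e_10s[t % n_10s]      # position within the 10-second block
        block = e_global[t // n_10s]  # which 10-second block t falls in
        extended.append([a + b for a, b in zip(local, block)])
    return extended
```

Under this reading, the local term reuses the pretrained table unchanged, and only the per-block table would need to be learned.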
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.