Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons

Wei Tang, Jinpei Han, Kangning Cui, Mattia Carletti, Fredrik K. Gustafsson, Shreyank N Gowda, Patitapaban Palo, Anshul Thakur, Lei Clifton, Jean-Michel Morel, Raymond H. Chan, David A. Clifton, Xiao Gu

arxiv: 2605.16975 · v1 · pith:BQ6ZY7MCnew · submitted 2026-05-16 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-19 20:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: ECG foundation models · long-horizon signals · parameter-efficient adaptation · temporal aggregation · variable-length inputs · pretrained model extension

The pith

A lightweight plug-in module guided by a frozen 10-second ECG model can process longer and variable-length recordings without retraining the backbone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to extend existing ECG foundation models, which are trained only on fixed 10-second segments, so they can handle real-world recordings that run much longer and vary in duration at inference time. It does this by freezing the original model and attaching a small additional module that handles both the structural mismatch of longer inputs and the need to combine information meaningfully across time. A sympathetic reader would care because many clinical ECG applications involve extended monitoring, yet retraining large foundation models for each new length is expensive and often impractical. The experiments test this across several tasks, datasets, and different pretrained backbones, showing gains over standard ways of handling long sequences.

Core claim

By introducing a lightweight plug-in module that receives guidance from a frozen pretrained 10-second ECG foundation model, the approach achieves both structurally compatible long-sequence processing and semantically informed temporal modeling, enabling effective handling of variable-length ECG inputs without any retraining of the original backbone.

What carries the argument

lightweight plug-in module guided by a frozen pretrained 10-second model for temporal aggregation

If this is right

The same plug-in works across multiple long-horizon ECG tasks and datasets without changing the frozen backbone.
It consistently beats sliding-window and pooling baselines while adding only a small number of parameters.
Variable-length recordings can be processed at inference time once the lightweight module is attached.
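
The sliding-window and pooling baselines mentioned above can be pictured with a minimal sketch: cut a recording of any length into fixed-size windows, encode each window independently, and average the per-window embeddings. This is not the paper's code. The function names, the zero-padding of the last window, and `toy_encoder` (a hypothetical stand-in for a frozen 10-second backbone) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split_into_windows(signal: Sequence[float], window_len: int) -> List[List[float]]:
    """Cut a 1-D signal into non-overlapping windows, zero-padding the last one."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    windows = []
    for start in range(0, len(signal), window_len):
        chunk = list(signal[start:start + window_len])
        chunk += [0.0] * (window_len - len(chunk))  # pad the final short window
        windows.append(chunk)
    return windows


def mean_pool(embeddings: List[List[float]]) -> List[float]:
    """Average per-window embeddings dimension by dimension."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def sliding_window_baseline(
    signal: Sequence[float],
    window_len: int,
    encode_window: Callable[[List[float]], List[float]],
) -> List[float]:
    """Encode each window independently, then mean-pool the embeddings."""
    windows = split_into_windows(signal, window_len)
    return mean_pool([encode_window(w) for w in windows])


def toy_encoder(window: List[float]) -> List[float]:
    # Hypothetical stand-in for a frozen backbone: embedding = [mean, max].
    return [sum(window) / len(window), max(window)]


if __name__ == "__main__":
    recording = [1.0, 2.0, 3.0, 4.0, 5.0]  # arbitrary length
    print(sliding_window_baseline(recording, window_len=2, encode_window=toy_encoder))
```

Because every window is encoded in isolation and the order of windows is discarded by the average, a baseline of this kind carries no information about how the signal evolves across windows, which is the gap the plug-in module is meant to close.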

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar guided plug-ins could be tested on other medical time-series models that start from short fixed-length pretraining.
Continuous patient monitoring systems might use this pattern to avoid periodic full retraining when signal lengths change.
The approach leaves open whether the semantic guidance remains effective for rare events that appear only in very extended recordings.

Load-bearing premise

A small plug-in module can aggregate information over long ECG sequences in a semantically meaningful way when guided only by the frozen short-segment model, without losing clinically relevant details learned during the original 10-second pretraining.

What would settle it

If experiments on a long-horizon ECG dataset show that the plug-in method performs no better than a simple sliding-window or pooling baseline, or if it misses key clinical events preserved by the original 10-second model, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2605.16975 by Anshul Thakur, David A. Clifton, Fredrik K. Gustafsson, Jean-Michel Morel, Jinpei Han, Kangning Cui, Lei Clifton, Mattia Carletti, Patitapaban Palo, Raymond H. Chan, Shreyank N Gowda, Wei Tang, Xiao Gu.

**Figure 1.** Overview of the problem setting and extension strategies. ECG foundation models are typically pretrained on short, fixed-length recordings (e.g., 10 s), which makes direct use on long or variable-length recordings non-trivial. Naive extension strategies either run independent window-level predictions and aggregate the outputs as panel (a), or apply simple aggregation or sequential layers over window repres…

**Figure 2.** Overview of our extension framework. The extension is divided into two complementary parts: a structural extension that enables compatibility with long-horizon recordings, and a semantic extension that supports coherent representation learning over extended temporal horizons. (a) Structural extension (Section 3.1) introduces additional learnable tokens, soft prompts and global positional embeddings, by fre…

**Figure 3.** t-SNE visualizations of feature space on VTaC. Blue and purple denote different classes. The dashed boxes highlight regions with clearer class separation, indicating a stronger teacher model.

[Plot: panels ECG-only, PPG-only, and ECG + PPG; x-axis Signal Length (s); y-axis AUC (%); legend entries include Token Pooling and Bias Tuning.]

**Figure 4.** Generalization across signal modalities on VTaC. We evaluate the proposed framework using CSFM-Tiny under ECG-only, PPG-only, and joint ECG+PPG settings. Across all three modality configurations, the proposed method consistently outperforms the baseline adaptation strategies, indicating that the long-horizon extension is not specific to ECG alone.

Original abstract

Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments have demonstrated strong transferability across a range of clinical applications. However, many real-world applications produce recordings that are typically longer and that vary in duration at inference time. These 10-second models have no built-in way to combine information across time. Extending them to longer horizons introduces two challenges: structural incompatibilities arising from input-length disparities, and semantic challenges that limit meaningful temporal aggregation. We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling. Experiments on multiple long-horizon ECG tasks, datasets, and foundation model backbones demonstrate that our method enables robust long-horizon extension from pretrained snapshot models, consistently outperforming sliding-window and pooling-based baselines with strong parameter efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a parameter-efficient framework to extend pretrained 10-second ECG foundation models to longer and variable-length recordings without retraining the backbone. A lightweight plug-in module, guided by the frozen 10-second model, is introduced to address structural input-length incompatibilities and enable semantically informed temporal aggregation. Experiments across multiple long-horizon ECG tasks, datasets, and backbones are reported to show consistent outperformance over sliding-window and pooling baselines with strong parameter efficiency.

Significance. If the results hold under scrutiny, the work offers a practical route to deploy existing short-segment ECG foundation models on real-world extended recordings (e.g., Holter or telemetry) while preserving the original pretraining investment. The emphasis on parameter efficiency and the dual structural-semantic design are clear strengths that could reduce the need for costly long-sequence retraining.

major comments (2)

[Method] The central claim that the plug-in performs semantically informed temporal aggregation rests on the frozen 10-second backbone supplying relevant long-range features. Because pretraining occurs exclusively on fixed 10 s segments, it is unclear whether representations encode evolving patterns such as intermittent arrhythmias or ST changes over minutes; this assumption is load-bearing for the semantic-guidance component and requires explicit justification or ablation (e.g., comparison against a non-semantic adapter).
[Experiments] The experimental section reports consistent outperformance but, consistent with the abstract, supplies no quantitative metrics, confidence intervals, or statistical tests in the summary description. Without these details it is difficult to assess whether the gains are robust or merely reflect baseline weaknesses on the chosen long-horizon tasks.

minor comments (2)

[Method] Notation for the plug-in module and its interface with the frozen backbone could be clarified with a diagram or pseudocode to aid reproducibility.
[Abstract] The abstract would benefit from at least one concrete performance delta or efficiency number to convey the magnitude of improvement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and indicate the revisions made to the manuscript.

Referee: [Method] The central claim that the plug-in performs semantically informed temporal aggregation rests on the frozen 10-second backbone supplying relevant long-range features. Because pretraining occurs exclusively on fixed 10 s segments, it is unclear whether representations encode evolving patterns such as intermittent arrhythmias or ST changes over minutes; this assumption is load-bearing for the semantic-guidance component and requires explicit justification or ablation (e.g., comparison against a non-semantic adapter).

Authors: We agree that the semantic-guidance design requires explicit support. The 10-second backbone produces per-segment embeddings that encode clinically relevant features (e.g., morphology and rhythm descriptors) shown to transfer across tasks in prior work; these embeddings are then used by the plug-in to weight and aggregate information across variable-length sequences. To directly test the contribution of semantic guidance, we have added an ablation that replaces the backbone-guided module with a non-semantic adapter (simple linear projection plus temporal pooling). The new results (Section 4.3, Table 5) show consistent degradation when semantic guidance is removed, confirming that the frozen backbone supplies useful long-range cues even though it was pretrained on fixed 10 s inputs. We have also expanded the method section with a short discussion of this transferability assumption.
revision: yes

Referee: [Experiments] The experimental section reports consistent outperformance but, consistent with the abstract, supplies no quantitative metrics, confidence intervals, or statistical tests in the summary description. Without these details it is difficult to assess whether the gains are robust or merely reflect baseline weaknesses on the chosen long-horizon tasks.

Authors: The detailed per-task metrics, standard deviations, confidence intervals, and statistical tests (paired t-tests with reported p-values) already appear in Tables 2–4 and the supplementary material. To improve readability of the high-level summary, we have revised the abstract and the first paragraph of the experiments section to include key quantitative highlights (average AUC/F1 gains and confirmation of statistical significance across backbones and datasets). These additions allow readers to gauge robustness without immediately consulting the full tables.
revision: yes
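
For orientation, the "non-semantic adapter (simple linear projection plus temporal pooling)" named in the first response could look roughly like the sketch below. It is an illustrative assumption, not the ablation code from the paper: the function names, the use of mean pooling as the temporal pooling step, and the plain-list representation of weights are all hypothetical.

```python
from typing import List


def linear_project(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Compute weight @ x + bias for a single embedding vector."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def non_semantic_adapter(
    window_embeddings: List[List[float]],
    weight: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Project every per-window embedding, then average over time.

    Assumption: "temporal pooling" is taken to be a plain mean, so every
    window contributes equally regardless of its content.
    """
    projected = [linear_project(e, weight, bias) for e in window_embeddings]
    n = len(projected)
    return [sum(column) / n for column in zip(*projected)]


if __name__ == "__main__":
    embeddings = [[1.0, 2.0], [3.0, 4.0]]          # two windows, 2-D embeddings
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2-D to 3-D
    bias = [0.0, 0.0, 1.0]
    print(non_semantic_adapter(embeddings, weight, bias))
```

The contrast with the guided module, as the response describes it, is that here no window is weighted by what the frozen backbone says about it.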

Circularity Check

0 steps flagged

New architectural plug-in module introduces independent extension without reducing to fitted inputs or self-citations

The paper's core contribution is the proposal of a lightweight plug-in module that structurally and semantically extends frozen 10-second pretrained ECG foundation models to variable-length inputs. This is presented as an architectural design choice guided by the backbone, with performance validated through experiments across tasks, datasets, and backbones. No equations or derivations are shown that define outputs in terms of themselves, rename fitted parameters as predictions, or rely on load-bearing self-citations whose uniqueness is imported without external verification. The method is self-contained as an empirical engineering extension rather than a closed mathematical chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the pretrained 10-second model providing effective guidance for the plug-in and on the assumption that structural and semantic extensions can be decoupled without major information loss. The abstract details no free parameters and describes the one invented entity, the plug-in module, only at a high level.

axioms (1)

domain assumption: Pretrained 10-second ECG foundation models capture transferable features that can guide extension to longer sequences
The method relies on freezing and using the original model as a guide for the plug-in module.

invented entities (1)

lightweight plug-in module (no independent evidence)
purpose: To provide structurally compatible long-sequence processing and semantically informed temporal modeling
New component introduced to address the two stated challenges without retraining the backbone.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


We propose a parameter-efficient framework that extends pretrained ECG foundation models to longer and variable-length ECGs without retraining the backbone. Guided by a frozen pretrained 10-second model, we introduce a lightweight plug-in module that extends the model in two complementary ways: (i) structurally compatible long-sequence processing and (ii) semantically informed temporal modeling.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear


$E_{L}[t] = E_{10\mathrm{s}}[t \bmod N_{10\mathrm{s}}] + E_{\mathrm{global}}[\lfloor t / N_{10\mathrm{s}} \rfloor]$
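
Read literally, the passage above builds the positional embedding for token t of a long recording by reusing the pretrained 10-second table at the within-window offset (t mod N) and adding a global embedding indexed by the window number (floor of t / N). A minimal sketch of that lookup follows. The function name, the list-of-lists tables, and the error handling are assumptions for illustration; in the described framework the global table would be learned while the 10-second table stays frozen.

```python
from typing import List


def extended_positional_embedding(
    t: int,
    local_table: List[List[float]],   # pretrained 10-second table, N rows
    global_table: List[List[float]],  # one row per 10-second window
) -> List[float]:
    """Return local_table[t mod N] + global_table[t // N] for token index t."""
    n_local = len(local_table)
    if t < 0:
        raise ValueError("t must be non-negative")
    window_index = t // n_local
    if window_index >= len(global_table):
        raise ValueError("token index exceeds the range covered by global_table")
    local = local_table[t % n_local]
    global_part = global_table[window_index]
    return [a + b for a, b in zip(local, global_part)]


if __name__ == "__main__":
    local_table = [[0.0, 1.0], [2.0, 3.0]]                   # N = 2 tokens per window
    global_table = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]  # three windows
    for t in range(6):
        print(t, extended_positional_embedding(t, local_table, global_table))
```

Under this reading, tokens at the same offset in different windows share the pretrained local component and differ only by the global term, so the number of new parameters grows with the number of windows rather than with the number of tokens.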

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Artificial intelligence-enhanced electrocardiography in cardiovascular disease management.Nature Reviews Cardiology, 18(7):465–478, 2021

Konstantinos C Siontis, Peter A Noseworthy, Zachi I Attia, and Paul A Friedman. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management.Nature Reviews Cardiology, 18(7):465–478, 2021

work page 2021

[2] [2]

Deep learning for ecg analysis: Benchmarks and insights from ptb-xl.IEEE journal of biomedical and health informatics, 25(5):1519–1528, 2020

Nils Strodthoff, Patrick Wagner, Tobias Schaeffter, and Wojciech Samek. Deep learning for ecg analysis: Benchmarks and insights from ptb-xl.IEEE journal of biomedical and health informatics, 25(5):1519–1528, 2020

work page 2020

[3] [3]

An electrocardiogram foundation model built on over 10 million recordings with external evaluation across multiple domains.arXiv preprint arXiv:2410.04133,

Jun Li, Aaron Aguirre, Junior Moura, Che Liu, Lanhai Zhong, Chenxi Sun, Gari Clifford, Brandon Westover, and Shenda Hong. An electrocardiogram foundation model built on over 10 million recordings with external evaluation across multiple domains.arXiv preprint arXiv:2410.04133, 2024

work page arXiv 2024

[4] [4]

Aldo Faisal, and David A

Xiao Gu, Yuxuan Shu, Jinpei Han, Yuxuan Liu, Zhangdaihong Liu, James Anibal, Veer Sangha, Edward Phillips, Bradley Segal, Yuxuan Liu, Hang Yuan, Fenglin Liu, Kim Branson, Patrick Schwab, Danielle Belgrave, Lei Clifton, Dimitris Spathis, Vasileios Lampos, A. Aldo Faisal, and David A. Clifton. Foundation models for biosignals: A survey. 2025

work page 2025

[5] [5]

Ecg-fm: An open electrocardiogram foundation model.JAMIA open, 8(5):ooaf122, 2025

Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, and Bo Wang. Ecg-fm: An open electrocardiogram foundation model.JAMIA open, 8(5):ooaf122, 2025

work page 2025

[6] Xiao Gu, Wei Tang, Jinpei Han, Veer Sangha, Fenglin Liu, Shreyank N Gowda, Antonio H Ribeiro, Patrick Schwab, Kim Branson, Lei Clifton, et al. Cardiac health assessment across scenarios and devices using a multimodal foundation model pretrained on data from 1.7 million individuals. Nature Machine Intelligence, 8(2):220–233, 2026

[7] Jun Li, Aaron D Aguirre, Valdery Moura Junior, Jiarui Jin, Che Liu, Lanhai Zhong, Chenxi Sun, Gari Clifford, M Brandon Westover, and Shenda Hong. An electrocardiogram foundation model built on over 10 million recordings. NEJM AI, 2(7):AIoa2401033, 2025

[8] Li-wei Lehman, Benjamin Moody, Harsh Deep, Feng Wu, Hasan Saeed, Lucas McCullum, Diane Perry, Tristan Struja, Qiao Li, Gari Clifford, and Roger Mark. VTaC: A benchmark dataset of ventricular tachycardia alarms from ICU monitors. In Advances in Neural Information Processing Systems (NeurIPS), pages 38827–38843, 2023

[9] Benjamin Moody, George Moody, Mauricio Villarroel, Gari D. Clifford, and Ikaro Silva. MIMIC-III Waveform Database Matched Subset. PhysioNet, April 2020. doi: 10.13026/c2294b. URL https://doi.org/10.13026/c2294b. Version 1.0

[10] Aman Kansal, Emma Chen, Boyang Tom Jin, Pranav Rajpurkar, and David A Kim. MC-MED, multimodal clinical monitoring in the emergency department. Scientific Data, 12(1):1094, 2025

[11] Zhijiang Wan, Qianhao Yu, Jia Mao, Wenfeng Duan, and Cheng Ding. Openecg: Benchmarking ecg foundation models with public 1.2 million records. arXiv preprint arXiv:2503.00711, 2025

[12] Jonathan B Moody, Alexis Poitrasson-Rivière, Jennifer M Renaud, Tomoe Hagio, Fares Alahdab, Mouaz H Al-Mallah, Michael D Vanderver, Sascha N Goonewardena, Edward P Ficaro, and Venkatesh L Murthy. A foundation transformer model with self-supervised learning for ecg-based assessment of cardiac and coronary function. NEJM AI, 2(12):AIoa2500164, 2025

[13] Junho Song, Jong-Hwan Jang, DongGyun Hong, Joon myoung Kwon, and Yong-Yeon Jo. Crema: A contrastive regularized masked autoencoder for robust ecg diagnostics across clinical domains, 2025. URL https://arxiv.org/abs/2407.07110

[14] Shaoting Zhang, Yishan Du, Wenji Wang, Xianying He, Fangfang Cui, Liang Zhao, Bei Wang, Zhiqiang Hu, Ziqiang Wang, Qing Xia, et al. Ecgfm: A foundation model for ecg analysis trained on a multi-center million-ecg dataset. Information Fusion, page 103363, 2025

[15] Han Yu, Peikun Guo, and Akane Sano. Ecg semantic integrator (esi): A foundation ecg model pretrained with llm-enhanced cardiological text. Transactions on Machine Learning Research (TMLR), 2024

[16] Yuanyuan Tian, Zhiyuan Li, Yanrui Jin, Mengxiao Wang, Xiaoyang Wei, Liqun Zhao, Yunqing Liu, Jinlei Liu, and Chengliang Liu. Foundation model of ecg diagnosis: Diagnostics and explanations of any form and rhythm on ecg. Cell Reports Medicine, 5(12), 2024

[17] Wei-Long Zheng, Edilberto Amorim, Jin Jing, Wendong Ge, Shenda Hong, Ona Wu, Mohammad Ghassemi, Jong Woo Lee, Adithya Sivaraju, Trudy Pang, et al. Predicting neurological outcome in comatose patients after cardiac arrest with multiscale deep neural networks. Resuscitation, 169:86–94, 2021

[18] Peng Zhang, Fan Lin, Fei Ma, Yuting Chen, Siyi Fang, Haiyan Zheng, Zuwen Xiang, Xiaoyun Yang, and Qiang Li. Automatic screening of patients with atrial fibrillation from 24-h holter recording using deep learning. European Heart Journal-Digital Health, 4(3):216–224, 2023

[19] Suli Wang, Yangshen Deng, Zhenghua Bao, Xinyu Zhan, and Yiqun Duan. Neurottt: Bridging pretraining-downstream task misalignment in eeg foundation models via test-time training. arXiv preprint arXiv:2509.26301, 2025

[20] Rushuang Zhou, Yuanting Zhang, and Yining Dong. H-tuning: Toward low-cost and efficient ECG-based cardiovascular disease detection with pre-trained models. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=RLu1QIPiVr

[21] Chenrui Wu, Haishuai Wang, Xiang Zhang, Chengqi Zhang, and Jiajun Bu. Efficient personalized adaptation for physiological signal foundation model. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=55ysNwbOTI

[22] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https:...

[23] Hangbo Bao, Li Dong, Songhao Piao, and Furu Wei. BEit: BERT pre-training of image transformers. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=p-BhZSz59o4

[24] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

[25] Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision (ECCV), pages 709–727, 2022

[26] Seunghan Lee, Taeyoung Park, and Kibok Lee. Soft contrastive learning for time series. In 12th International Conference on Learning Representations, ICLR 2024, 2024

[27] Dani Kiyasseh, Tingting Zhu, and David A Clifton. Clocs: Contrastive learning of cardiac signals across space, time, and patients. In International Conference on Machine Learning, pages 5606–5615. PMLR, 2021

[28] Kaiwen Zha, Peng Cao, Jeany Son, Yuzhe Yang, and Dina Katabi. Rank-n-contrast: learning continuous representations for regression. Advances in Neural Information Processing Systems, 36:17882–17903, 2023

[29] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020

[30] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15750–15758, 2021

[31] Chaoqi Yang, M Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. Advances in Neural Information Processing Systems, 36:78240–78260, 2023

[32] Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, and Rossella Arcucci. Zero-shot ECG classification with multimodal learning and test-time clinical knowledge enhancement. In International Conference on Machine Learning (ICML), pages 31949–31963, 2024

[33] Sehun Kim. Learning general representation of 12-lead electrocardiogram with a joint-embedding predictive architecture, 2026. URL https://arxiv.org/abs/2410.08559

[34] Li-wei Lehman, Benjamin Moody, Lucas McCullum, Hasan Saeed, Harsh Deep, Diane Perry, Tristan Struja, Qiao Li, Gari Clifford, and Roger Mark. VTaC: A benchmark dataset of ventricular tachycardia alarms from ICU monitors. PhysioNet, October 2024. doi: 10.13026/8td2-g363. URL https://doi.org/10.13026/8td2-g363. Version 1.0

[35] Aman Kansal, Emma Chen, Tom Jin, Pranav Rajpurkar, and David Kim. Multimodal Clinical Monitoring in the Emergency Department (MC-MED). PhysioNet, September 2025. doi: 10.13026/wvyw-g663. URL https://doi.org/10.13026/wvyw-g663. Version 1.0.1

[36] Xingyao Wang, Caiyun Ma, Xiangyu Zhang, Hongxiang Gao, Gari D. Clifford, and Chengyu Liu. Paroxysmal Atrial Fibrillation Events Detection from Dynamic ECG Recordings: The 4th China Physiological Signal Challenge 2021. PhysioNet, June 2021. doi: 10.13026/ksya-qw89. URL https://doi.org/10.13026/ksya-qw89. Version 1.0.0

[37] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems (NeurIPS), 35:16664–16678, 2022

[38] Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. AdapterFusion: Non-destructive task composition for transfer learning. In Proceedings of the 16th conference of the European chapter of the association for computational linguistics, pages 487–503, 2021