ECG Biometrics with ArcFace-Inception: External Validation on MIMIC and HEEDB
Pith reviewed 2026-05-10 18:49 UTC · model grok-4.3
The pith
ECG identity information remains measurable on large external datasets but degrades with time, domain shifts, and gallery size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A 1D Inception-v1 model trained with ArcFace on 164,440 ECGs from 53,079 patients produces Rank@1 of 0.9506 on the source domain, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC under a unified closed-set leave-one-out protocol. Temporal stress tests at fixed gallery size show Rank@1 falling from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB between one- and five-year gaps. Scale analysis on HEEDB reveals monotonic degradation with larger galleries and recovery when more examinations per patient are available, while post-hoc reranking and normalization further raise retrieval rates.
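As a rough illustration of what a closed-set leave-one-out Rank@K figure measures, the sketch below probes each embedding against all remaining ones and counts how often a same-identity sample appears among the K best matches. It is a minimal sketch, not the paper's evaluation code: the function names, the use of cosine similarity, and the sample-level (rather than identity-level) ranking are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_at_k_leave_one_out(embeddings, labels, k=1):
    """Hypothetical helper: fraction of probes whose identity appears among
    the k most similar gallery samples, each sample being probed in turn
    against all the others."""
    hits, probes = 0, 0
    for i, (probe, identity) in enumerate(zip(embeddings, labels)):
        # Gallery = every sample except the probe itself.
        gallery = [
            (cosine(probe, emb), lab)
            for j, (emb, lab) in enumerate(zip(embeddings, labels))
            if j != i
        ]
        # Closed-set assumption: skip probes with no mate left in the gallery.
        if not any(lab == identity for _, lab in gallery):
            continue
        gallery.sort(key=lambda pair: pair[0], reverse=True)
        top_labels = [lab for _, lab in gallery[:k]]
        hits += identity in top_labels
        probes += 1
    return hits / probes if probes else float("nan")
```

Skipping single-examination patients is exactly the kind of inclusion rule that can shift the reported numbers, which is why the cohort-construction details matter for interpreting them.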
What carries the argument
A 1D Inception-v1 network trained with ArcFace loss, which embeds ECG waveforms into identity-discriminative vectors.
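For readers unfamiliar with ArcFace, the sketch below shows the additive angular margin applied to the target-class logit before a softmax cross-entropy. The margin 0.5 and scale 64 are common defaults from the ArcFace literature and are assumptions here; the margin and scale actually used by the authors are not stated on this page. The function names are hypothetical, and the sketch omits the usual safeguard for angles where adding the margin would exceed pi.

```python
import math

def arcface_logits(embedding, class_weights, target, margin=0.5, scale=64.0):
    """Scaled cosine logits with an additive angular margin on the target class.
    margin and scale are illustrative defaults, not the paper's values."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = unit(embedding)
    logits = []
    for c, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, unit(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if c == target:
            # Penalise the true class by widening its angle by the margin.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]
```

Because the margin lowers the target logit, the loss stays positive even for samples that are already classified correctly, which pushes same-identity embeddings closer together on the unit sphere.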
If this is right
- ECG identity signatures persist across external domains and multi-year intervals under closed-set conditions.
- Accuracy declines monotonically with increasing gallery size and with longitudinal drift from one to five years.
- Second-stage score processing such as reranking and normalization measurably improves retrieval.
- Additional examinations per patient offset some of the scale-induced performance loss.
Where Pith is reading between the lines
- Domain-adaptation or periodic re-training steps may be needed to maintain performance when hospital data distributions differ.
- The closed-set protocol likely overestimates accuracy in genuine open-set deployments where unknown individuals must be rejected.
- Long-term ECG biometrics may require scheduled re-enrollment to counter physiological drift.
Load-bearing premise
The closed-set leave-one-out protocol on the chosen external cohorts sufficiently represents real-world open-set identification challenges without major selection biases in gallery and probe construction.
What would settle it
An open-set replication on the same MIMIC and HEEDB cohorts that reports Rank@1 below 0.5 would demonstrate that the closed-set results do not translate to settings with unknown identities.
Original abstract
ECG biometrics has been studied mainly on small cohorts and short inter-session intervals, leaving open how identification behaves under large galleries, external domain shift, and multi-year temporal gaps. We evaluated a 1D Inception-v1 model trained with ArcFace on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients and tested it on larger cohorts derived from MIMIC-IV-ECG and HEEDB. The study used a unified closed-set leave-one-out protocol with Rank@K and TAR@FAR metrics, together with scale, temporal-stress, reranking, and confidence analyses. Under general comparability, the system achieved Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. In the temporal stress test at constant gallery size, Rank@1 declined from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB from 1 to 5 years. Scale analysis on HEEDB showed monotonic degradation as gallery size increased and recovery as more examinations per patient became available. On HEEDB-RR, post-hoc reranking further improved retrieval, with AS-norm reaching Rank@1 = 0.8005 from a 0.7765 baseline. ECG identity information therefore remains measurable under externally validated large-scale closed-set conditions, but its operational quality is strongly affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage score processing.
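The abstract names AS-norm as the best-performing post-hoc step but does not describe its configuration. The sketch below shows one common form of adaptive symmetric score normalization: the raw probe-gallery score is z-normalized against the top-N cohort scores of each side and the two results are averaged. The function name, the cohort inputs, and `top_n=3` are assumptions for illustration; the cohort and top-N used in the paper are not given here.

```python
import statistics

def as_norm(raw_score, probe_cohort_scores, gallery_cohort_scores, top_n=3):
    """Adaptive symmetric normalization of one similarity score.

    probe_cohort_scores:   scores of the probe against a cohort of other identities
    gallery_cohort_scores: scores of the gallery item against the same cohort
    Only the top_n highest cohort scores on each side are used (the adaptive part).
    """
    def top_stats(scores):
        top = sorted(scores, reverse=True)[:top_n]
        return statistics.fmean(top), statistics.pstdev(top)

    mu_p, sd_p = top_stats(probe_cohort_scores)
    mu_g, sd_g = top_stats(gallery_cohort_scores)
    eps = 1e-8  # avoids division by zero when cohort scores are identical
    z_probe = (raw_score - mu_p) / (sd_p + eps)
    z_gallery = (raw_score - mu_g) / (sd_g + eps)
    return 0.5 * (z_probe + z_gallery)
```

The intent is to make scores comparable across probes whose embeddings sit in denser or sparser regions of the space, which is one plausible reason a normalization step can raise Rank@1 without retraining.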
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates a 1D Inception-v1 model trained with ArcFace loss on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients. It tests the model on larger external cohorts derived from MIMIC-IV-ECG and HEEDB using a unified closed-set leave-one-out protocol, reporting Rank@1 of 0.9506 on the internal ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. Additional analyses examine effects of temporal stress (decline over 1-5 years), gallery scale, and post-hoc reranking (e.g., AS-norm improving Rank@1 to 0.8005 on HEEDB-RR). The central claim is that ECG identity information remains measurable under externally validated large-scale closed-set conditions, though operational quality is affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage processing.
Significance. If the results hold after addressing methodological gaps, this provides a rare large-scale external validation of ECG biometrics beyond small-cohort, short-interval studies. The quantified impacts of temporal drift, gallery size, and reranking offer practical guidance for real-world deployment in clinical settings, strengthening the evidence base for the field's feasibility claims.
major comments (3)
- [Methods (cohort construction)] Methods section on cohort construction: The paper does not specify the exact patient-inclusion criteria or gallery-construction rules used to derive MIMIC-GC and HEEDB-GC from the source databases (e.g., whether only patients with multiple ECGs were retained to enable leave-one-out). This is load-bearing because the reported Rank@1 values (0.8291 and 0.6884) and temporal-drift curves could be inflated by selection toward more stable or frequently recorded subjects, confounding attribution to domain shift versus sampling artifact.
- [Results (evaluation protocol)] Results and protocol description: Only closed-set leave-one-out Rank@K and TAR@FAR are reported, with no open-set experiments or discussion of how unknown identities would be rejected. This directly affects the central claim of measurability 'under externally validated large-scale closed-set conditions,' as real-world identification is typically open-set; the current protocol may not capture the full operational challenges.
- [Experimental details and Results] Experimental setup and reporting: No baseline comparisons to prior ECG biometric methods, no statistical error bars or confidence intervals on the metrics, and insufficient training details (e.g., exact ArcFace margin/scale values or 1D Inception-v1 hyperparameters) are provided. These omissions undermine verification of whether the performance numbers reflect genuine advances or post-hoc choices.
minor comments (2)
- [Abstract] Abstract: 'ASUGI-DB' is referenced for the internal Rank@1 of 0.9506 but is not clearly defined relative to the 'internal clinical corpus'; add a brief clarification or cross-reference to the methods.
- [Throughout] Notation consistency: Ensure 'Rank@K', 'TAR@FAR', and terms like 'gallery size' are defined at first use in the methods and used uniformly in figures/tables.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we plan to make.
read point-by-point responses
-
Referee: [Methods (cohort construction)] Methods section on cohort construction: The paper does not specify the exact patient-inclusion criteria or gallery-construction rules used to derive MIMIC-GC and HEEDB-GC from the source databases (e.g., whether only patients with multiple ECGs were retained to enable leave-one-out). This is load-bearing because the reported Rank@1 values (0.8291 and 0.6884) and temporal-drift curves could be inflated by selection toward more stable or frequently recorded subjects, confounding attribution to domain shift versus sampling artifact.
Authors: We agree that additional details on cohort construction are necessary for reproducibility and to address potential selection biases. In the revised version, we will provide a detailed description of the patient inclusion criteria, including the requirement for multiple ECGs per patient, the specific rules for constructing the gallery and probe sets in the leave-one-out protocol, and how the cohorts were sampled from the source databases. This will allow readers to assess whether the results are influenced by sampling artifacts.
Revision: yes
-
Referee: [Results (evaluation protocol)] Results and protocol description: Only closed-set leave-one-out Rank@K and TAR@FAR are reported, with no open-set experiments or discussion of how unknown identities would be rejected. This directly affects the central claim of measurability 'under externally validated large-scale closed-set conditions,' as real-world identification is typically open-set; the current protocol may not capture the full operational challenges.
Authors: Our study is explicitly scoped to closed-set identification to evaluate the persistence of ECG identity signals under external validation and large galleries, as articulated in the title and abstract. We recognize the importance of open-set scenarios in practice. In the revision, we will expand the discussion section to include an analysis of the implications for open-set identification and suggest methods for handling unknown identities, such as using score thresholds or outlier detection. However, conducting full open-set experiments would require substantial additional work beyond the current scope.
Revision: partial
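The score-threshold approach mentioned in the response can be sketched as follows: pick a threshold from impostor scores at a target false accept rate, report the true accept rate at that threshold (TAR@FAR), and reject a probe as unknown when its best gallery score does not clear the threshold. All function names and the strict-inequality acceptance rule are assumptions for illustration, not the authors' procedure.

```python
def threshold_at_far(impostor_scores, far):
    """Threshold such that at most a fraction `far` of impostor scores
    lie strictly above it."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(far * len(ranked))  # impostors we tolerate accepting
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]

def tar_at_far(genuine_scores, impostor_scores, far):
    """Fraction of genuine scores accepted at the threshold fixed by `far`."""
    threshold = threshold_at_far(impostor_scores, far)
    accepted = sum(score > threshold for score in genuine_scores)
    return accepted / len(genuine_scores)

def identify_open_set(scores_by_identity, threshold):
    """Return the best-matching identity, or None to reject as unknown."""
    best = max(scores_by_identity, key=scores_by_identity.get)
    return best if scores_by_identity[best] > threshold else None
```

Under this kind of rule, a closed-set Rank@1 is an upper bound on open-set identification accuracy, since every correct top-1 match must additionally survive the rejection threshold.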
-
Referee: [Experimental details and Results] Experimental setup and reporting: No baseline comparisons to prior ECG biometric methods, no statistical error bars or confidence intervals on the metrics, and insufficient training details (e.g., exact ArcFace margin/scale values or 1D Inception-v1 hyperparameters) are provided. These omissions undermine verification of whether the performance numbers reflect genuine advances or post-hoc choices.
Authors: We will revise the Methods section to include all relevant training hyperparameters, including the exact ArcFace margin and scale parameters, as well as the specific configuration of the 1D Inception-v1 model. For statistical rigor, we will add bootstrap-derived confidence intervals to the reported metrics. Regarding baseline comparisons, our primary contribution is the external validation on large cohorts rather than outperforming prior methods on internal data; we will include a table comparing our results to key prior studies where protocols align, or explain the difficulties in direct comparison due to dataset differences.
Revision: yes
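A bootstrap confidence interval of the kind promised here could be computed from per-probe hit indicators (1 if the probe was correctly identified at Rank@1, 0 otherwise). The sketch below uses a simple percentile bootstrap; the function name, the 2,000 resamples, and the probe-level resampling are assumptions. Resampling at the patient level may be more appropriate when one patient contributes several probes, since those probes are not independent.

```python
import random

def bootstrap_ci(hits, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 hit indicators."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(hits)
    # Resample probes with replacement and recompute the hit rate each time.
    means = sorted(
        sum(rng.choices(hits, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Usage: 80 correct identifications out of 100 probes.
low, high = bootstrap_ci([1] * 80 + [0] * 20)
```

Intervals like this would make it possible to judge whether differences such as 0.7765 versus 0.8005 exceed resampling noise.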
Circularity Check
No circularity; purely empirical measurement on held-out cohorts
Full rationale
The manuscript reports direct experimental outputs (Rank@1, Rank@K, TAR@FAR) from training an Inception-v1+ArcFace model on an internal corpus and evaluating it via closed-set leave-one-out on external MIMIC-GC and HEEDB-GC cohorts. No equations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled in appear in the derivation chain. All scale, temporal-stress, and reranking results are protocol-driven measurements on held-out data rather than quantities that reduce to the inputs by construction. The central claim is therefore an empirical observation, not a self-referential derivation.
Axiom & Free-Parameter Ledger
free parameters (2)
- ArcFace margin and scale hyperparameters
- Inception-v1 1D adaptation hyperparameters (depth, kernel sizes, learning rate schedule)
axioms (2)
- domain assumption: Patient identity is encoded in 12-lead ECG morphology in a manner learnable by a convolutional network
- domain assumption: The closed-set leave-one-out protocol with Rank@K metrics approximates operational biometric performance
Reference graph
Works this paper leans on
- [1] Harvard-Emory ECG Database (HEEDB) v5.0. https://bdsp.io/content/heedb/5.0/. Accessed: 2026-02-28.
- [2] MIMIC-IV-ECG (waveform database) v0.1.0. https://physionet.org/content/mimic4wdb/0.1.0/. Accessed: 2026-02-28.
- [3] D. A. AlDuwaile and M. S. Islam. CNN and a single heartbeat for ECG biometric recognition. Entropy, 2021.
- [4] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 2000.
- [5] K. J. Chee and D. A. Ramli. ECG biometrics using transformer’s self-attention. Sensors, 2022.
- [6] O. Chum, J. Philbin, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV, 2007.
- [7] H. P. da Silva et al. CYBHi: A new dataset for off-the-person ECG biometrics. Computer Methods and Programs in Biomedicine, 2014.
- [8]
- [9] R. Donida Labati et al. Deep-ECG: CNN for ECG biometric recognition. Pattern Recognition Letters, 2019.
- [10] M. Donoser and H. Bischof. Diffusion processes for retrieval revisited. In CVPR, 2013.
- [11] S. Z. Fatemian and D. Hatzinakos. A new ECG feature extractor for biometric recognition. Digital Signal Processing, 2009.
- [12] A. Fratini et al. Individual identification via ECG analysis. BioMedical Engineering OnLine, 2015.
- [13] A. L. Goldberger et al. PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 2000.
- [14] N. Ibtehaz et al. EDITH: ECG biometrics aided by deep learning. IEEE Transactions on Emerging Topics in Computational Intelligence, 2022.
- [15]
- [16] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.
- [17] L. Košćová et al. The Harvard-Emory ECG Database (HEEDB). Scientific Data, 2025.
- [18]
- [19]
- [20] I. Odinaka et al. ECG biometric recognition: A comparative analysis. IEEE Transactions on Information Forensics and Security, 2012.
- [21] J. R. Pinto, J. S. Cardoso, and A. Lourenco. Evolution, current challenges, and future possibilities in ECG biometrics. IEEE Access, 2018.
- [22]
- [23] F. Radenovic, G. Tolias, and O. Chum. Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
- [24] Arjuna Scagnetto. Verification and identification in ECG biometric on large-scale. arXiv preprint arXiv:2602.02776, 2026.
- [25]
- [26] C. Szegedy et al. Rethinking the inception architecture for computer vision. In CVPR, 2016.
- [27] G. Wang, S. Shanker, A. Nag, Y. Lian, and D. John. ECG biometric authentication using self-supervised learning for IoT edge sensors. arXiv preprint arXiv:2409.05627, 2024.
Supplementary Materials, Table S1: Corpora and derived experimental datasets used in the study. Corpus | Short name | Protocol | N datasets | Purpose. ASUGI | ASUGI-DB | Training / Validation / ...