
arxiv: 2604.04485 · v1 · submitted 2026-04-06 · 💻 cs.LG · cs.AI

ECG Biometrics with ArcFace-Inception: External Validation on MIMIC and HEEDB

Pith reviewed 2026-05-10 18:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords ECG biometrics · ArcFace · Inception network · external validation · MIMIC · temporal drift · domain shift · closed-set identification

The pith

ECG identity information remains measurable on large external datasets but degrades with time, domain shifts, and gallery size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates a deep learning system for recognizing individuals from their ECG waveforms after training on one large internal clinical collection and testing on two much bigger external hospital databases. It demonstrates that the model can still achieve usable matching rates in controlled closed-set experiments even when the test data come from different sources and span multiple years. Performance nevertheless declines steadily as the interval between recordings lengthens, as the number of candidate identities grows, and when the underlying data distribution changes. Readers care because ECG-based identification could support secure patient matching in medical records without extra hardware, yet only if these practical limits are known and managed.

Core claim

A 1D Inception-v1 model trained with ArcFace on 164,440 ECGs from 53,079 patients produces Rank@1 of 0.9506 on the source domain, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC under a unified closed-set leave-one-out protocol. Temporal stress tests at fixed gallery size show Rank@1 falling from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB between one- and five-year gaps. Scale analysis on HEEDB reveals monotonic degradation with larger galleries and recovery when more examinations per patient are available, while post-hoc reranking and normalization further raise retrieval rates.
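
The Rank@1 figures above come from a closed-set leave-one-out protocol. As a rough illustration of what such a protocol computes, the sketch below treats every recording as a probe once and ranks all remaining recordings by cosine similarity. The function names, the toy embeddings, and the choice to skip probes that have no second recording are assumptions made here for illustration; the paper's exact gallery and probe construction is not specified in the abstract.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_at_k_leave_one_out(embeddings, labels, k=1):
    """Each recording is the probe once; all other recordings form the gallery.

    A probe counts as a hit if any of its k most similar gallery
    recordings belongs to the same patient.
    """
    hits = 0
    probes = 0
    for i, (probe, probe_id) in enumerate(zip(embeddings, labels)):
        gallery = [(cosine(probe, e), lab)
                   for j, (e, lab) in enumerate(zip(embeddings, labels))
                   if j != i]
        # Closed set (assumption): skip probes with no mate left in the gallery.
        if not any(lab == probe_id for _, lab in gallery):
            continue
        gallery.sort(key=lambda t: t[0], reverse=True)
        probes += 1
        if any(lab == probe_id for _, lab in gallery[:k]):
            hits += 1
    return hits / probes if probes else float("nan")

# Toy data (hypothetical): two recordings each for patients "a" and "b",
# plus a single recording for patient "c", which is skipped as a probe.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
toy_labels = ["a", "a", "b", "b", "c"]
rank1 = rank_at_k_leave_one_out(toy_embeddings, toy_labels, k=1)
print(rank1)
```

The quadratic loop is only suitable for toy sizes; at the gallery sizes discussed here a batched matrix product or an approximate nearest-neighbour index would be the practical choice.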

What carries the argument

1D Inception-v1 network trained with ArcFace loss to embed ECG waveforms into identity-discriminative vectors.
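
For readers unfamiliar with ArcFace, the following is a minimal sketch of the additive angular margin applied to one training sample. The defaults `margin=0.5` and `scale=64.0` are the values commonly quoted from the ArcFace paper, not values reported for this study, and the helper names are hypothetical. The sketch also omits the special handling that reference implementations apply when the angle plus margin exceeds pi.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_logits(embedding, class_weights, target, margin=0.5, scale=64.0):
    """Scaled cosine logits with an angular margin added on the target class.

    margin and scale are illustrative defaults, not the study's settings.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos domain
        if idx == target:
            # Penalise the true class by widening its angle to the embedding.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

# An embedding equidistant from two class centres: the margin makes the
# target class harder to satisfy, so the loss increases.
weights = [[1.0, 0.0], [0.0, 1.0]]
loss_plain = cross_entropy(arcface_logits([1.0, 1.0], weights, 0, margin=0.0), 0)
loss_margin = cross_entropy(arcface_logits([1.0, 1.0], weights, 0, margin=0.5), 0)
print(loss_plain, loss_margin)
```

At inference time the class weights are discarded and only the normalized embedding is used for matching, which is why the same network can be evaluated on patients it never saw during training.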

If this is right

  • ECG identity signatures persist across external domains and multi-year intervals under closed-set conditions.
  • Accuracy declines monotonically with increasing gallery size and with longitudinal drift from one to five years.
  • Second-stage score processing such as reranking and normalization measurably improves retrieval.
  • Additional examinations per patient offset some of the scale-induced performance loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Domain-adaptation or periodic re-training steps may be needed to maintain performance when hospital data distributions differ.
  • The closed-set protocol likely overestimates accuracy in genuine open-set deployments where unknown individuals must be rejected.
  • Long-term ECG biometrics may require scheduled re-enrollment to counter physiological drift.

Load-bearing premise

The closed-set leave-one-out protocol on the chosen external cohorts sufficiently represents real-world open-set identification challenges without major selection biases in gallery and probe construction.

What would settle it

An open-set replication on the same MIMIC and HEEDB cohorts that reports Rank@1 below 0.5 would demonstrate that the closed-set results do not translate to settings with unknown identities.
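
One way such an open-set replication could be scored is sketched below, assuming a single acceptance threshold on the top match score. The detection-and-identification rate counts enrolled probes that are both accepted and correctly matched, and the false-alarm rate counts non-enrolled probes that are wrongly accepted. The function name, the threshold, and the toy scores are hypothetical; the paper reports no open-set experiment.

```python
def open_set_rates(known_probes, unknown_probes, threshold):
    """Detection-and-identification rate and false-alarm rate at a threshold.

    known_probes: list of (top_score, top_is_correct) for enrolled patients.
    unknown_probes: list of top scores for patients absent from the gallery.
    """
    dir_hits = sum(1 for score, correct in known_probes
                   if correct and score >= threshold)
    false_alarms = sum(1 for score in unknown_probes if score >= threshold)
    return dir_hits / len(known_probes), false_alarms / len(unknown_probes)

# Toy scores (hypothetical), not taken from the paper.
known = [(0.91, True), (0.84, True), (0.62, False), (0.55, True)]
unknown = [0.40, 0.58, 0.71, 0.33]
dir_rate, far = open_set_rates(known, unknown, threshold=0.6)
print(dir_rate, far)
```

Sweeping the threshold traces the trade-off between the two rates, which a closed-set Rank@1 figure cannot express.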

Original abstract

ECG biometrics has been studied mainly on small cohorts and short inter-session intervals, leaving open how identification behaves under large galleries, external domain shift, and multi-year temporal gaps. We evaluated a 1D Inception-v1 model trained with ArcFace on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients and tested it on larger cohorts derived from MIMIC-IV-ECG and HEEDB. The study used a unified closed-set leave-one-out protocol with Rank@K and TAR@FAR metrics, together with scale, temporal-stress, reranking, and confidence analyses. Under general comparability, the system achieved Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. In the temporal stress test at constant gallery size, Rank@1 declined from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB from 1 to 5 years. Scale analysis on HEEDB showed monotonic degradation as gallery size increased and recovery as more examinations per patient became available. On HEEDB-RR, post-hoc reranking further improved retrieval, with AS-norm reaching Rank@1 = 0.8005 from a 0.7765 baseline. ECG identity information therefore remains measurable under externally validated large-scale closed-set conditions, but its operational quality is strongly affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage score processing.
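
The abstract names AS-norm as the best-performing second-stage step. A minimal sketch of adaptive symmetric score normalization follows, assuming the common formulation in which a raw score is standardized against the top-scoring cohort comparisons of both the probe and the gallery entry, and the two results are averaged. The cohort contents, `top_n`, and the function names are assumptions for illustration; the configuration used on HEEDB-RR is not given in the abstract.

```python
import statistics

def top_n_stats(scores, n):
    """Mean and population standard deviation of the n highest scores."""
    top = sorted(scores, reverse=True)[:n]
    return statistics.mean(top), statistics.pstdev(top)

def as_norm(raw_score, probe_cohort_scores, gallery_cohort_scores,
            top_n=3, eps=1e-12):
    """Average of the raw score standardized on the probe and gallery sides.

    probe_cohort_scores: probe compared against a cohort of other identities.
    gallery_cohort_scores: gallery entry compared against the same cohort.
    """
    mu_p, sd_p = top_n_stats(probe_cohort_scores, top_n)
    mu_g, sd_g = top_n_stats(gallery_cohort_scores, top_n)
    return 0.5 * ((raw_score - mu_p) / (sd_p + eps)
                  + (raw_score - mu_g) / (sd_g + eps))

# Toy cohort scores (hypothetical).
normalized = as_norm(0.8, [0.1, 0.3, 0.5, 0.0], [0.2, 0.4, 0.6, 0.1])
print(round(normalized, 4))
```

Restricting the statistics to the highest cohort scores is what makes the normalization adaptive: each comparison is judged against the impostors that most resemble it rather than against the whole cohort.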

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper evaluates a 1D Inception-v1 model trained with ArcFace loss on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients. It tests the model on larger external cohorts derived from MIMIC-IV-ECG and HEEDB using a unified closed-set leave-one-out protocol, reporting Rank@1 of 0.9506 on the internal ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. Additional analyses examine effects of temporal stress (decline over 1-5 years), gallery scale, and post-hoc reranking (e.g., AS-norm improving Rank@1 to 0.8005 on HEEDB-RR). The central claim is that ECG identity information remains measurable under externally validated large-scale closed-set conditions, though operational quality is affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage processing.

Significance. If the results hold after addressing methodological gaps, this provides a rare large-scale external validation of ECG biometrics beyond small-cohort, short-interval studies. The quantified impacts of temporal drift, gallery size, and reranking offer practical guidance for real-world deployment in clinical settings, strengthening the evidence base for the field's feasibility claims.

major comments (3)
  1. [Methods (cohort construction)] Methods section on cohort construction: The paper does not specify the exact patient-inclusion criteria or gallery-construction rules used to derive MIMIC-GC and HEEDB-GC from the source databases (e.g., whether only patients with multiple ECGs were retained to enable leave-one-out). This is load-bearing because the reported Rank@1 values (0.8291 and 0.6884) and temporal-drift curves could be inflated by selection toward more stable or frequently recorded subjects, confounding attribution to domain shift versus sampling artifact.
  2. [Results (evaluation protocol)] Results and protocol description: Only closed-set leave-one-out Rank@K and TAR@FAR are reported, with no open-set experiments or discussion of how unknown identities would be rejected. This directly affects the central claim of measurability 'under externally validated large-scale closed-set conditions,' as real-world identification is typically open-set; the current protocol may not capture the full operational challenges.
  3. [Experimental details and Results] Experimental setup and reporting: No baseline comparisons to prior ECG biometric methods, no statistical error bars or confidence intervals on the metrics, and insufficient training details (e.g., exact ArcFace margin/scale values or 1D Inception-v1 hyperparameters) are provided. These omissions undermine verification of whether the performance numbers reflect genuine advances or post-hoc choices.
minor comments (2)
  1. [Abstract] Abstract: 'ASUGI-DB' is referenced for the internal Rank@1 of 0.9506 but is not clearly defined relative to the 'internal clinical corpus'; add a brief clarification or cross-reference to the methods.
  2. [Throughout] Notation consistency: Ensure 'Rank@K', 'TAR@FAR', and terms like 'gallery size' are defined at first use in the methods and used uniformly in figures/tables.
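
As an illustration of the kind of definition the second minor comment asks for, TAR@FAR can be read as the fraction of genuine comparisons accepted at the threshold that keeps the impostor acceptance rate at or below a target. The sketch below is one conventional way to compute it, with a hypothetical function name and toy scores; it is not taken from the paper.

```python
import math

def tar_at_far(genuine_scores, impostor_scores, target_far):
    """True accept rate at the strictest threshold meeting the target FAR.

    Scores strictly above the returned threshold are accepted, so at most
    floor(target_far * number_of_impostors) impostors pass.
    """
    impostors = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_far * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")  # every impostor may be accepted
    else:
        threshold = impostors[allowed]
    accepted = sum(1 for s in genuine_scores if s > threshold)
    return accepted / len(genuine_scores), threshold

# Toy scores (hypothetical): 5 genuine and 10 impostor comparisons.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.45, 0.3, 0.2, 0.1, 0.05, 0.0, 0.15, 0.25]
tar, threshold = tar_at_far(genuine, impostor, target_far=0.1)
print(tar, threshold)
```

With ten impostor scores, a target FAR of 0.1 admits one impostor, so the resolution of the metric is bounded by the number of impostor comparisons available.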

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the revisions we plan to make.

Point-by-point responses
  1. Referee: [Methods (cohort construction)] Methods section on cohort construction: The paper does not specify the exact patient-inclusion criteria or gallery-construction rules used to derive MIMIC-GC and HEEDB-GC from the source databases (e.g., whether only patients with multiple ECGs were retained to enable leave-one-out). This is load-bearing because the reported Rank@1 values (0.8291 and 0.6884) and temporal-drift curves could be inflated by selection toward more stable or frequently recorded subjects, confounding attribution to domain shift versus sampling artifact.

    Authors: We agree that additional details on cohort construction are necessary for reproducibility and to address potential selection biases. In the revised version, we will provide a detailed description of the patient inclusion criteria, including the requirement for multiple ECGs per patient, the specific rules for constructing the gallery and probe sets in the leave-one-out protocol, and how the cohorts were sampled from the source databases. This will allow readers to assess whether the results are influenced by sampling artifacts. revision: yes

  2. Referee: [Results (evaluation protocol)] Results and protocol description: Only closed-set leave-one-out Rank@K and TAR@FAR are reported, with no open-set experiments or discussion of how unknown identities would be rejected. This directly affects the central claim of measurability 'under externally validated large-scale closed-set conditions,' as real-world identification is typically open-set; the current protocol may not capture the full operational challenges.

    Authors: Our study is explicitly scoped to closed-set identification to evaluate the persistence of ECG identity signals under external validation and large galleries, as articulated in the title and abstract. We recognize the importance of open-set scenarios in practice. In the revision, we will expand the discussion section to include an analysis of the implications for open-set identification and suggest methods for handling unknown identities, such as using score thresholds or outlier detection. However, conducting full open-set experiments would require substantial additional work beyond the current scope. revision: partial

  3. Referee: [Experimental details and Results] Experimental setup and reporting: No baseline comparisons to prior ECG biometric methods, no statistical error bars or confidence intervals on the metrics, and insufficient training details (e.g., exact ArcFace margin/scale values or 1D Inception-v1 hyperparameters) are provided. These omissions undermine verification of whether the performance numbers reflect genuine advances or post-hoc choices.

    Authors: We will revise the Methods section to include all relevant training hyperparameters, including the exact ArcFace margin and scale parameters, as well as the specific configuration of the 1D Inception-v1 model. For statistical rigor, we will add bootstrap-derived confidence intervals to the reported metrics. Regarding baseline comparisons, our primary contribution is the external validation on large cohorts rather than outperforming prior methods on internal data; we will include a table comparing our results to key prior studies where protocols align, or explain the difficulties in direct comparison due to dataset differences. revision: yes
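
The third response promises bootstrap-derived confidence intervals. A minimal percentile-bootstrap sketch for Rank@1 is shown below, assuming one 0/1 hit indicator per probe and resampling at the probe level. The function name, the resample count, and the toy hit vector are assumptions; with several recordings per patient, resampling whole patients rather than probes would better respect the dependence between recordings.

```python
import random

def bootstrap_ci(hits, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 hit indicators."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(hits)
    means = sorted(sum(rng.choices(hits, k=n)) / n
                   for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Toy hit vector (hypothetical): 83 correct Rank@1 matches out of 100 probes.
toy_hits = [1] * 83 + [0] * 17
ci_low, ci_high = bootstrap_ci(toy_hits)
print(ci_low, ci_high)
```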

Circularity Check

0 steps flagged

No circularity; purely empirical measurement on held-out cohorts

Full rationale

The manuscript reports direct experimental outputs (Rank@1, Rank@K, TAR@FAR) from training an Inception-v1+ArcFace model on an internal corpus and evaluating it via closed-set leave-one-out on external MIMIC-GC and HEEDB-GC cohorts. No equations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled in appear in the derivation chain. All scale, temporal-stress, and reranking results are protocol-driven measurements on held-out data rather than quantities that reduce to the inputs by construction. The central claim is therefore an empirical observation, not a self-referential derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the empirical performance numbers and the domain assumption that ECG morphology encodes stable patient identity. No new physical entities are postulated. Free parameters are the usual deep-learning training choices.

free parameters (2)
  • ArcFace margin and scale hyperparameters
    Chosen during training on the internal corpus to optimize embedding separation; values not stated in abstract.
  • Inception-v1 1D adaptation hyperparameters (depth, kernel sizes, learning rate schedule)
    Fitted on the 164k ECG training set; exact values and selection procedure unknown from abstract.
axioms (2)
  • domain assumption: Patient identity is encoded in 12-lead ECG morphology in a manner learnable by a convolutional network
    Invoked by the decision to train an identification model rather than a generic classifier.
  • domain assumption: The closed-set leave-one-out protocol with Rank@K metrics approximates operational biometric performance
    Stated as the evaluation framework in the abstract.

pith-pipeline@v0.9.0 · 5603 in / 1679 out tokens · 58378 ms · 2026-05-10T18:49:14.608667+00:00 · methodology

