pith.

arxiv: 2605.01548 · v1 · submitted 2026-05-02 · 💻 cs.LG · cs.CV · eess.SP

ECG-biometrics-bench: A Unified Framework for Reproducible Benchmarking of ECG Biometrics

Pith reviewed 2026-05-09 14:52 UTC · model grok-4.3

classification 💻 cs.LG cs.CV eess.SP
keywords ECG biometrics · benchmarking framework · random split fallacy · temporal drift · open-set evaluation · wearable authentication · supervised learning

The pith

Random splits within ECG recording sessions inflate biometric performance, which drops when sessions are separated in time or when unseen subjects are tested.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

ECG biometrics are promoted for continuous authentication on wearables, but reported results often rely on flawed evaluation. This paper presents a unified framework that applies consistent preprocessing, segmentation, and evaluation protocols across seven public datasets. It shows that randomly splitting data from the same recording session produces high accuracy that does not carry over when sessions are separated in time or when new people are tested. The drop occurs across all three neural network architectures evaluated, suggesting the issue lies in how features are learned rather than in any single design. Partial recovery is possible by enrolling with data from multiple sessions and fusing templates dynamically.

Core claim

The core claim is that the Random Split Fallacy leads to overly optimistic results in ECG biometrics, because intra-session evaluation protocols mask severe degradation from temporal drift and unseen identities. This failure is not specific to particular models such as DeepECG, ResNet1D, or CNN-LSTM but appears inherent to supervised feature-learning approaches. The benchmarking framework enables testing under cross-session and long-term temporal separation, and a heavy-enrollment, lightweight-authentication strategy based on dynamic multi-session template fusion partially mitigates the aging effect.
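The mechanism can be made concrete with a toy simulation. This is a sketch under loudly stated assumptions, not the paper's data or models. Each synthetic subject has a fixed identity vector. Each recording session adds one shared offset that stands in for temporal drift. A nearest-centroid classifier plays the role of the learned model. A random split of beats from a single session puts near-identical beats on both sides and scores almost perfectly. Enrolling on one session and testing on another exposes the drift. The functions make_dataset, centroids, and accuracy, along with every scale parameter, are hypothetical.

```python
import random
import statistics

# Toy model (assumption, not the paper's data): beat = identity + session drift + noise.
def make_dataset(n_subjects=20, n_sessions=2, beats_per_session=30, dim=8,
                 drift_scale=1.5, noise_scale=0.2, seed=0):
    rng = random.Random(seed)
    data = []  # (subject, session, feature_vector)
    for subj in range(n_subjects):
        identity = [rng.gauss(0, 1) for _ in range(dim)]
        for sess in range(n_sessions):
            # One drift vector shared by every beat in this session.
            drift = [rng.gauss(0, drift_scale) for _ in range(dim)]
            for _ in range(beats_per_session):
                x = [i + d + rng.gauss(0, noise_scale) for i, d in zip(identity, drift)]
                data.append((subj, sess, x))
    return data

def centroids(train):
    groups = {}
    for subj, _, x in train:
        groups.setdefault(subj, []).append(x)
    # Per-subject mean feature vector, used as the enrollment template.
    return {subj: [statistics.fmean(col) for col in zip(*xs)] for subj, xs in groups.items()}

def accuracy(train, test):
    cents = centroids(train)
    def predict(x):
        # Nearest centroid by squared Euclidean distance.
        return min(cents, key=lambda s: sum((a - b) ** 2 for a, b in zip(x, cents[s])))
    return sum(predict(x) == subj for subj, _, x in test) / len(test)

data = make_dataset()
session0 = [r for r in data if r[1] == 0]
session1 = [r for r in data if r[1] == 1]

# Leaky protocol: random beat-level split inside session 0.
shuffled = session0[:]
random.Random(1).shuffle(shuffled)
half = len(shuffled) // 2
intra = accuracy(shuffled[:half], shuffled[half:])

# Session-separated protocol: enroll on session 0, test on session 1.
cross = accuracy(session0, session1)

print(f"intra-session random split accuracy: {intra:.2f}")
print(f"cross-session accuracy:              {cross:.2f}")
```

In this toy setup the gap comes entirely from the per-session offset. Setting drift_scale to 0 makes both numbers close to 1. That is a miniature version of the mechanism the paper attributes to temporal drift.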

What carries the argument

ECG-biometrics-bench, a modular framework that standardizes preprocessing, segmentation, and evaluation protocols including cross-session and long-term temporal separation across seven public datasets.
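Here is a rough illustration of what these protocols mean at the data level. It assumes that every segment carries a subject ID, a session index, and a recording day measured from the subject's first session. All names are hypothetical and are not the framework's API: Segment, random_beat_split, cross_session_split, long_term_split, subject_disjoint_split, and shared_sessions. The shared_sessions check flags the intra-session leakage behind the Random Split Fallacy.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    subject: str
    session: int   # session index; 0 = first recording of this subject
    day: float     # days since the subject's first session (assumed known)
    beat_id: int

def random_beat_split(segments, test_frac=0.3, seed=0):
    """Leaky baseline: beats from the same session land on both sides."""
    segs = list(segments)
    random.Random(seed).shuffle(segs)
    n_test = int(len(segs) * test_frac)
    return segs[n_test:], segs[:n_test]

def cross_session_split(segments, enroll_session=0, probe_session=1):
    """Enroll on one session, probe on a different session of the same subjects."""
    train = [s for s in segments if s.session == enroll_session]
    test = [s for s in segments if s.session == probe_session]
    return train, test

def long_term_split(segments, min_gap_days=30.0):
    """Enroll on the first session; probe only on recordings at least min_gap_days later."""
    train = [s for s in segments if s.session == 0]
    test = [s for s in segments if s.day >= min_gap_days]
    return train, test

def subject_disjoint_split(segments, test_subject_frac=0.3, seed=0):
    """Open-set protocol: test subjects never appear in training."""
    subjects = sorted({s.subject for s in segments})
    random.Random(seed).shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_subject_frac))
    held_out = set(subjects[:n_test])
    train = [s for s in segments if s.subject not in held_out]
    test = [s for s in segments if s.subject in held_out]
    return train, test

def shared_sessions(train, test):
    """(subject, session) pairs on both sides; non-empty means intra-session leakage."""
    return {(s.subject, s.session) for s in train} & {(s.subject, s.session) for s in test}

# Toy corpus: 10 subjects, sessions on day 0, day 1 and day 60, 20 beats each.
segments = [Segment(f"S{p}", sess, day, b)
            for p in range(10)
            for sess, day in enumerate([0.0, 1.0, 60.0])
            for b in range(20)]

print(len(shared_sessions(*random_beat_split(segments))))     # non-zero: leaky
print(len(shared_sessions(*cross_session_split(segments))))   # 0
print(len(shared_sessions(*long_term_split(segments))))       # 0
```

Keying every split on (subject, session) rather than on individual beats is the design choice that keeps near-duplicate beats from the same recording out of both training and test data.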

If this is right

  • Published ECG biometric results that rely on intra-session random splits are likely inflated.
  • The problem affects multiple common neural architectures, indicating a paradigm-level issue.
  • Template fusion from multiple sessions can reduce some of the temporal degradation.
  • Better protocols are required for assessing real deployment readiness.
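Multi-session template fusion can be illustrated with a minimal sketch. It assumes that the model outputs one embedding per beat and that verification uses cosine similarity. Several pieces are illustrative assumptions, because the paper's exact fusion rule is not described here: the FusedTemplate class, the per-session averaging, the exponential-moving-average update, and the alpha and threshold values. Enrollment is the expensive step, since several sessions are averaged into one template. Each authentication is a single similarity computation. That division matches the heavy-enrollment, lightweight-authentication strategy the paper describes.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

class FusedTemplate:
    """Hypothetical multi-session template; not the paper's exact fusion rule."""

    def __init__(self, sessions, alpha=0.1, accept_threshold=0.7, update_threshold=0.8):
        # sessions: list of sessions, each a list of beat embeddings.
        # Average within each session first so one long session cannot dominate.
        per_session = [l2_normalize(mean(beats)) for beats in sessions]
        self.template = l2_normalize(mean(per_session))
        self.alpha = alpha
        self.accept_threshold = accept_threshold
        self.update_threshold = update_threshold

    def authenticate(self, probe):
        score = cosine(self.template, probe)
        accepted = score >= self.accept_threshold
        # Only high-confidence matches update the template (dynamic fusion),
        # which limits drift toward an impostor.
        if score >= self.update_threshold:
            p = l2_normalize(probe)
            self.template = l2_normalize(
                [(1 - self.alpha) * t + self.alpha * x for t, x in zip(self.template, p)])
        return accepted, score

tmpl = FusedTemplate([[[1.0, 0.0, 0.0]] * 5, [[0.8, 0.6, 0.0]] * 5])
print(tmpl.authenticate([0.9, 0.4, 0.0]))   # accepted: score close to 1
print(tmpl.authenticate([0.0, 0.0, 1.0]))   # rejected: orthogonal to the template
```

The separate, stricter update threshold is a conservative choice. Accepting a borderline probe is cheap, but folding one into the template would let errors accumulate over time.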

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar evaluation issues may affect other biometrics that use physiological signals subject to drift.
  • The framework could encourage the development of methods robust to time-based changes in signals.
  • Real-world systems might need to combine ECG with other modalities to achieve reliable performance.

Load-bearing premise

The observed degradation, and its independence from specific models, must generalize beyond the tested architectures and datasets, and the cross-session protocols must reflect real-world conditions without other confounding factors.

What would settle it

Sustained high performance from a supervised model on long-term, subject-disjoint tests on these datasets, without special enrollment strategies, would disprove the claim that the degradation is inherent.

Figures

Figures reproduced from arXiv: 2605.01548 by Milad Parvan.

Figure 1. Visual Examples of Preprocessing Methods
Figure 2. Rank-1 Comparison Between Intra-Session and Inter-Session Evaluation
Figure 3. EER Comparison Between Intra-Session and Inter-Session Evaluation
Figure 4. Verification Degradation Over Time: Closed-Set EER Comparison Between Intra-Session Baseline and …
Figure 5. Verification Error Reduction via Multi-Session Enrollment.
Figure 6. Model Generalizability: Closed-Set vs. Open-Set Performance.
Figure 7. Controlled Hyperparameter Ablation (Single-Session Closed-Set).
Figure 8. Architectural Agnosticism.

Text extracted with Figure 8 (opening of Section 5, Limitations and Future Work): "Although this study establishes a comprehensive framework for reproducible ECG biometric evaluation, several limitations remain that warrant further investigation. Primarily due to the significant computational demands of executing over 6,000 unique hyperparameter grid-search configurations across multiple large-scale datasets, including PTB-XL, the …"
Original abstract

Electrocardiogram (ECG) biometrics have emerged as a promising modality for continuous, liveness-aware authentication in wearable systems. However, many prior studies report overly optimistic results due to data leakage (e.g., random splits within the same session). To address this issue, we introduce ECG-biometrics-bench, a modular, reproducible benchmarking framework that standardizes preprocessing, segmentation, and evaluation across seven widely used public ECG datasets spanning clinical, ambulatory, and large-scale cohort settings. The framework supports both closed-set and open-set (i.e., subject-disjoint generalization in this work) evaluation, as well as progressively realistic protocols including cross-session and long-term temporal separation. To facilitate reproducible research in the community, the ECG-biometrics-bench repository will be made publicly accessible on GitHub upon the acceptance of this manuscript. Through a comprehensive multi-dataset analysis, we expose the Random Split Fallacy, demonstrating that intra-session evaluation protocols artificially inflate performance while masking severe degradation caused by temporal drift and unseen identities. Furthermore, by evaluating multiple architectures, including DeepECG, ResNet1D, and CNN-LSTM, we show that these failures are not model-specific but are likely inherent to current supervised feature-learning paradigms. Finally, we demonstrate that performance degradation due to temporal aging can be partially mitigated through a heavy enrollment, lightweight authentication strategy based on dynamic multi-session template fusion. These findings establish a more realistic baseline for ECG biometrics and highlight critical challenges that must be addressed for reliable real-world deployment.
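Figures 2 and 3 compare protocols using two standard biometric metrics. Rank-1 identification accuracy is a closed-set measure: each probe is matched against a gallery of enrolled subjects. The equal error rate (EER) is a verification measure: it is the operating point where false accepts and false rejects balance. Below is a minimal sketch of both, assuming similarity scores where higher means more alike. The threshold sweep over observed scores is one common, discrete approximation of the EER, not necessarily the computation used in ECG-biometrics-bench. The helper names are hypothetical.

```python
def rank1_accuracy(gallery, probes, score):
    """gallery: {subject: template}; probes: [(true_subject, embedding)]."""
    hits = 0
    for subject, emb in probes:
        # Closed-set identification: pick the best-scoring enrolled identity.
        best = max(gallery, key=lambda g: score(gallery[g], emb))
        hits += best == subject
    return hits / len(probes)

def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over every observed score."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

gallery = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
probes = [("A", [0.9, 0.1]), ("B", [0.2, 0.8]), ("A", [0.1, 0.9])]
print(rank1_accuracy(gallery, probes, dot))                  # third probe is misidentified as B
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))    # 0.0: perfectly separated scores
```

In the open-set (subject-disjoint) setting, genuine scores come from pairs of samples of the same unseen subject, and impostor scores come from pairs of different unseen subjects. The EER then measures how well the learned embedding transfers to identities absent from training.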

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces ECG-biometrics-bench, a modular framework for standardized, reproducible benchmarking of ECG biometrics across seven public datasets (clinical, ambulatory, and large-scale). It evaluates closed-set and open-set protocols under intra-session random splits, cross-session, and long-term temporal separations. The central claims are that intra-session random splits produce inflated performance (the 'Random Split Fallacy') that masks degradation from temporal drift and unseen identities; that this degradation occurs consistently across DeepECG, ResNet1D, and CNN-LSTM and is therefore not model-specific but likely inherent to supervised feature-learning paradigms; and that a heavy-enrollment/lightweight-authentication strategy with dynamic multi-session template fusion can partially mitigate temporal aging effects.

Significance. If the results hold after addressing scope limitations, the work would provide a valuable public benchmarking resource and more realistic baselines for ECG biometrics in wearable authentication. The emphasis on reproducibility via planned public code release, use of public datasets, and multi-dataset analysis is a clear strength. The identification of temporal drift as a core challenge could usefully redirect research toward more robust enrollment and adaptation strategies.

major comments (1)
  1. Abstract: The assertion that observed failures are 'likely inherent to current supervised feature-learning paradigms' is not supported by the reported experiments. Only three deep architectures (DeepECG, ResNet1D, CNN-LSTM) are evaluated; no classical supervised baselines (e.g., fiducial-point or wavelet features with SVM, LDA, or random forest) are run on the same cross-session and long-term splits. The data therefore establish only that the degradation is not specific to these three networks, not that it is paradigm-wide. This extrapolation is load-bearing for the paper's strongest claim and requires either additional baselines or a narrowed statement.
minor comments (1)
  1. The manuscript should explicitly state the exact train/validation/test subject splits and session indices used for each protocol in a table or appendix to enable direct reproduction even before the GitHub release.
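One way to meet this request is to publish a per-protocol split manifest. It would list (subject, session) pairs with a checksum, so readers can verify that they reproduce the exact splits. Recording the number of (subject, session) pairs shared between train and test makes intra-session leakage visible at a glance. This is a sketch under assumptions: the JSON schema, the field names, the helper functions, and the dataset name "ExampleDB" are hypothetical, not the framework's format.

```python
import hashlib
import json

def _digest(manifest):
    # Canonical compact JSON so the hash does not depend on formatting.
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def build_manifest(dataset, protocol, train, test, seed):
    """train/test: iterables of (subject_id, session_index) pairs."""
    train_pairs = sorted({(str(s), int(k)) for s, k in train})
    test_pairs = sorted({(str(s), int(k)) for s, k in test})
    return {
        "dataset": dataset,
        "protocol": protocol,
        "seed": seed,
        "train": [list(p) for p in train_pairs],
        "test": [list(p) for p in test_pairs],
        # Non-zero means some session contributes to both train and test (leakage).
        "shared_sessions": len(set(train_pairs) & set(test_pairs)),
    }

def dump_manifest(manifest):
    return json.dumps({"sha256": _digest(manifest), "manifest": manifest},
                      sort_keys=True, indent=2)

def load_manifest(text):
    wrapper = json.loads(text)
    if _digest(wrapper["manifest"]) != wrapper["sha256"]:
        raise ValueError("split manifest checksum mismatch")
    return wrapper["manifest"]

m = build_manifest("ExampleDB", "cross_session",
                   train=[("S1", 0), ("S2", 0)], test=[("S1", 1), ("S2", 1)], seed=0)
text = dump_manifest(m)
print(load_manifest(text)["shared_sessions"])   # 0
```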

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The feedback helps clarify the scope of our claims, and we address the major comment below with a commitment to revision.

Point-by-point responses
  1. Referee: Abstract: The assertion that observed failures are 'likely inherent to current supervised feature-learning paradigms' is not supported by the reported experiments. Only three deep architectures (DeepECG, ResNet1D, CNN-LSTM) are evaluated; no classical supervised baselines (e.g., fiducial-point or wavelet features with SVM, LDA, or random forest) are run on the same cross-session and long-term splits. The data therefore establish only that the degradation is not specific to these three networks, not that it is paradigm-wide. This extrapolation is load-bearing for the paper's strongest claim and requires either additional baselines or a narrowed statement.

    Authors: We agree with the referee that the experiments are confined to three deep learning architectures and do not include classical supervised methods such as fiducial or wavelet features paired with SVM, LDA, or random forests. The consistent degradation observed across DeepECG (an ECG-specific model), ResNet1D, and CNN-LSTM under cross-session and long-term protocols indicates that the issue is not an artifact of any single network design. However, we acknowledge that this does not yet establish the phenomenon as inherent to supervised feature-learning paradigms in general. To correct the overstatement, we will revise the abstract, introduction, and discussion to narrow the claim to the evaluated deep supervised approaches, stating that the failures are 'not model-specific among the tested architectures but likely inherent to current deep supervised feature-learning paradigms.' We will also note in the discussion that extending the framework to classical baselines remains valuable future work. These changes will be reflected in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmarking study with independent experimental grounding

Full rationale

The paper presents an empirical benchmarking framework evaluated on seven public ECG datasets using three deep architectures under controlled protocols (intra-session, cross-session, long-term). No mathematical derivation, parameter fitting, or self-referential definition exists that reduces outputs to inputs by construction. Claims rest on planned public code and reproducible splits rather than self-citation chains or ansatzes. The generalization to 'supervised feature-learning paradigms' is an interpretive extrapolation beyond tested models but does not constitute a circular reduction per the enumerated patterns. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a methodological benchmarking contribution relying on standard assumptions about public ECG datasets being suitable proxies for real-world use and on conventional supervised learning evaluation practices; no new free parameters, axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5573 in / 1289 out tokens · 38523 ms · 2026-05-09T14:52:52.818347+00:00 · methodology

