ECG-biometrics-bench: A Unified Framework for Reproducible Benchmarking of ECG Biometrics
Pith reviewed 2026-05-09 14:52 UTC · model grok-4.3
The pith
Random splits within a single ECG session inflate reported biometric performance, which drops once test data come from a later session or from unseen subjects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The core claim is that the Random Split Fallacy produces overly optimistic results in ECG biometrics: intra-session evaluation protocols mask severe degradation caused by temporal drift and unseen identities. The failure is not specific to particular models such as DeepECG, ResNet1D, or CNN-LSTM, and the authors argue it is likely inherent to supervised feature-learning approaches. The introduced benchmarking framework allows testing under cross-session and long-term temporal separation, and a heavy-enrollment, lightweight-authentication strategy based on dynamic multi-session template fusion can partially mitigate the aging effect.
What carries the argument
ECG-biometrics-bench, a modular framework that standardizes preprocessing, segmentation, and evaluation protocols including cross-session and long-term temporal separation across seven public datasets.
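The difference between the protocols can be illustrated with a minimal sketch. This is not the framework's API: the `Segment` record, the function names, and the toy data are all hypothetical, chosen only to show why a random split shares sessions between train and test while cross-session and subject-disjoint splits do not.

```python
import random
from collections import namedtuple

# Hypothetical record: one ECG segment tagged with subject and session labels.
Segment = namedtuple("Segment", ["subject", "session", "index"])

def random_split(segments, test_fraction=0.2, seed=0):
    """Shuffle all segments and cut, ignoring session labels entirely."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def cross_session_split(segments, enroll_sessions, test_sessions):
    """Enroll on one set of sessions and test on a different set."""
    train = [s for s in segments if s.session in enroll_sessions]
    test = [s for s in segments if s.session in test_sessions]
    return train, test

def subject_disjoint_split(segments, test_subjects):
    """Hold out whole subjects so test identities are never seen in training."""
    train = [s for s in segments if s.subject not in test_subjects]
    test = [s for s in segments if s.subject in test_subjects]
    return train, test

def shared_sessions(train, test):
    """(subject, session) pairs present on both sides of a split."""
    train_keys = {(s.subject, s.session) for s in train}
    test_keys = {(s.subject, s.session) for s in test}
    return train_keys & test_keys

# Toy data: 4 subjects, 2 sessions each, 10 segments per session.
segments = [
    Segment(subject, session, i)
    for subject in range(4)
    for session in (1, 2)
    for i in range(10)
]

rand_train, rand_test = random_split(segments)
cs_train, cs_test = cross_session_split(segments, {1}, {2})
sd_train, sd_test = subject_disjoint_split(segments, {3})

print("random split, shared sessions:", len(shared_sessions(rand_train, rand_test)))
print("cross-session, shared sessions:", len(shared_sessions(cs_train, cs_test)))
```

In the random split, segments from the same recording session land on both sides, so a model can exploit session-specific signal characteristics. The other two splits remove that overlap by construction.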
If this is right
- Literature results on ECG biometrics performance are likely inflated.
- The problem affects multiple common neural architectures, indicating a paradigm-level issue.
- Template fusion from multiple sessions can reduce some of the temporal degradation.
- Better protocols are required for assessing real deployment readiness.
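The template-fusion idea in the list above can be sketched as follows. The summary does not specify the fusion rule, so this example assumes an unweighted mean of per-session templates and cosine-similarity matching; the `FusedTemplate` class, its methods, and the two-dimensional toy embeddings are hypothetical and serve only to illustrate the heavy-enrollment, lightweight-authentication split.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class FusedTemplate:
    """Hypothetical per-subject template fused across enrollment sessions."""

    def __init__(self):
        self.session_templates = []

    def enroll_session(self, embeddings):
        # Heavy step: many embeddings from one session collapse to one template.
        self.session_templates.append(mean_vector(embeddings))

    def template(self):
        # Assumed fusion rule: unweighted mean of the session templates.
        return mean_vector(self.session_templates)

    def score(self, probe):
        # Lightweight step: a single similarity per authentication attempt.
        return cosine_similarity(self.template(), probe)

# Toy embeddings that drift between sessions (made-up values).
session_1 = [[1.0, 0.0], [0.9, 0.1]]
session_2 = [[0.6, 0.8], [0.7, 0.7]]
later_probe = [0.7, 0.7]

single = FusedTemplate()
single.enroll_session(session_1)

fused = FusedTemplate()
fused.enroll_session(session_1)
fused.enroll_session(session_2)

single_score = single.score(later_probe)
fused_score = fused.score(later_probe)
print("single-session score:", round(single_score, 3))
print("fused score:", round(fused_score, 3))
```

In this toy case the fused template sits closer to the drifted probe than the single-session template does. Whether real ECG embeddings behave this way is an empirical question the paper's experiments address, not something this sketch establishes.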
Where Pith is reading between the lines
- Similar evaluation issues may affect other biometrics that use physiological signals subject to drift.
- The framework could encourage the development of methods robust to time-based changes in signals.
- Real-world systems might need to combine ECG with other modalities to achieve reliable performance.
Load-bearing premise
The performance degradation and its independence from specific models will hold more generally, and the cross-session protocols will match real-world conditions without other confounding factors.
What would settle it
Sustained high performance from a supervised model on the long-term, subject-disjoint tests, achieved without any special enrollment strategy, would refute the claim that the degradation is inherent.
Original abstract
Electrocardiogram (ECG) biometrics have emerged as a promising modality for continuous, liveness-aware authentication in wearable systems. However, many prior studies report overly optimistic results due to data leakage (e.g., random splits within the same session). To address this issue, we introduce ECG-biometrics-bench, a modular, reproducible benchmarking framework that standardizes preprocessing, segmentation, and evaluation across seven widely used public ECG datasets spanning clinical, ambulatory, and large-scale cohort settings. The framework supports both closed-set and open-set (i.e., subject-disjoint generalization in this work) evaluation, as well as progressively realistic protocols including cross-session and long-term temporal separation. To facilitate reproducible research in the community, the ECG-biometrics-bench repository will be made publicly accessible on GitHub upon the acceptance of this manuscript. Through a comprehensive multi-dataset analysis, we expose the Random Split Fallacy, demonstrating that intra-session evaluation protocols artificially inflate performance while masking severe degradation caused by temporal drift and unseen identities. Furthermore, by evaluating multiple architectures, including DeepECG, ResNet1D, and CNN-LSTM, we show that these failures are not model-specific but are likely inherent to current supervised feature-learning paradigms. Finally, we demonstrate that performance degradation due to temporal aging can be partially mitigated through a heavy enrollment, lightweight authentication strategy based on dynamic multi-session template fusion. These findings establish a more realistic baseline for ECG biometrics and highlight critical challenges that must be addressed for reliable real-world deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ECG-biometrics-bench, a modular framework for standardized, reproducible benchmarking of ECG biometrics across seven public datasets (clinical, ambulatory, and large-scale). It evaluates closed-set and open-set protocols under intra-session random splits, cross-session, and long-term temporal separations. The central claims are that intra-session random splits produce inflated performance (the 'Random Split Fallacy') that masks degradation from temporal drift and unseen identities; that this degradation occurs consistently across DeepECG, ResNet1D, and CNN-LSTM and is therefore not model-specific but likely inherent to supervised feature-learning paradigms; and that a heavy-enrollment/lightweight-authentication strategy with dynamic multi-session template fusion can partially mitigate temporal aging effects.
Significance. If the results hold after addressing scope limitations, the work would provide a valuable public benchmarking resource and more realistic baselines for ECG biometrics in wearable authentication. The emphasis on reproducibility via planned public code release, use of public datasets, and multi-dataset analysis is a clear strength. The identification of temporal drift as a core challenge could usefully redirect research toward more robust enrollment and adaptation strategies.
major comments (1)
- Abstract: The assertion that observed failures are 'likely inherent to current supervised feature-learning paradigms' is not supported by the reported experiments. Only three deep architectures (DeepECG, ResNet1D, CNN-LSTM) are evaluated; no classical supervised baselines (e.g., fiducial-point or wavelet features with SVM, LDA, or random forest) are run on the same cross-session and long-term splits. The data therefore establish only that the degradation is not specific to these three networks, not that it is paradigm-wide. This extrapolation is load-bearing for the paper's strongest claim and requires either additional baselines or a narrowed statement.
minor comments (1)
- The manuscript should explicitly state the exact train/validation/test subject splits and session indices used for each protocol in a table or appendix to enable direct reproduction even before the GitHub release.
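One way to meet this request is a machine-readable split manifest published alongside the paper. The sketch below is an assumption about what such a manifest could look like, not the authors' format: the `build_manifest` function, the field names, the 70/15/15 subject ratio, and the dataset name are all hypothetical.

```python
import json
import random

def build_manifest(dataset, subjects, protocol, enroll_sessions, test_sessions, seed=0):
    """Deterministically partition subjects and record the session indices used."""
    rng = random.Random(seed)
    ordered = sorted(subjects)  # sort first so input order cannot change the result
    rng.shuffle(ordered)
    # Assumed 70/15/15 ratio, computed with integer arithmetic.
    n_train = len(ordered) * 70 // 100
    n_val = len(ordered) * 15 // 100
    return {
        "dataset": dataset,
        "protocol": protocol,
        "seed": seed,
        "subjects": {
            "train": sorted(ordered[:n_train]),
            "val": sorted(ordered[n_train:n_train + n_val]),
            "test": sorted(ordered[n_train + n_val:]),
        },
        "sessions": {
            "enroll": sorted(enroll_sessions),
            "test": sorted(test_sessions),
        },
    }

# Hypothetical dataset with 20 subject IDs and two sessions.
subject_ids = [f"S{i:03d}" for i in range(20)]
manifest = build_manifest("example-dataset", subject_ids, "cross_session", [1], [2])
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Fixing the seed and sorting the subject IDs before shuffling makes the manifest independent of input order, so a table or appendix generated from it would match any later code release that uses the same inputs.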
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The feedback helps clarify the scope of our claims, and we address the major comment below with a commitment to revision.
Point-by-point responses
Referee: Abstract: The assertion that observed failures are 'likely inherent to current supervised feature-learning paradigms' is not supported by the reported experiments. Only three deep architectures (DeepECG, ResNet1D, CNN-LSTM) are evaluated; no classical supervised baselines (e.g., fiducial-point or wavelet features with SVM, LDA, or random forest) are run on the same cross-session and long-term splits. The data therefore establish only that the degradation is not specific to these three networks, not that it is paradigm-wide. This extrapolation is load-bearing for the paper's strongest claim and requires either additional baselines or a narrowed statement.
Authors: We agree with the referee that the experiments are confined to three deep learning architectures and do not include classical supervised methods such as fiducial or wavelet features paired with SVM, LDA, or random forests. The consistent degradation observed across DeepECG (an ECG-specific model), ResNet1D, and CNN-LSTM under cross-session and long-term protocols indicates that the issue is not an artifact of any single network design. However, we acknowledge that this does not yet establish the phenomenon as inherent to supervised feature-learning paradigms in general. To correct the overstatement, we will revise the abstract, introduction, and discussion to narrow the claim to the evaluated deep supervised approaches, stating that the failures are 'not model-specific among the tested architectures but likely inherent to current deep supervised feature-learning paradigms.' We will also note in the discussion that extending the framework to classical baselines remains valuable future work. These changes will be reflected in the revised manuscript.
Revision: yes
Circularity Check
No circularity: empirical benchmarking study with independent experimental grounding
The paper presents an empirical benchmarking framework evaluated on seven public ECG datasets using three deep architectures under controlled protocols (intra-session, cross-session, long-term). No mathematical derivation, parameter fitting, or self-referential definition exists that reduces outputs to inputs by construction. Claims rest on planned public code and reproducible splits rather than self-citation chains or ansatzes. The generalization to 'supervised feature-learning paradigms' is an interpretive extrapolation beyond tested models but does not constitute a circular reduction per the enumerated patterns. This matches the default expectation for non-circular empirical work.