Event-based Liveness Detection using Temporal Ocular Dynamics: An Exploratory Approach
Pith reviewed 2026-05-07 13:48 UTC · model grok-4.3
The pith
Event cameras distinguish live faces from replays by capturing eye movement patterns that displays cannot reproduce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Event-based representations enable reliable discrimination between genuine and replayed sequences, achieving up to 95.37% top-1 accuracy with a spiking convolutional neural network on an extended dataset of live and replayed ocular event streams.
What carries the argument
Spatio-temporal patterns in the event domain from temporal ocular dynamics, fed into a spiking convolutional neural network for classification.
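To make the spiking component concrete, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron, a common building block of spiking networks. The paper summary does not state which neuron model, time constants, or thresholds the authors use, so the function name `lif_neuron` and all parameter values are illustrative assumptions.

```python
def lif_neuron(input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Simulate one leaky integrate-and-fire neuron; return a 0/1 spike train.

    All parameter values are hypothetical and chosen only for illustration.
    """
    v = 0.0
    spikes = []
    for i in input_current:
        # Leak toward zero, then integrate the input over one time step.
        v += dt * (-v / tau + i)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


# Sustained input drives the membrane potential over threshold repeatedly.
dense = lif_neuron([0.3] * 20)
# Isolated input pulses leak away before the threshold is reached.
sparse = lif_neuron([0.3 if t % 10 == 0 else 0.0 for t in range(20)])
print(sum(dense), sum(sparse))
```

The neuron fires only when input arrives densely enough in time, which is one reason spiking models are often paired with sparse, asynchronous event data.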
If this is right
- Event features from eye regions support both ocular motion segmentation and binary liveness decisions.
- The approach operates with low latency because event data is sparse and asynchronous.
- Spiking networks process the event streams effectively for this discrimination task.
- The reported accuracy (up to 95.37% top-1) applies to the genuine and replayed sequences collected for the extended dataset.
Where Pith is reading between the lines
- Combining event-based ocular checks with existing RGB pipelines could raise the bar against presentation attacks without added latency.
- Varying the replay hardware, such as using different screen refresh rates, would provide a direct test of how display artifacts affect the event patterns.
- The same event signatures might apply to other fast facial motions like blinks or lip movements for broader anti-spoofing.
Load-bearing premise
Replay attacks on screens cannot faithfully reproduce the precise timing and patterns of natural eye movements due to temporal resampling and display artifacts.
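The premise can be illustrated with a toy simulation. The sketch below compares event timestamps from a smooth motion signal with those from the same signal after an idealized sample-and-hold display. It is not the paper's protocol: the saccade profile, the 60 Hz refresh rate, the threshold, and the helper names (`saccade_position`, `sample_and_hold`, `event_timestamps`) are all assumptions, and real displays add effects such as pixel response time and backlight modulation that are ignored here.

```python
import math


def saccade_position(t_us, onset_us=10_000, duration_us=40_000, amplitude=10.0):
    """Smooth, hypothetical saccade profile (degrees) over time in microseconds."""
    if t_us <= onset_us:
        return 0.0
    if t_us >= onset_us + duration_us:
        return amplitude
    phase = (t_us - onset_us) / duration_us
    return amplitude * 0.5 * (1.0 - math.cos(math.pi * phase))


def sample_and_hold(signal, t_us, refresh_hz):
    """Value shown by an idealized display that updates once per refresh period."""
    period_us = 1_000_000 // refresh_hz  # integer period in microseconds
    return signal((t_us // period_us) * period_us)


def event_timestamps(signal, t_end_us, step_us=100, threshold=0.5):
    """Emit a timestamp each time the signal moves `threshold` away from its reference."""
    ref = signal(0)
    stamps = []
    for t in range(0, t_end_us, step_us):
        value = signal(t)
        while abs(value - ref) >= threshold:
            ref += threshold if value > ref else -threshold
            stamps.append(t)
    return stamps


live = event_timestamps(saccade_position, 60_000)
replay = event_timestamps(lambda t: sample_and_hold(saccade_position, t, 60), 60_000)
print(len(set(live)), "distinct live timestamps")
print(len(set(replay)), "distinct replay timestamps")
```

In this toy model the live signal spreads its events across the whole movement, while the replayed signal emits them in bursts locked to refresh boundaries. Raising `refresh_hz` narrows that gap, which is the scenario described under "What would settle it".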
What would settle it
A high-refresh-rate replay system that generates event streams from eye regions matching live saccades closely enough to drop classification accuracy below 70 percent.
Original abstract
Face liveness detection has been extensively studied using RGB cameras, achieving strong performance under controlled conditions but often failing to generalize across sensors and attack scenarios. In this work, we explore event cameras as an alternative sensing modality for liveness detection based on temporal ocular dynamics. Event cameras capture sparse, asynchronous changes in brightness with microsecond resolution, enabling precise analysis of fast eye movements such as saccades. Replay attacks cannot faithfully reproduce these dynamics due to temporal resampling and display artifacts, leading to distinctive spatio-temporal patterns in the event domain. We design a data collection protocol to extend RGBE-Gaze with replay-attack recordings, yielding an event-based fake counterpart for liveness detection. We analyze event-driven temporal features from eye regions and evaluate their effectiveness for ocular motion segmentation and liveness classification. Our results show that event-based representations enable reliable discrimination between genuine and replayed sequences, achieving up to 95.37% top-1 accuracy with a spiking convolutional neural network. These preliminary findings highlight the potential of event-based sensing for robust and low-latency liveness detection.
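The abstract does not specify how event streams are turned into network inputs. One common option is to bin events into per-pixel ON/OFF count frames, sketched below. The `Event` layout, the bin width, and the function name `accumulate_events` are hypothetical and are not taken from the paper or from RGBE-Gaze.

```python
from collections import namedtuple

# Hypothetical event layout: timestamp in microseconds, pixel coordinates, polarity +1/-1.
Event = namedtuple("Event", "t_us x y polarity")


def accumulate_events(events, width, height, t_start_us, bin_us, n_bins):
    """Return n_bins frames indexed [bin][channel][y][x] with ON/OFF event counts."""
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(n_bins)
    ]
    for ev in events:
        b = (ev.t_us - t_start_us) // bin_us
        # Skip events outside the time window or the sensor region of interest.
        if 0 <= b < n_bins and 0 <= ev.x < width and 0 <= ev.y < height:
            channel = 0 if ev.polarity > 0 else 1  # 0 = ON, 1 = OFF
            frames[b][channel][ev.y][ev.x] += 1
    return frames


events = [
    Event(120, 1, 0, +1),
    Event(180, 1, 0, +1),
    Event(1_050, 2, 1, -1),
    Event(5_000, 0, 0, +1),  # falls outside the two 1 ms bins and is dropped
]
frames = accumulate_events(events, width=3, height=2, t_start_us=0, bin_us=1_000, n_bins=2)
print(frames[0][0][0][1], frames[1][1][1][2])
```

Shorter bins preserve more of the microsecond timing the abstract emphasizes, at the cost of sparser frames.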
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript explores event cameras as a modality for face liveness detection by analyzing temporal ocular dynamics such as saccades. It extends the RGBE-Gaze dataset with replay-attack recordings, extracts event-driven features from eye regions, and reports that a spiking convolutional neural network achieves up to 95.37% top-1 accuracy in discriminating genuine from replayed sequences. The work is presented as an exploratory study highlighting potential advantages in temporal resolution and low latency.
Significance. If the discrimination holds under broader conditions, the approach could provide a new sensing modality for biometric security that exploits display artifacts in replay attacks. The alignment of spiking networks with asynchronous event data is a natural fit and could enable efficient implementations. The preliminary dataset extension and reported accuracy constitute a useful starting point, but the exploratory framing and missing validation details constrain immediate significance.
major comments (3)
- [Abstract] Abstract: the central claim of reliable discrimination (up to 95.37% top-1 accuracy) is presented without any dataset sizes, number of subjects, train/test splits, cross-validation procedure, or baseline comparisons, leaving the strength of the result impossible to assess.
- [Data collection protocol] Data collection protocol: the assumption that replay attacks necessarily produce distinguishable spatio-temporal patterns due to temporal resampling and display artifacts is load-bearing for the claim, yet no ablation varies replay hardware parameters such as refresh rate (e.g., 60 Hz vs. 240 Hz) or capture method, so the observed separation may be protocol-specific rather than inherent.
- [Results] Results section: the evaluation lacks error analysis, confusion matrices, or comparison against RGB-based liveness detectors, making it unclear whether the reported accuracy reflects true ocular dynamics or low-level sensor/display artifacts.
minor comments (2)
- [Abstract] The abstract uses 'top-1 accuracy' without clarifying the number of classes or whether it refers to per-sequence or per-subject classification.
- [Methods] Notation for event representations (e.g., how eye regions are extracted and represented as inputs to the spiking CNN) would benefit from an explicit definition or diagram.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our exploratory study. We have revised the manuscript to address the concerns about missing details in the abstract, the data collection assumptions, and the evaluation analysis. Our point-by-point responses follow.
- Referee: [Abstract] Abstract: the central claim of reliable discrimination (up to 95.37% top-1 accuracy) is presented without any dataset sizes, number of subjects, train/test splits, cross-validation procedure, or baseline comparisons, leaving the strength of the result impossible to assess.
  Authors: We agree that the abstract should provide sufficient context for assessing the reported accuracy. In the revised version, we have updated the abstract to include the size of the extended RGBE-Gaze dataset (specifying the number of genuine and replay sequences), the number of subjects, the train/test split, and the cross-validation procedure employed. A brief reference to baseline considerations has also been added while preserving the exploratory framing of the work.
  Revision: yes
- Referee: [Data collection protocol] Data collection protocol: the assumption that replay attacks necessarily produce distinguishable spatio-temporal patterns due to temporal resampling and display artifacts is load-bearing for the claim, yet no ablation varies replay hardware parameters such as refresh rate (e.g., 60 Hz vs. 240 Hz) or capture method, so the observed separation may be protocol-specific rather than inherent.
  Authors: We acknowledge that the protocol relies on a specific replay hardware setup without ablations on parameters such as refresh rate or capture method. This is a legitimate point regarding generalizability. We have expanded the discussion section to explicitly note that the observed discrimination is tied to the replay conditions used in the study and to recommend future investigations varying these hardware parameters to determine whether the effect is inherent to event-based sensing.
  Revision: partial
- Referee: [Results] Results section: the evaluation lacks error analysis, confusion matrices, or comparison against RGB-based liveness detectors, making it unclear whether the reported accuracy reflects true ocular dynamics or low-level sensor/display artifacts.
  Authors: We agree that additional evaluation details would strengthen the presentation. The revised results section now includes confusion matrices and an error analysis to better characterize performance. For comparison to RGB-based detectors, we have added a qualitative discussion explaining the potential advantages of event cameras in capturing high-temporal-resolution ocular dynamics (which RGB cannot match due to frame-rate limitations) and clarifying that a full quantitative RGB baseline is outside the current exploratory scope but identified as future work.
  Revision: partial
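As an illustration of the kind of error analysis requested, the sketch below computes a binary confusion matrix together with accuracy and the two error rates commonly reported for presentation attack detection: APCER (attacks accepted as live) and BPCER (live presentations rejected). The labels and predictions are made-up toy values, not results from the paper, and the function name `liveness_metrics` is an assumption.

```python
def liveness_metrics(y_true, y_pred):
    """Confusion counts and error rates for the labels 'live' and 'replay'.

    Assumes both classes appear at least once in y_true.
    """
    counts = {
        ("live", "live"): 0,
        ("live", "replay"): 0,
        ("replay", "live"): 0,
        ("replay", "replay"): 0,
    }
    for truth, pred in zip(y_true, y_pred):
        counts[(truth, pred)] += 1
    n_live = counts[("live", "live")] + counts[("live", "replay")]
    n_replay = counts[("replay", "live")] + counts[("replay", "replay")]
    correct = counts[("live", "live")] + counts[("replay", "replay")]
    return {
        "confusion": counts,  # keys are (true label, predicted label)
        "accuracy": correct / (n_live + n_replay),
        "apcer": counts[("replay", "live")] / n_replay,  # attacks accepted as live
        "bpcer": counts[("live", "replay")] / n_live,    # live samples rejected
    }


# Toy labels for illustration only.
y_true = ["live"] * 4 + ["replay"] * 4
y_pred = ["live", "live", "live", "replay", "replay", "replay", "replay", "live"]
metrics = liveness_metrics(y_true, y_pred)
print(metrics["accuracy"], metrics["apcer"], metrics["bpcer"])
```

Reporting the two error rates separately shows whether a single accuracy figure hides an imbalance between accepted attacks and rejected genuine users.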
Circularity Check
No circularity; standard supervised ML pipeline on labeled dataset
The paper collects event data by extending RGBE-Gaze with replay recordings, extracts temporal features from eye regions, and trains a spiking CNN for binary liveness classification. Reported accuracy derives from supervised training and evaluation on held-out sequences rather than any self-referential equations, fitted parameters presented as predictions, or load-bearing self-citations. No derivation step reduces the claimed discrimination to its own inputs by construction; the chain is empirical and externally falsifiable via new replay hardware or datasets.
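Whether held-out sequences come from unseen subjects matters for the generalization concerns raised in the referee report. The sketch below shows one way to build a subject-disjoint split. The summary does not say whether the paper splits by subject, so this is an assumption for illustration; the record layout and the function name `subject_disjoint_split` are hypothetical.

```python
import random


def subject_disjoint_split(recordings, test_fraction=0.25, seed=0):
    """Split recordings so that no subject appears in both train and test."""
    subjects = sorted({r["subject"] for r in recordings})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in recordings if r["subject"] not in test_subjects]
    test = [r for r in recordings if r["subject"] in test_subjects]
    return train, test


# Toy metadata: four hypothetical subjects, each with one live and one replay recording.
recordings = [
    {"subject": s, "label": label}
    for s in ("s1", "s2", "s3", "s4")
    for label in ("live", "replay")
]
train, test = subject_disjoint_split(recordings)
print(len(train), len(test))
```

Splitting by subject rather than by sequence keeps a classifier from scoring well by memorizing subject-specific appearance instead of liveness cues.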
Axiom & Free-Parameter Ledger
free parameters (1)
- spiking CNN hyperparameters and training settings
axioms (1)
- domain assumption: Event cameras provide microsecond-resolution asynchronous brightness changes suitable for capturing saccades