
arxiv: 2604.26285 · v1 · submitted 2026-04-29 · 💻 cs.CV

Event-based Liveness Detection using Temporal Ocular Dynamics: An Exploratory Approach

Pith reviewed 2026-05-07 13:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords event camera · liveness detection · ocular dynamics · spiking neural network · replay attack · face anti-spoofing · temporal features · saccades

The pith

Event cameras distinguish live faces from replays by capturing eye movement patterns that displays cannot reproduce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests event cameras as a sensing modality for face liveness detection, focusing on the timing of eye movements such as saccades. RGB-based methods perform well under controlled conditions but often fail to generalize across sensors and attack scenarios. Event sensors instead record only brightness changes, asynchronously and with microsecond resolution, producing sparse data that emphasizes temporal differences. According to the authors, replay attacks introduce temporal resampling and display artifacts that create distinguishable event patterns in eye regions. The authors extended the RGBE-Gaze dataset with replay-attack recordings and trained a spiking convolutional neural network on event features, reaching up to 95.37% top-1 accuracy. The work positions this as a preliminary, low-latency route to discriminating genuine from replayed sequences.
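To make the idea of event-driven temporal features concrete, here is a minimal sketch. Figures 4 and 5 of the paper refer to normalized activity profiles; the code shows one plausible way to build such a profile by counting positive and negative events per time bin inside an eye region. The `Event` record, the `activity_profile` function, the bin width, and the peak normalization are all assumptions made for illustration, not the authors' definitions.

```python
from collections import namedtuple

# Hypothetical event record: timestamp (microseconds), pixel position, polarity (+1 / -1).
Event = namedtuple("Event", ["t_us", "x", "y", "polarity"])


def activity_profile(events, bin_us, roi=None):
    """Count events per time bin inside an optional ROI, normalized by the peak count.

    roi is (x_min, y_min, x_max, y_max), inclusive; None keeps every event.
    Returns (positive, negative): two lists with one normalized count per bin.
    """
    if roi is not None:
        x0, y0, x1, y1 = roi
        events = [e for e in events if x0 <= e.x <= x1 and y0 <= e.y <= y1]
    if not events:
        return [], []
    t_start = min(e.t_us for e in events)
    t_end = max(e.t_us for e in events)
    n_bins = (t_end - t_start) // bin_us + 1
    pos = [0] * n_bins
    neg = [0] * n_bins
    for e in events:
        idx = (e.t_us - t_start) // bin_us
        if e.polarity > 0:
            pos[idx] += 1
        else:
            neg[idx] += 1
    peak = max(max(pos), max(neg))  # at least 1, because events is non-empty
    return [c / peak for c in pos], [c / peak for c in neg]


# Toy stream: four events inside a 64x64 eye region, one outside it.
events = [
    Event(0, 10, 10, 1),
    Event(200, 11, 10, 1),
    Event(900, 12, 10, -1),
    Event(1500, 12, 11, 1),
    Event(1700, 90, 90, -1),  # outside the ROI, so it is ignored
]
pos, neg = activity_profile(events, bin_us=1000, roi=(0, 0, 63, 63))
print(pos, neg)  # [1.0, 0.5] [0.5, 0.0]
```

Subtracting `neg` from `pos` bin by bin would give a signed profile in the spirit of Figure 5, again only as an assumed reading of that caption.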

Core claim

Event-based representations enable reliable discrimination between genuine and replayed sequences, achieving up to 95.37% top-1 accuracy with a spiking convolutional neural network on an extended dataset of live and replayed ocular event streams.

What carries the argument

Spatio-temporal patterns in the event domain from temporal ocular dynamics, fed into a spiking convolutional neural network for classification.
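For readers unfamiliar with spiking networks, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron, a common building block of such models. It is not the paper's architecture, and the `leak` and `threshold` values are arbitrary choices. It only illustrates why input timing matters to a spiking unit: closely spaced inputs accumulate into a spike, while the same inputs spread out in time leak away.

```python
def lif_neuron(input_current, leak=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Each step the membrane potential decays by `leak`, then adds the input.
    When it reaches `threshold` the neuron emits a spike (1) and resets to zero.
    """
    potential = 0.0
    spikes = []
    for current in input_current:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0
        else:
            spikes.append(0)
    return spikes


# Three inputs in consecutive steps: the potential builds up and crosses threshold.
burst = lif_neuron([0.6, 0.6, 0.6])
# The same three inputs separated by silent steps: the potential decays in between.
sparse = lif_neuron([0.6, 0.0, 0.0, 0.6, 0.0, 0.0, 0.6])
print(burst, sum(sparse))  # [0, 0, 1] 0
```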

If this is right

  • Event features from eye regions support both ocular motion segmentation and binary liveness decisions.
  • The approach operates with low latency because event data is sparse and asynchronous.
  • Spiking networks process the event streams effectively for this discrimination task.
  • Performance holds across the collected genuine and replayed sequences in the extended dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining event-based ocular checks with existing RGB pipelines could raise the bar against presentation attacks without added latency.
  • Varying the replay hardware, such as using different screen refresh rates, would provide a direct test of how display artifacts affect the event patterns.
  • The same event signatures might apply to other fast facial motions like blinks or lip movements for broader anti-spoofing.

Load-bearing premise

Replay attacks on screens cannot faithfully reproduce the precise timing and patterns of natural eye movements due to temporal resampling and display artifacts.
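A toy model can illustrate what temporal resampling means here. The sketch assumes that a display changes brightness only at frame boundaries, so every change that falls within one frame is shown at the same instant. The function name, the 2 ms event spacing, and the 40 ms burst are hypothetical, and real displays add effects (pixel response, backlight modulation) that this model ignores.

```python
def replay_timestamps(live_us, refresh_hz):
    """Snap each timestamp (microseconds) to the start of its display frame.

    Simplifying assumption: the screen updates only at frame boundaries, so all
    changes inside one frame are presented to the sensor at the same time.
    """
    snapped = []
    for t in live_us:
        frame_index = t * refresh_hz // 1_000_000
        snapped.append(frame_index * 1_000_000 // refresh_hz)
    return snapped


# Hypothetical live burst: one event every 2 ms over 40 ms.
live = list(range(0, 40_000, 2_000))

distinct_live = len(set(live))
distinct_60 = len(set(replay_timestamps(live, 60)))
distinct_240 = len(set(replay_timestamps(live, 240)))
print(distinct_live, distinct_60, distinct_240)  # 20 3 10
```

Under this model the 60 Hz replay collapses twenty distinct event times into three, and a 240 Hz replay narrows the gap without closing it, which is why varying the refresh rate is a natural test of the premise.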

What would settle it

A high-refresh-rate replay system that generates event streams from eye regions matching live saccades closely enough to drop classification accuracy below 70 percent.

Figures

Figures reproduced from arXiv: 2604.26285 by Daniel Acevedo, Ignacio Bugueno-Cordova, Nicolas Mastropasqua, Pablo Negri, Rodrigo Verschae.

Figure 1: Samples from the RGBE-Gaze dataset extended with our replay
Figure 2: Proposed pipeline for event-based liveness detection using temporal ocular dynamics. A challenge–response mechanism enforces real-time responses
Figure 3: Experimental setup for dataset collection. A display is used to show
Figure 4: Normalized Activity profile for a saccade + fixation sample
Figure 5: Normalized difference of positive and negative activity profile of a subsequence from subject 17 during the fixation + saccade experiment. The
Figure 6: F1-score per subject under LOSO cross-validation. TCN uses
Figure 7: Each violin plot shows the distribution of Mean Event Rate,
Original abstract

Face liveness detection has been extensively studied using RGB cameras, achieving strong performance under controlled conditions but often failing to generalize across sensors and attack scenarios. In this work, we explore event cameras as an alternative sensing modality for liveness detection based on temporal ocular dynamics. Event cameras capture sparse, asynchronous changes in brightness with microsecond resolution, enabling precise analysis of fast eye movements such as saccades. Replay attacks cannot faithfully reproduce these dynamics due to temporal resampling and display artifacts, leading to distinctive spatio-temporal patterns in the event domain. We design a data collection protocol to extend RGBE-Gaze with replay-attack recordings, yielding an event-based fake counterpart for liveness detection. We analyze event-driven temporal features from eye regions and evaluate their effectiveness for ocular motion segmentation and liveness classification. Our results show that event-based representations enable reliable discrimination between genuine and replayed sequences, achieving up to 95.37% top-1 accuracy with a spiking convolutional neural network. These preliminary findings highlight the potential of event-based sensing for robust and low-latency liveness detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript explores event cameras as a modality for face liveness detection by analyzing temporal ocular dynamics such as saccades. It extends the RGBE-Gaze dataset with replay-attack recordings, extracts event-driven features from eye regions, and reports that a spiking convolutional neural network achieves up to 95.37% top-1 accuracy in discriminating genuine from replayed sequences. The work is presented as an exploratory study highlighting potential advantages in temporal resolution and low latency.

Significance. If the discrimination holds under broader conditions, the approach could provide a new sensing modality for biometric security that exploits display artifacts in replay attacks. The alignment of spiking networks with asynchronous event data is a natural fit and could enable efficient implementations. The preliminary dataset extension and reported accuracy constitute a useful starting point, but the exploratory framing and missing validation details constrain immediate significance.

major comments (3)
  1. [Abstract] Abstract: the central claim of reliable discrimination (up to 95.37% top-1 accuracy) is presented without any dataset sizes, number of subjects, train/test splits, cross-validation procedure, or baseline comparisons, leaving the strength of the result impossible to assess.
  2. [Data collection protocol] Data collection protocol: the assumption that replay attacks necessarily produce distinguishable spatio-temporal patterns due to temporal resampling and display artifacts is load-bearing for the claim, yet no ablation varies replay hardware parameters such as refresh rate (e.g., 60 Hz vs. 240 Hz) or capture method, so the observed separation may be protocol-specific rather than inherent.
  3. [Results] Results section: the evaluation lacks error analysis, confusion matrices, or comparison against RGB-based liveness detectors, making it unclear whether the reported accuracy reflects true ocular dynamics or low-level sensor/display artifacts.
minor comments (2)
  1. [Abstract] The abstract uses 'top-1 accuracy' without clarifying the number of classes or whether it refers to per-sequence or per-subject classification.
  2. [Methods] Notation for event representations (e.g., how eye regions are extracted and represented as inputs to the spiking CNN) would benefit from an explicit definition or diagram.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our exploratory study. We have revised the manuscript to address the concerns about missing details in the abstract, the data collection assumptions, and the evaluation analysis. Our point-by-point responses follow.

point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of reliable discrimination (up to 95.37% top-1 accuracy) is presented without any dataset sizes, number of subjects, train/test splits, cross-validation procedure, or baseline comparisons, leaving the strength of the result impossible to assess.

    Authors: We agree that the abstract should provide sufficient context for assessing the reported accuracy. In the revised version, we have updated the abstract to include the size of the extended RGBE-Gaze dataset (specifying the number of genuine and replay sequences), the number of subjects, the train/test split, and the cross-validation procedure employed. A brief reference to baseline considerations has also been added while preserving the exploratory framing of the work.
    revision: yes

  2. Referee: [Data collection protocol] Data collection protocol: the assumption that replay attacks necessarily produce distinguishable spatio-temporal patterns due to temporal resampling and display artifacts is load-bearing for the claim, yet no ablation varies replay hardware parameters such as refresh rate (e.g., 60 Hz vs. 240 Hz) or capture method, so the observed separation may be protocol-specific rather than inherent.

    Authors: We acknowledge that the protocol relies on a specific replay hardware setup without ablations on parameters such as refresh rate or capture method. This is a legitimate point regarding generalizability. We have expanded the discussion section to explicitly note that the observed discrimination is tied to the replay conditions used in the study and to recommend future investigations varying these hardware parameters to determine whether the effect is inherent to event-based sensing.
    revision: partial

  3. Referee: [Results] Results section: the evaluation lacks error analysis, confusion matrices, or comparison against RGB-based liveness detectors, making it unclear whether the reported accuracy reflects true ocular dynamics or low-level sensor/display artifacts.

    Authors: We agree that additional evaluation details would strengthen the presentation. The revised results section now includes confusion matrices and an error analysis to better characterize performance. For comparison to RGB-based detectors, we have added a qualitative discussion explaining the potential advantages of event cameras in capturing high-temporal-resolution ocular dynamics (which RGB cannot match due to frame-rate limitations) and clarifying that a full quantitative RGB baseline is outside the current exploratory scope but identified as future work.
    revision: partial
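The confusion matrices and error analysis discussed in the third exchange can be sketched for the binary case as follows. The labels, the toy predictions, and the function names are hypothetical; "live" is treated as the positive class (1) and "replay" as the negative class (0). None of the numbers below come from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating live = 1 as positive and replay = 0 as negative."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # a replay accepted as live
        elif truth == 1 and pred == 0:
            fn += 1  # a live sequence rejected as replay
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels: four live and four replayed sequences, one error of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
replay_accept_rate = fp / (fp + tn)  # share of replays wrongly accepted as live
print((tp, fp, fn, tn), acc, f1, replay_accept_rate)  # (3, 1, 1, 3) 0.75 0.75 0.25
```

Reporting the two error types separately matters for liveness detection, because accepting a replay and rejecting a genuine user carry different costs that a single accuracy figure hides.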

Circularity Check

0 steps flagged

No circularity; standard supervised ML pipeline on labeled dataset

full rationale

The paper collects event data by extending RGBE-Gaze with replay recordings, extracts temporal features from eye regions, and trains a spiking CNN for binary liveness classification. Reported accuracy derives from supervised training and evaluation on held-out sequences rather than any self-referential equations, fitted parameters presented as predictions, or load-bearing self-citations. No derivation step reduces the claimed discrimination to its own inputs by construction; the chain is empirical and externally falsifiable via new replay hardware or datasets.
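The caption of Figure 6 mentions LOSO (leave-one-subject-out) cross-validation. A minimal sketch of how such folds can be formed is given below, so that every test fold contains only a subject never seen during training. The sample layout `(subject_id, features, label)` and the function name are assumptions for illustration; the paper's actual splitting code is not available here.

```python
def loso_splits(samples):
    """Yield (held_out_subject, train, test) for leave-one-subject-out evaluation.

    `samples` is a list of (subject_id, features, label) tuples. Each fold holds
    out every sample of one subject and trains on all remaining subjects.
    """
    subjects = sorted({sample[0] for sample in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Toy data: three subjects, each with one live (1) and one replay (0) sequence.
samples = [
    ("s01", [0.2, 0.9], 1), ("s01", [0.7, 0.1], 0),
    ("s02", [0.3, 0.8], 1), ("s02", [0.6, 0.2], 0),
    ("s03", [0.1, 0.7], 1), ("s03", [0.8, 0.3], 0),
]

folds = list(loso_splits(samples))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Splitting by subject rather than by sequence avoids a model scoring well merely because it has memorized an individual's appearance.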

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only the abstract was available, so the ledger is minimal; it relies on standard event-camera properties and ML training assumptions, and the abstract lists no explicit free parameters.

free parameters (1)
  • spiking CNN hyperparameters and training settings
    Accuracy result depends on model architecture and optimization choices fitted to the collected event data.
axioms (1)
  • domain assumption: Event cameras provide microsecond-resolution asynchronous brightness changes suitable for capturing saccades
    Invoked in abstract to justify the sensing modality choice.


