
arXiv: 2604.06782 · v1 · submitted 2026-04-08 · 💻 cs.CV

EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling

Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords event-based vision · face recognition · spatiotemporal modeling · low-rank adaptation · motion encoding · illumination robustness · biometric privacy

The pith

Event-based face recognition reaches 94.19 percent Rank-1 identification on the EFace dataset by transferring structural priors from RGB models and explicitly modeling rigid facial motion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that event cameras can support reliable face recognition by building identity representations around stable facial geometry and the temporal patterns created by rigid head motion rather than photometric appearance. This matters because event streams remain usable in extreme lighting and reveal less reconstructible identity information than conventional images. The authors first collect a small dataset of event recordings under controlled rigid motion, then adapt a pretrained RGB face model to the event domain using low-rank parameter updates. They add a motion prompt encoder to capture time-based features and a modulator that merges those features with the transferred spatial structure. The resulting system outperforms the evaluated baselines on Rank-1 identification rate and equal error rate, and it is more stable when illumination degrades.

Core claim

Event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. The paper realizes this by applying Low-Rank Adaptation to transfer spatial priors from RGB face models, then adding a Motion Prompt Encoder that extracts temporal dynamics and a Spatiotemporal Modulator that fuses the two streams. On the constructed EFace dataset this yields 94.19 percent Rank-1 identification and a 5.35 percent equal error rate, together with improved robustness under degraded illumination and lower template reconstructability.
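For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how a Rank-1 identification rate and an equal error rate are commonly computed from embedding similarities. It is not the paper's evaluation code: the function names, the toy two-dimensional embeddings, and the score lists are all hypothetical, and the paper's actual protocol (gallery construction, thresholds, interpolation) is not described in this summary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank1_rate(gallery, probes):
    """Fraction of probes whose most similar gallery template shares their identity.

    gallery, probes: lists of (identity_label, embedding) pairs.
    """
    hits = 0
    for label, emb in probes:
        best_label, _ = max(gallery, key=lambda item: cosine(emb, item[1]))
        hits += best_label == label
    return hits / len(probes)


def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine and impostor similarity scores.

    Every observed score is tried as a threshold (accept if score >= threshold).
    The result is the mean of the false accept rate (FAR) and false reject
    rate (FRR) at the threshold where the two are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data (hypothetical, for illustration only).
toy_gallery = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]
toy_probes = [("a", [0.9, 0.1]), ("b", [0.2, 0.8]), ("a", [0.3, 0.7])]
print(rank1_rate(toy_gallery, toy_probes))
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5]))
```

In this toy example the third probe is closer to the wrong gallery template, so two of three probes are identified correctly. Production evaluations usually interpolate between thresholds rather than picking the closest observed one.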

What carries the argument

The EventFace framework, which transfers structural facial priors via Low-Rank Adaptation, encodes temporal dynamics with a Motion Prompt Encoder, and fuses them through a Spatiotemporal Modulator.
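For orientation, the low-rank transfer step can be read against the standard LoRA formulation, in which a frozen pretrained weight matrix receives a trainable low-rank correction. The symbols below are the usual LoRA notation rather than the paper's; which layers EventFace adapts, and the rank and scaling it uses, are not stated in this summary.

```latex
W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here $W_0$ stays fixed and only $A$ and $B$ are trained, so the number of trainable parameters per adapted layer drops from $dk$ to $r(d + k)$. This is the usual reason LoRA is considered suitable for small target datasets.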

If this is right

  • Event cameras become viable for face recognition tasks where lighting varies sharply because the model relies on geometry and motion rather than intensity values.
  • Privacy improves because the learned event templates are harder to invert into recognizable images than RGB templates.
  • Small-scale rigid-motion event collections suffice for training once RGB priors are adapted, lowering the data barrier for new sensing modalities.
  • The same structure-plus-motion design can be tested on other event-based biometric tasks such as gait or gesture recognition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach implies that event-based systems could operate at lower power and higher speed than frame-based systems in always-on authentication scenarios.
  • Extending the motion encoder to handle non-rigid expressions would test whether the current rigid-motion focus limits applicability to real-world conversations.
  • Combining the transferred priors with raw event polarity or timestamp statistics could further tighten the spatiotemporal representation without additional labeled data.

Load-bearing premise

That structural priors learned from RGB face images can be transferred to event streams through low-rank adaptation, and that a small dataset of rigid-motion event recordings supplies enough variation to learn generalizable identity representations.

What would settle it

Performance of the transferred model falling below a non-adapted event baseline on a larger dataset recorded during natural, non-rigid facial motion would falsify the transferability premise.

Figures

Figures reproduced from arXiv: 2604.06782 by Massimo Tistarelli, Qingguo Meng, Xingbo Dong, Zhe Jin.

Figure 1: (a) Traditional RGB face recognition is sensitive to lighting variations …
Figure 2: Overview of the EFace dataset acquisition setup. Subjects perform …
Figure 3: Overview of the proposed EventFace framework. The training process is structured into two progressive stages. In Stage I, the pretrained RGB backbone …
Figure 4: Spatial attention visualization via Grad-CAM. Heatmaps highlight …
Figure 5: Illustration of the Spatiotemporal Interleaved WKV (ST-WKV). …
Figure 6: Effect of the input event-frame count on recognition performance …
Figure 7: CMC, DET, and ROC curves of EventFace and competing methods on the EFace benchmark. EventFace achieves the best overall identification and …
Figure 8: Cosine similarity distributions under the degraded illumination …
Figure 9: Visualization of matching examples under the degraded illumination …
Figure 10: Visualization of face reconstruction from leaked feature templates. …
Figure 11: Privacy evaluation under a worst-case raw data leakage scenario. We …
Original abstract

Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. Since dedicated datasets for event-based face recognition remain lacking, we construct EFace, a small-scale event-based face dataset captured under rigid facial motion. To learn effectively from this limited event data, we further propose EventFace, a framework for event-based face recognition that integrates spatial structure and temporal dynamics for identity modeling. Specifically, we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models to the event domain, thereby establishing a reliable spatial basis for identity modeling. Building on this foundation, we further introduce a Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them with spatial features, thereby enhancing the representation of identity-relevant event patterns. Extensive experiments demonstrate that EventFace achieves the best performance among the evaluated baselines, with a Rank-1 identification rate of 94.19% and an equal error rate (EER) of 5.35%. Results further indicate that EventFace exhibits stronger robustness under degraded illumination than the competing methods. In addition, the learned representations exhibit reduced template reconstructability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces EventFace, a framework for event-based face recognition that transfers structural facial priors from pretrained RGB models using Low-Rank Adaptation (LoRA), then augments them with a Motion Prompt Encoder (MPE) to capture temporal dynamics and a Spatiotemporal Modulator (STM) to fuse spatial and temporal features. It also constructs the EFace dataset under rigid facial motion and reports that EventFace achieves 94.19% Rank-1 identification and 5.35% EER on this dataset while showing stronger robustness to degraded illumination than baselines; the learned representations are additionally claimed to exhibit reduced template reconstructability.

Significance. If the empirical claims hold under fuller validation, the work would provide a concrete approach to bridging the modality gap in event-based face recognition by exploiting rigid-motion structure rather than photometric appearance, which aligns with event cameras' strengths in illumination invariance and privacy. The release of EFace is a constructive contribution, and the privacy angle (reduced reconstructability) is a useful secondary finding. However, the small scale and rigid-motion constraint of the dataset limit the strength of any generalization argument.

major comments (3)
  1. [Experimental section: performance claims and EFace dataset description] The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.
  2. [Framework description: LoRA transfer and MPE/STM modules] The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.
  3. [Results on robustness] The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.
minor comments (2)
  1. [Abstract] The abstract mentions 'reduced template reconstructability' as an additional benefit but provides no quantitative metric or comparison; a brief privacy evaluation would strengthen the claim.
  2. [Method] Notation for the MPE and STM modules could be clarified with explicit equations or pseudocode to make the fusion mechanism reproducible.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We agree that additional details and analyses will strengthen the paper and will incorporate revisions to address the concerns about experimental reporting, ablations, and robustness evaluation. Our point-by-point responses follow.

  1. Referee: [Experimental section: performance claims and EFace dataset description] The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.

    Authors: We acknowledge that the current manuscript provides insufficient explicit details on the EFace dataset and experimental protocol, making it difficult to fully evaluate the results. In the revised version, we will expand Section 4 with a complete description including the exact number of subjects, total sequences, train/test split sizes and ratios, subject demographics, and a step-by-step account of how illumination degradation was controlled, simulated, and quantified (including specific parameters). We will also explicitly discuss the small-scale and rigid-motion nature of the dataset as a limitation.
    Revision: yes

  2. Referee: [Framework description: LoRA transfer and MPE/STM modules] The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.

    Authors: We agree that dedicated ablations are needed to substantiate the role of the LoRA-transferred priors and the added modules. Although the manuscript compares the full model against external baselines, it lacks internal ablations. In the revision, we will add experiments ablating LoRA (comparing with and without transferred priors), MPE, and STM individually, along with an analysis of the residual modality gap via performance metrics on event-only inputs. These additions will directly support the novelty argument regarding structure-driven compensation for limited event data.
    Revision: yes

  3. Referee: [Results on robustness] The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.

    Authors: We recognize that the robustness claim requires more granular evidence than currently provided. The manuscript summarizes the advantage but omits breakdowns. In the revised version, we will include a new table or figure with per-condition Rank-1 and EER metrics across the tested illumination levels, specify the exact degradation parameters used, and report statistical significance tests (e.g., paired t-tests or Wilcoxon tests) comparing EventFace to baselines. This will provide quantitative support for the illumination-robustness advantage.
    Revision: yes
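The paired significance test mentioned in the third response can be sketched as follows. This is a minimal illustration of a paired t statistic over matched per-condition scores, assuming one score per illumination condition for each method; the function name and the numbers are hypothetical placeholders, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and degrees of freedom.

    scores_a, scores_b: matched lists, one score per condition for each method.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Placeholder scores for four hypothetical illumination conditions.
method_a = [3.0, 5.0, 7.0, 9.0]
method_b = [2.0, 3.0, 4.0, 5.0]
t_stat, dof = paired_t_statistic(method_a, method_b)
print(t_stat, dof)
```

The statistic is then compared against a t distribution with the returned degrees of freedom. With only a handful of conditions, a rank-based alternative such as the Wilcoxon signed-rank test, which the authors also mention, makes fewer distributional assumptions.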

Circularity Check

0 steps flagged

No circularity: empirical framework with independent dataset and modules


The paper constructs a new small-scale EFace dataset under rigid motion and proposes EventFace with LoRA transfer from RGB priors plus MPE and STM modules. All reported results (94.19% Rank-1, 5.35% EER, illumination robustness) are direct empirical measurements on this dataset against baselines. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters or self-citations by construction. The central claims rest on experimental comparison rather than any self-definitional or load-bearing self-referential step.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that RGB-trained facial structure priors remain useful after LoRA adaptation to event data and that rigid-motion event streams contain sufficient identity signal. No free parameters or invented physical entities are introduced in the abstract.

axioms (2)
  • Domain assumption: Pretrained RGB face models contain transferable structural priors for facial identity that can be adapted to event data via LoRA.
    Invoked to justify the spatial basis for identity modeling.
  • Domain assumption: Rigid facial motion produces event patterns that are identity-discriminative when combined with spatial structure.
    Underpins the design of the Motion Prompt Encoder and Spatiotemporal Modulator.




Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

  1. [1]

    A 128×128 120 dB 15µs latency asynchronous temporal contrast vision sensor,

    P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15µs latency asynchronous temporal contrast vision sensor,”IEEE J. Solid- State Circuits, vol. 43, no. 2, pp. 566–576, Feb. 2008

  2. [2]

    A 240× 180 130 dB 3µs latency global shutter spatiotemporal vision sensor,

    C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, “A 240× 180 130 dB 3µs latency global shutter spatiotemporal vision sensor,” IEEE J. Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, Oct. 2014

  3. [3]

    Event- based vision: A survey,

    G. Gallego, T. Delbr ¨uck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidiset al., “Event- based vision: A survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, Jan. 2020

  4. [4]

    EventVGGT: Exploring Cross-Modal Distil- lation for Consistent Event-based Depth Estimation,

    Y . Ren, J. Zhu, K. Chen, Z. Li, J. Ou, Z. Cao, T. Hua, P. Shi, Y . Fu, W. Zhaoet al., “EventVGGT: Exploring Cross-Modal Distil- lation for Consistent Event-based Depth Estimation,”arXiv preprint arXiv:2603.09385, 2026

  5. [5]

    Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation,

    J. Zhu, T. Pan, Z. Cao, Y . Liu, J. T. Kwok, and H. Xiong, “Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation,” inProc. Int. Conf. Comput. Vis., 2025, pp. 5146–5155

  6. [6]

    When person re- identification meets event camera: a benchmark dataset and an attribute- guided re-identification framework,

    X. Wang, Q. Zhu, S. Wu, B. Jiang, and S. Zhang, “When person re- identification meets event camera: a benchmark dataset and an attribute- guided re-identification framework,” inProc. Conf. Assoc. Advance. Artif. Intell., vol. 40, no. 12, 2026, pp. 10 172–10 180

  7. [7]

    Bullying10k: A large-scale neuromorphic dataset towards privacy-preserving bullying recognition,

    Y . Dong, Y . Li, D. Zhao, G. Shen, and Y . Zeng, “Bullying10k: A large-scale neuromorphic dataset towards privacy-preserving bullying recognition,”Proc. Adv. Neural Inf. Process. Syst., vol. 36, pp. 1923– 1937, 2023

  8. [8]

    A survey of face recognition,

    X. Wang, J. Peng, S. Zhang, B. Chen, Y . Wang, and Y . Guo, “A survey of face recognition,”arXiv preprint arXiv:2212.13038, 2022

  9. [9]

    A survey on deep learning based face recogni- tion,

    G. Guo and N. Zhang, “A survey on deep learning based face recogni- tion,”Comput. Vis. Image Underst., vol. 189, p. 102805, 2019

  10. [10]

    Facenet: A unified em- bedding for face recognition and clustering,

    F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified em- bedding for face recognition and clustering,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2015, pp. 815–823

  11. [11]

    Arcface: Additive angular margin loss for deep face recognition,

    J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4690–4699

  12. [12]

    Adaface: Quality adaptive margin for face recognition,

    M. Kim, A. K. Jain, and X. Liu, “Adaface: Quality adaptive margin for face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 18 750–18 759

  13. [13]

    Deep learning for face recognition: a critical analysis,

    A. J. Shepley, “Deep learning for face recognition: a critical analysis,” arXiv preprint arXiv:1907.12739, 2019

  14. [14]

    Low-light face recognition for mo- bile robots,

    C. Baek, J. W. Song, and K. Kong, “Low-light face recognition for mo- bile robots,” inProc. Int. Tech. Conf. Circuits/Syst., Comput., Commun. (ITC-CSCC), 2025, pp. 1–5

  15. [15]

    Ensuring privacy in face recognition: a survey on data generation, inference and storage,

    Z. Sun and Z. Liu, “Ensuring privacy in face recognition: a survey on data generation, inference and storage,”Discov. Appl. Sci., vol. 7, no. 5, p. 441, 2025

  16. [16]

    Controllable inversion of black-box face recognition models via diffusion,

    M. Kansy, A. Ra ¨el, G. Mignone, J. Naruniec, C. Schroers, M. Gross, and R. M. Weber, “Controllable inversion of black-box face recognition models via diffusion,” inProc. Int. Conf. Comput. Vis., 2023, pp. 3167– 3177

  17. [17]

    Vec2face: Unveil human faces from their blackbox features in face recognition,

    C. N. Duong, T.-D. Truong, K. Luu, K. G. Quach, H. Bui, and K. Roy, “Vec2face: Unveil human faces from their blackbox features in face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6132–6141

  18. [18]

    Neuromorphic facial analysis with cross-modal supervision,

    F. Becattini, L. Cultrera, L. Berlincioni, C. Ferrari, A. Leonardo, and A. Del Bimbo, “Neuromorphic facial analysis with cross-modal supervision,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 205–223

  19. [19]

    Lora: Low-rank adaptation of large language models

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chenet al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

  20. [20]

    Deep face recognition: A survey,

    M. Wang and W. Deng, “Deep face recognition: A survey,”Neurocom- puting, vol. 429, pp. 215–244, 2021

  21. [21]

    Deep learning face representation from predicting 10,000 classes,

    Y . Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1891–1898

  22. [22]

    Deepface: Closing the gap to human-level performance in face verification,

    Y . Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1701–1708

  23. [23]

    Transface: Calibrating transformer training for face recognition from a data-centric perspective,

    J. Dan, Y . Liu, H. Xie, J. Deng, H. Xie, X. Xie, and B. Sun, “Transface: Calibrating transformer training for face recognition from a data-centric perspective,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 20 642–20 653

  24. [24]

    Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,

    J. Dan, Y . Liu, B. Sun, J. Deng, and S. Luo, “Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 48, no. 2, pp. 1243–1261, Feb. 2026

  25. [25]

    Ms-celeb-1m: A dataset and benchmark for large-scale face recognition,

    Y . Guo, L. Zhang, Y . Hu, X. He, and J. Gao, “Ms-celeb-1m: A dataset and benchmark for large-scale face recognition,” inProc. Eur. Conf. Comput. Vis., 2016, pp. 87–102

  26. [26]

    Vggface2: A dataset for recognising faces across pose and age,

    Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, “Vggface2: A dataset for recognising faces across pose and age,” inProc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit., 2018, pp. 67–74

  27. [27]

    The megaface benchmark: 1 million faces for recognition at scale,

    I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard, “The megaface benchmark: 1 million faces for recognition at scale,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4873– 4882

  28. [28]

    Deep learning face representa- tion by joint identification-verification,

    Y . Sun, Y . Chen, X. Wang, and X. Tang, “Deep learning face representa- tion by joint identification-verification,”Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014

  29. [29]

    Sphereface: Deep hypersphere embedding for face recognition,

    W. Liu, Y . Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hypersphere embedding for face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 212–220

  30. [30]

    Cosface: Large margin cosine loss for deep face recognition,

    H. Wang, Y . Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, “Cosface: Large margin cosine loss for deep face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5265–5274

  31. [31]

    Low-facenet: face recognition-driven low-light image enhancement,

    Y . Fan, Y . Wang, D. Liang, Y . Chen, H. Xie, F. L. Wang, J. Li, and M. Wei, “Low-facenet: face recognition-driven low-light image enhancement,”IEEE Trans. Instrum. Meas., vol. 73, pp. 1–13, Mar. 2024

  32. [32]

    On the reconstruction of face images from deep face templates,

    G. Mai, K. Cao, P. C. Yuen, and A. K. Jain, “On the reconstruction of face images from deep face templates,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 5, pp. 1188–1202, May 2019

  33. [33]

    IDFace: Face Template Protection for Efficient and Secure Identification,

    S. Kim, S. Paik, C. Hwang, D. Kim, J. Shin, and J. H. Seo, “IDFace: Face Template Protection for Efficient and Secure Identification,” inProc. Int. Conf. Comput. Vis., 2025, pp. 13 995–14 005

  34. [34]

    Stable hash generation for efficient privacy-preserving face identification,

    D. Osorio-Roig, C. Rathgeb, P. Drozdowski, and C. Busch, “Stable hash generation for efficient privacy-preserving face identification,”IEEE Trans. Biometrics, Behav., Identity Sci., vol. 4, no. 3, pp. 333–348, 2021

  35. [35]

    HEBI: Homomor- phically encrypted biometric indexing,

    P. Bauspieß, M. Grimmer, C. Fougner, D. Le Vasseur, T. T. St ¨ocklin, C. Rathgeb, J. Kolberg, A. Costache, and C. Busch, “HEBI: Homomor- phically encrypted biometric indexing,” inProc. IEEE Int. Joint Conf. Biometrics, 2023, pp. 1–10

  36. [36]

    Privacy-preserving face recognition using trainable feature subtraction,

    Y . Mi, Z. Zhong, Y . Huang, J. Ji, J. Xu, J. Wang, S. Wang, S. Ding, and S. Zhou, “Privacy-preserving face recognition using trainable feature subtraction,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 297–307

  37. [37]

    PRO-face: A generic framework for privacy-preserving recognizable obfuscation of face images,

    L. Yuan, L. Liu, X. Pu, Z. Li, H. Li, and X. Gao, “PRO-face: A generic framework for privacy-preserving recognizable obfuscation of face images,” inProc. 30th ACM Int. Conf. Multimedia, 2022, pp. 1661– 1669

  38. [38]

    PRO-face C: Privacy-preserving recognition of obfuscated face via feature compensation,

    L. Yuan, W. Chen, X. Pu, Y . Zhang, H. Li, Y . Zhang, X. Gao, and T. Ebrahimi, “PRO-face C: Privacy-preserving recognition of obfuscated face via feature compensation,”IEEE Trans. Inf. Forensics Security, vol. 19, pp. 4930–4944, Apr. 2024

  39. [39]

    Privacy-preserving adversarial facial features,

    Z. Wang, H. Wang, S. Jin, W. Zhang, J. Hu, Y . Wang, P. Sun, W. Yuan, K. Liu, and K. Ren, “Privacy-preserving adversarial facial features,” in JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14 Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 8212– 8221

  40. [40]

    Recent event camera innovations: A survey,

    B. Chakravarthi, A. A. Verma, K. Daniilidis, C. Fermuller, and Y . Yang, “Recent event camera innovations: A survey,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 342–376

  41. [41]

    evtransfer: A transfer learning framework for event-based facial expression recognition,

    R. Verschae and I. Bugueno-Cordova, “evtransfer: A transfer learning framework for event-based facial expression recognition,”arXiv preprint arXiv:2508.03609, 2025

  42. [42]

    Spatio-temporal transformers for action unit classification with event cameras,

    L. Cultrera, F. Becattini, L. Berlincioni, C. Ferrari, and A. Del Bimbo, “Spatio-temporal transformers for action unit classification with event cameras,”Comput. Vis. Image Underst., p. 104578, 2025

  43. [43]

    Exploring spatial-temporal dynamics in event- based facial micro-expression analysis,

    N. Mastropasqua, I. Bugueno-Cordova, R. Verschae, D. Acevedo, P. Ne- gri, and M. E. Buemi, “Exploring spatial-temporal dynamics in event- based facial micro-expression analysis,” inProc. Int. Conf. Comput. Vis., 2025, pp. 4723–4732

  44. [44]

    Spiking-fer: spiking neural network for facial expression recognition with event cameras,

    S. Barchid, B. Allaert, A. Aissaoui, J. Mennesson, and C. C. Djeraba, “Spiking-fer: spiking neural network for facial expression recognition with event cameras,” inProc. 20th Int. Conf. Content-Based Multimedia Indexing, 2023, pp. 1–7

  45. [45]

    Real-time multi-task facial analytics with event cameras,

    C. Ryan, A. Elrasad, W. Shariff, J. Lemley, P. Kielty, P. Hurney, and P. Corcoran, “Real-time multi-task facial analytics with event cameras,” IEEE Access, vol. 11, pp. 76 964–76 976, 2023

  46. [46]

    Event-based facial keypoint alignment via cross-modal fusion attention and self-supervised multi-event repre- sentation learning,

    D. Kang, J. Kim, and D. Kang, “Event-based facial keypoint alignment via cross-modal fusion attention and self-supervised multi-event repre- sentation learning,”arXiv preprint arXiv:2509.24968, 2025

  47. [47]

    Evaluation of convolutional networks for event camera face pose alignment,

    B. B. Oral, A. C ¸ akıcı, and A. Savran, “Evaluation of convolutional networks for event camera face pose alignment,”Acad. Platform J. Eng. Smart Syst., vol. 13, no. 2, pp. 22–30, 2025

  48. [48]

    Evaluating image-based face and eye tracking with event cameras,

    K. Iddrisu, W. Shariff, N. E. O’Connor, J. Lemley, and S. Little, “Evaluating image-based face and eye tracking with event cameras,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 224–240

  49. [49]

    Event-based multi-task facial landmark and blink detection,

    P. Kielty, C. Ryan, W. Shariff, J. Lemley, and P. Corcoran, “Event-based multi-task facial landmark and blink detection,”IEEE Access, 2025

  50. [50]

    Event camera data pre-training,

    Y . Yang, L. Pan, and L. Liu, “Event camera data pre-training,” inProc. Int. Conf. Comput. Vis., 2023, pp. 10 699–10 709

  51. [51]

    Masked event modeling: Self-supervised pretraining for event cameras,

    S. Klenk, D. Bonello, L. Koestler, N. Araslanov, and D. Cremers, “Masked event modeling: Self-supervised pretraining for event cameras,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024, pp. 2378–2388

  52. [52]

    Learning to exploit multiple vision modalities by using grafted networks,

    Y. Hu, T. Delbruck, and S.-C. Liu, “Learning to exploit multiple vision modalities by using grafted networks,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 85–101

  53. [53]

    Eventclip: Adapting clip for event-based object recognition,

    Z. Wu, X. Liu, and I. Gilitschenski, “Eventclip: Adapting clip for event-based object recognition,” arXiv preprint arXiv:2306.06354, 2023

  54. [54]

    Velora: A low-rank adaptation approach for efficient rgb-event based recognition,

    L. Chen, H. Yang, P. Shao, H. Song, X. Wang, Z. Zhao, Y. Wang, and Y. Tian, “Velora: A low-rank adaptation approach for efficient rgb-event based recognition,” arXiv preprint arXiv:2412.20064, 2024

  55. [55]

    Spiking transfer learning from rgb image to neuromorphic event stream,

    Q. Zhan, G. Liu, X. Xie, R. Tao, M. Zhang, and H. Tang, “Spiking transfer learning from rgb image to neuromorphic event stream,” IEEE Trans. Image Process., vol. 3, pp. 4274–4287, Jul. 2024

  56. [56]

    Leveraging rgb images for pre-training of event-based hand pose estimation,

    R. Liu, T. Ohkawa, T. H. E. Tse, M. Zhang, A. Yao, and Y. Sato, “Leveraging rgb images for pre-training of event-based hand pose estimation,” arXiv preprint arXiv:2509.16949, 2025

  57. [57]

    Low-rank adaptation for foundation models: A comprehensive review,

    M. Yang, J. Chen, J. Tao, Y. Zhang, J. Liu, J. Zhang, Q. Ma, H. Verma, R. Zhang, M. Zhou et al., “Low-rank adaptation for foundation models: A comprehensive review,” arXiv preprint arXiv:2501.00365, 2024

  58. [58]

    Grad-cam: Visual explanations from deep networks via gradient-based localization,

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proc. Int. Conf. Comput. Vis., 2017, pp. 618–626

  59. [59]

    Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures,

    Y. Duan, W. Wang, Z. Chen, X. Zhu, L. Lu, T. Lu, Y. Qiao, H. Li, J. Dai, and W. Wang, “Vision-rwkv: Efficient and scalable visual perception with rwkv-like architectures,” in Proc. Int. Conf. Learn. Representations, 2024

  60. [60]

    Restore-rwkv: Efficient and effective medical image restoration with rwkv,

    Z. Yang, J. Li, H. Zhang, D. Zhao, B. Wei, and Y. Xu, “Restore-rwkv: Efficient and effective medical image restoration with rwkv,” IEEE J. Biomed. Health Inform., vol. 30, no. 1, Jan. 2026

  61. [61]

    Webface260m: A benchmark unveiling the power of million-scale deep face recognition,

    Z. Zhu, G. Huang, J. Deng, Y. Ye, J. Huang, X. Chen, J. Zhu, T. Yang, J. Lu, D. Du et al., “Webface260m: A benchmark unveiling the power of million-scale deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 10 492–10 502

  62. [62]

    Magface: A universal representation for face recognition and quality assessment,

    Q. Meng, S. Zhao, Z. Huang, and F. Zhou, “Magface: A universal representation for face recognition and quality assessment,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 14 225–14 234

  63. [63]

    Sphereface2: Binary classification is all you need for deep face recognition,

    Y. Wen, W. Liu, A. Weller, B. Raj, and R. Singh, “Sphereface2: Binary classification is all you need for deep face recognition,” in Proc. Int. Conf. Learn. Representations, 2022

  64. [64]

    Uniface: Unified cross-entropy loss for deep face recognition,

    J. Zhou, X. Jia, Q. Li, L. Shen, and J. Duan, “Uniface: Unified cross-entropy loss for deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 20 730–20 739

  65. [65]

    Unitsface: Unified threshold integrated sample-to-sample loss for face recognition,

    X. Jia, J. Zhou, L. Shen, J. Duan et al., “Unitsface: Unified threshold integrated sample-to-sample loss for face recognition,” Proc. Adv. Neural Inf. Process. Syst., vol. 36, pp. 32 732–32 747, 2023

  66. [66]

    RVface: Reliable vector guided softmax loss for face recognition,

    X. Wang, S. Wang, Y. Liang, L. Gu, and Z. Lei, “RVface: Reliable vector guided softmax loss for face recognition,” IEEE Trans. Image Process., vol. 31, pp. 2337–2351, Mar. 2022

  67. [67]

    Topofr: A closer look at topology alignment on face recognition,

    J. Dan, Y. Liu, J. Deng, H. Xie, S. Li, B. Sun, and S. Luo, “Topofr: A closer look at topology alignment on face recognition,” Proc. Adv. Neural Inf. Process. Syst., vol. 37, pp. 37 213–37 240, 2024

  68. [68]

    Face reconstruction from partially leaked facial embeddings,

    H. O. Shahreza and S. Marcel, “Face reconstruction from partially leaked facial embeddings,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 4930–4934

  69. [69]

    High speed and high dynamic range video with an event camera,

    H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza, “High speed and high dynamic range video with an event camera,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 6, pp. 1964–1980, Dec. 2019

  70. [70]

    Sparse-e2vid: A sparse convolutional model for event-based video reconstruction trained with real event noise,

    P. R. G. Cadena, Y. Qian, C. Wang, and M. Yang, “Sparse-e2vid: A sparse convolutional model for event-based video reconstruction trained with real event noise,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2023, pp. 4150–4158.

    Qingguo Meng received his B.Eng. degree in computer science and technology from Henan Polytechnic Universi...