arxiv: 2604.06782 · v1 · submitted 2026-04-08 · 💻 cs.CV

EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling

Qingguo Meng, Xingbo Dong, Zhe Jin, Massimo Tistarelli

Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3

classification 💻 cs.CV

keywords event-based vision · face recognition · spatiotemporal modeling · low-rank adaptation · motion encoding · illumination robustness · biometric privacy


The pith

Event-based face recognition reaches 94 percent Rank-1 identification on a new rigid-motion dataset by transferring structural priors from RGB models and explicitly modeling rigid facial motion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that event cameras can support reliable face recognition by building identity representations around stable facial geometry and the temporal patterns created by rigid head motion, rather than around photometric appearance. This matters because event streams remain usable in extreme lighting and reveal less reconstructible identity information than conventional images. The authors first collect a small dataset of event recordings under controlled rigid motion, then adapt a pretrained RGB face model to the event domain using low-rank parameter updates. They add a motion prompt encoder to capture time-based features and a modulator that merges those features with the transferred spatial structure. The resulting system outperforms the evaluated event and RGB baselines on identification rate and equal error rate, and it remains more stable when illumination degrades.

Core claim

Event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. This is realized by applying Low-Rank Adaptation to transfer spatial priors from RGB face models, followed by a Motion Prompt Encoder that extracts temporal dynamics and a Spatiotemporal Modulator that fuses the two streams, yielding 94.19 percent Rank-1 identification and 5.35 percent equal error rate on the constructed EFace dataset together with improved robustness under degraded illumination and lower template reconstructability.
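Both headline numbers are standard biometric metrics. The sketch below shows how a Rank-1 identification rate and an equal error rate (EER) are commonly computed from cosine similarities between templates. It is a generic illustration on toy scores, assuming one gallery template per identity; it is not the authors' evaluation code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature templates."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank1_rate(scores: List[List[float]], probe_ids: List[int],
               gallery_ids: List[int]) -> float:
    """Share of probes whose top-scoring gallery template has the probe's identity.

    scores[i][j] is the similarity between probe i and gallery template j.
    """
    hits = 0
    for row, pid in zip(scores, probe_ids):
        best = max(range(len(row)), key=lambda j: row[j])
        hits += int(gallery_ids[best] == pid)
    return hits / len(probe_ids)


def equal_error_rate(genuine: List[float], impostor: List[float]) -> float:
    """Approximate EER by trying every observed score as a threshold.

    FAR = share of impostor scores >= threshold (wrongly accepted).
    FRR = share of genuine scores < threshold (wrongly rejected).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for th in sorted(set(genuine) | set(impostor)):
        far = sum(s >= th for s in impostor) / len(impostor)
        frr = sum(s < th for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy data: 3 probes scored against 2 gallery identities (0 and 1).
    scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]
    print(rank1_rate(scores, probe_ids=[0, 1, 1], gallery_ids=[0, 1]))  # third probe misidentified
    print(equal_error_rate(genuine=[0.9, 0.4], impostor=[0.5, 0.1]))    # 0.5
```

Reported EERs are usually interpolated from a full ROC curve (for example, one built with scikit-learn's `sklearn.metrics.roc_curve`). Values from this coarse threshold sweep can therefore differ slightly, but the sweep keeps the example dependency-free.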

What carries the argument

The EventFace framework, which transfers structural facial priors via Low-Rank Adaptation, encodes temporal dynamics with a Motion Prompt Encoder, and fuses them through a Spatiotemporal Modulator.
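For readers unfamiliar with Low-Rank Adaptation (Hu et al.), the core idea is that a frozen pretrained weight W0 gains a trainable low-rank correction, so an adapted layer computes y = W0 x + (α/r) B A x. The pure-Python sketch below illustrates that update and its parameter savings. The layer size, rank r, and scaling α are illustrative assumptions: the pith does not say which layers of the RGB backbone EventFace adapts, or with what rank.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute W0 x + (alpha / r) * B (A x).

    w0: frozen d x k pretrained weight (e.g. from an RGB face model)
    a:  trainable r x k down-projection
    b:  trainable d x r up-projection; LoRA initialises it to zero,
        so training starts exactly from the frozen model's output
    """
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * dy for y, dy in zip(base, delta)]


def trainable_params(d: int, k: int, r: int) -> int:
    """Parameters LoRA trains for one d x k layer: A (r*k) plus B (d*r)."""
    return r * (d + k)


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]   # toy frozen weight, d = k = 2
    a = [[0.5, 0.5]]                 # rank r = 1
    b_init = [[0.0], [0.0]]          # zero init: identical to the frozen layer
    b_tuned = [[1.0], [-1.0]]        # hypothetical values after fine-tuning
    x = [2.0, 4.0]
    print(lora_forward(w0, a, b_init, x, alpha=1.0, r=1))   # [2.0, 4.0]
    print(lora_forward(w0, a, b_tuned, x, alpha=1.0, r=1))  # [5.0, 1.0]
    print(trainable_params(512, 512, 8), 512 * 512)         # 8192 262144
```

For a hypothetical 512 x 512 layer at rank 8, only 8,192 of 262,144 weights (about 3 percent) are updated. This is why LoRA is attractive when, as with EFace, the target-domain data are scarce.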

If this is right

Event cameras become viable for face recognition tasks where lighting varies sharply because the model relies on geometry and motion rather than intensity values.
Privacy improves because the learned event templates are harder to invert into recognizable images than RGB templates.
Small-scale rigid-motion event collections suffice for training once RGB priors are adapted, lowering the data barrier for new sensing modalities.
The same structure-plus-motion design can be tested on other event-based biometric tasks such as gait or gesture recognition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach implies that event-based systems could operate at lower power and higher speed than frame-based cameras in always-on authentication scenarios.
Extending the motion encoder to handle non-rigid expressions would test whether the current rigid-motion focus limits applicability to real-world conversations.
Combining the transferred priors with raw event polarity or timestamp statistics could further tighten the spatiotemporal representation without additional labeled data.

Load-bearing premise

The premise has two parts: structural priors learned from RGB face images can be transferred to event streams through low-rank adaptation, and a small dataset of rigid-motion event recordings supplies enough variation to learn generalizable identity representations.

What would settle it

Performance of the transferred model falling below a non-adapted event baseline on a larger dataset recorded during natural, non-rigid facial motion would falsify the transferability premise.

Figures

Figures reproduced from arXiv: 2604.06782 by Massimo Tistarelli, Qingguo Meng, Xingbo Dong, Zhe Jin.

**Figure 2.** Overview of the EFace dataset acquisition setup. Subjects perform …

**Figure 3.** Overview of the proposed EventFace framework. The training process is structured into two progressive stages. In Stage I, the pretrained RGB backbone …

**Figure 4.** Spatial attention visualization via Grad-CAM. Heatmaps highlight …

**Figure 5.** Illustration of the Spatiotemporal Interleaved WKV (ST-WKV).

**Figure 6.** Effect of the input event-frame count on recognition performance …

**Figure 7.** CMC, DET, and ROC curves of EventFace and competing methods on the EFace benchmark. EventFace achieves the best overall identification and …

**Figure 8.** Cosine similarity distributions under the degraded illumination …

**Figure 9.** Visualization of matching examples under the degraded illumination …

**Figure 10.** Visualization of face reconstruction from leaked feature templates.

**Figure 11.** Privacy evaluation under a worst-case raw data leakage scenario. We …

Original abstract

Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. Since dedicated datasets for event-based face recognition remain lacking, we construct EFace, a small-scale event-based face dataset captured under rigid facial motion. To learn effectively from this limited event data, we further propose EventFace, a framework for event-based face recognition that integrates spatial structure and temporal dynamics for identity modeling. Specifically, we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models to the event domain, thereby establishing a reliable spatial basis for identity modeling. Building on this foundation, we further introduce a Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them with spatial features, thereby enhancing the representation of identity-relevant event patterns. Extensive experiments demonstrate that EventFace achieves the best performance among the evaluated baselines, with a Rank-1 identification rate of 94.19% and an equal error rate (EER) of 5.35%. Results further indicate that EventFace exhibits stronger robustness under degraded illumination than the competing methods. In addition, the learned representations exhibit reduced template reconstructability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

EventFace adds a new small rigid-motion dataset and a LoRA-plus-motion-prompt architecture for event face recognition, with claimed 94% rank-1 numbers that still need checking for generalization.


The paper introduces the EFace dataset of event streams from faces under rigid motion and the EventFace method that transfers RGB face structure via LoRA, then adds a Motion Prompt Encoder for temporal signals and a Spatiotemporal Modulator to blend them. This targets the core problem that event data has no stable intensity appearance, so identity has to come from geometry and motion patterns instead. The approach is new for this niche and fills a gap where prior event work was mostly on other tasks. It reports clear gains over baselines at 94.19% rank-1 and 5.35% EER, plus better behavior under degraded light and lower template reconstructability for privacy. Those are concrete results worth noting. The main soft spot is the data itself. Small scale plus rigid-motion capture limits pose, expression, and natural head movement, so the high numbers could partly reflect capture artifacts rather than robust identity features. The LoRA transfer helps with data scarcity but still assumes RGB priors map cleanly to event streams, which is plausible but not automatic. The abstract gives no subject count, cross-validation details, or statistical tests, so the full paper has to show those are handled fairly or the robustness claim stays provisional. This is for people already working on event cameras or privacy-focused biometrics who need a starting dataset and baseline method. It is coherent on its own terms and engages the right literature without circular claims. I would send it to peer review so the experimental section gets proper scrutiny on dataset size and generalization.

Referee Report

3 major / 2 minor

Summary. The paper introduces EventFace, a framework for event-based face recognition that transfers structural facial priors from pretrained RGB models using Low-Rank Adaptation (LoRA), then augments them with a Motion Prompt Encoder (MPE) to capture temporal dynamics and a Spatiotemporal Modulator (STM) to fuse spatial and temporal features. It also constructs the EFace dataset under rigid facial motion and reports that EventFace achieves 94.19% Rank-1 identification and 5.35% EER on this dataset while showing stronger robustness to degraded illumination than baselines; the learned representations are additionally claimed to exhibit reduced template reconstructability.

Significance. If the empirical claims hold under fuller validation, the work would provide a concrete approach to bridging the modality gap in event-based face recognition by exploiting rigid-motion structure rather than photometric appearance, which aligns with event cameras' strengths in illumination invariance and privacy. The release of EFace is a constructive contribution, and the privacy angle (reduced reconstructability) is a useful secondary finding. However, the small scale and rigid-motion constraint of the dataset limit the strength of any generalization argument.

major comments (3)

[Experimental section] Performance claims and EFace dataset description: The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.
[Framework description] LoRA transfer and MPE/STM modules: The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.
[Results on robustness] The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.

minor comments (2)

[Abstract] The abstract mentions 'reduced template reconstructability' as an additional benefit but provides no quantitative metric or comparison; a brief privacy evaluation would strengthen the claim.
[Method] Notation for the MPE and STM modules could be clarified with explicit equations or pseudocode to make the fusion mechanism reproducible.
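The sketch below shows how much explicitness the referee is asking for: it specifies one possible fusion rule in a few lines. It is a hypothetical FiLM-style modulation, in which temporal features predict a per-channel scale and shift that are applied to spatial features, and it is chosen purely for illustration. The names `fuse`, `w_gamma`, and `w_beta` are invented here. The paper's actual Spatiotemporal Modulator is not reproduced; Figure 5 indicates it is built around an ST-WKV operator.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, x: Vector) -> Vector:
    """Bias-free linear projection."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def fuse(spatial: Vector, temporal: Vector, w_gamma: Matrix, w_beta: Matrix) -> Vector:
    """Hypothetical FiLM-style fusion: out_c = s_c * (1 + gamma_c) + beta_c.

    spatial:  C-dim feature from a (LoRA-adapted) spatial backbone
    temporal: D-dim feature from a motion encoder
    w_gamma, w_beta: C x D projections predicting per-channel scale and shift
    """
    gamma = linear(w_gamma, temporal)
    beta = linear(w_beta, temporal)
    return [s * (1.0 + g) + b for s, g, b in zip(spatial, gamma, beta)]


if __name__ == "__main__":
    w_gamma = [[0.5], [0.0]]
    w_beta = [[0.0], [1.0]]
    print(fuse([1.0, 2.0], [1.0], w_gamma, w_beta))  # [1.5, 3.0]
    print(fuse([1.0, 2.0], [0.0], w_gamma, w_beta))  # [1.0, 2.0]: no motion, spatial feature unchanged
```

The `1 + gamma` form leaves the spatial feature untouched when the temporal signal is zero. This is one simple way to keep the transferred spatial prior as the default behavior.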

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We agree that additional details and analyses will strengthen the paper and will incorporate revisions to address the concerns about experimental reporting, ablations, and robustness evaluation. Our point-by-point responses follow.


Referee: [Experimental section] Performance claims and EFace dataset description: The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.

Authors: We acknowledge that the current manuscript provides insufficient explicit details on the EFace dataset and experimental protocol, making it difficult to fully evaluate the results. In the revised version, we will expand Section 4 with a complete description including the exact number of subjects, total sequences, train/test split sizes and ratios, subject demographics, and a step-by-step account of how illumination degradation was controlled, simulated, and quantified (including specific parameters). We will also explicitly discuss the small-scale and rigid-motion nature of the dataset as a limitation. revision: yes
Referee: [Framework description] LoRA transfer and MPE/STM modules: The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.

Authors: We agree that dedicated ablations are needed to substantiate the role of the LoRA-transferred priors and the added modules. Although the manuscript compares the full model against external baselines, it lacks internal ablations. In the revision, we will add experiments ablating LoRA (comparing with and without transferred priors), MPE, and STM individually, along with an analysis of the residual modality gap via performance metrics on event-only inputs. These additions will directly support the novelty argument regarding structure-driven compensation for limited event data. revision: yes
Referee: [Results on robustness] The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.

Authors: We recognize that the robustness claim requires more granular evidence than currently provided. The manuscript summarizes the advantage but omits breakdowns. In the revised version, we will include a new table or figure with per-condition Rank-1 and EER metrics across the tested illumination levels, specify the exact degradation parameters used, and report statistical significance tests (e.g., paired t-tests or Wilcoxon tests) comparing EventFace to baselines. This will provide quantitative support for the illumination-robustness advantage. revision: yes
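The paired tests mentioned in the response compare two methods on the same evaluation units, such as per-condition, per-subject, or per-fold scores. The sketch below computes the paired t statistic using only the standard library. The scores are placeholder values, not results from the paper. In practice, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` return the statistic together with a p-value.

```python
import math
import statistics
from typing import Sequence


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for matched samples: mean(d) / (sd(d) / sqrt(n)), d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Placeholder per-condition Rank-1 scores for two methods (NOT from the paper).
    method_a = [0.90, 0.85, 0.80]
    method_b = [0.89, 0.83, 0.77]
    print(round(paired_t(method_a, method_b), 3))
```

With only a few illumination levels, n is tiny and such a test has little power. Pairing at the subject or sequence level, where many matched units are available, usually gives a more informative comparison.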

Circularity Check

0 steps flagged

No circularity: empirical framework with independent dataset and modules

full rationale

The paper constructs a new small-scale EFace dataset under rigid motion and proposes EventFace with LoRA transfer from RGB priors plus MPE and STM modules. All reported results (94.19% Rank-1, 5.35% EER, illumination robustness) are direct empirical measurements on this dataset against baselines. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters or self-citations by construction. The central claims rest on experimental comparison rather than any self-definitional or load-bearing self-referential step.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that RGB-trained facial structure priors remain useful after LoRA adaptation to event data and that rigid-motion event streams contain sufficient identity signal. No free parameters or invented physical entities are introduced in the abstract.

axioms (2)

Domain assumption: Pretrained RGB face models contain transferable structural priors for facial identity that can be adapted to event data via LoRA.
Invoked to justify the spatial basis for identity modeling.
Domain assumption: Rigid facial motion produces event patterns that are identity-discriminative when combined with spatial structure.
Underpins the design of the Motion Prompt Encoder and Spatiotemporal Modulator.

pith-pipeline@v0.9.0 · 5564 in / 1573 out tokens · 61808 ms · 2026-05-10T18:09:00.379431+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models... Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and 8-tick period unclear

T=4 consecutive frames... accumulation interval ΔT=50 ms... total temporal receptive field of 200 ms
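The quoted setting uses T = 4 consecutive frames, each accumulating events over ΔT = 50 ms, which gives a temporal receptive field of T·ΔT = 4 × 50 ms = 200 ms. The sketch below shows this kind of fixed-interval accumulation. It assumes events arrive as (timestamp in µs, x, y, polarity) tuples and counts events per pixel per time bin. That is one common event representation, not necessarily the one EventFace uses; for instance, the passage does not say whether polarities are kept in separate channels.

```python
from typing import List, Tuple

Event = Tuple[int, int, int, int]  # (timestamp_us, x, y, polarity in {-1, +1})


def accumulate_frames(events: List[Event], t0_us: int, num_frames: int,
                      dt_us: int, height: int, width: int) -> List[List[List[int]]]:
    """Bin events into num_frames per-pixel count images, each spanning dt_us microseconds.

    Events outside [t0_us, t0_us + num_frames * dt_us) or outside the sensor are dropped.
    Polarity is ignored here; a two-channel variant would count ON and OFF events separately.
    """
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    for t, x, y, _polarity in events:
        k = (t - t0_us) // dt_us  # index of the time bin this event falls into
        if 0 <= k < num_frames and 0 <= y < height and 0 <= x < width:
            frames[k][y][x] += 1
    return frames


T, DT_US = 4, 50_000                   # 4 frames x 50 ms each
print(T * DT_US / 1000, "ms")          # 200.0 ms temporal receptive field
```

Fixed-duration bins keep the temporal scale constant across sequences. Their drawback is that frame density varies with how fast the head moves, which is one reason the frame count explored in Figure 6 matters.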

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

[1]

A 128×128 120 dB 15µs latency asynchronous temporal contrast vision sensor,

P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15µs latency asynchronous temporal contrast vision sensor,”IEEE J. Solid- State Circuits, vol. 43, no. 2, pp. 566–576, Feb. 2008

work page 2008
[2]

A 240× 180 130 dB 3µs latency global shutter spatiotemporal vision sensor,

C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, “A 240× 180 130 dB 3µs latency global shutter spatiotemporal vision sensor,” IEEE J. Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, Oct. 2014

work page 2014
[3]

Event- based vision: A survey,

G. Gallego, T. Delbr ¨uck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidiset al., “Event- based vision: A survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, Jan. 2020

work page 2020
[4]

EventVGGT: Exploring Cross-Modal Distil- lation for Consistent Event-based Depth Estimation,

Y . Ren, J. Zhu, K. Chen, Z. Li, J. Ou, Z. Cao, T. Hua, P. Shi, Y . Fu, W. Zhaoet al., “EventVGGT: Exploring Cross-Modal Distil- lation for Consistent Event-based Depth Estimation,”arXiv preprint arXiv:2603.09385, 2026

work page arXiv 2026
[5]

Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation,

J. Zhu, T. Pan, Z. Cao, Y . Liu, J. T. Kwok, and H. Xiong, “Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation,” inProc. Int. Conf. Comput. Vis., 2025, pp. 5146–5155

work page 2025
[6]

When person re- identification meets event camera: a benchmark dataset and an attribute- guided re-identification framework,

X. Wang, Q. Zhu, S. Wu, B. Jiang, and S. Zhang, “When person re- identification meets event camera: a benchmark dataset and an attribute- guided re-identification framework,” inProc. Conf. Assoc. Advance. Artif. Intell., vol. 40, no. 12, 2026, pp. 10 172–10 180

work page 2026
[7]

Bullying10k: A large-scale neuromorphic dataset towards privacy-preserving bullying recognition,

Y . Dong, Y . Li, D. Zhao, G. Shen, and Y . Zeng, “Bullying10k: A large-scale neuromorphic dataset towards privacy-preserving bullying recognition,”Proc. Adv. Neural Inf. Process. Syst., vol. 36, pp. 1923– 1937, 2023

work page 1923
[8]

A survey of face recognition,

X. Wang, J. Peng, S. Zhang, B. Chen, Y . Wang, and Y . Guo, “A survey of face recognition,”arXiv preprint arXiv:2212.13038, 2022

work page arXiv 2022
[9]

A survey on deep learning based face recogni- tion,

G. Guo and N. Zhang, “A survey on deep learning based face recogni- tion,”Comput. Vis. Image Underst., vol. 189, p. 102805, 2019

work page 2019
[10]

Facenet: A unified em- bedding for face recognition and clustering,

F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified em- bedding for face recognition and clustering,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2015, pp. 815–823

work page 2015
[11]

Arcface: Additive angular margin loss for deep face recognition,

J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4690–4699

work page 2019
[12]

Adaface: Quality adaptive margin for face recognition,

M. Kim, A. K. Jain, and X. Liu, “Adaface: Quality adaptive margin for face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 18 750–18 759

work page 2022
[13]

Deep learning for face recognition: a critical analysis,

A. J. Shepley, “Deep learning for face recognition: a critical analysis,” arXiv preprint arXiv:1907.12739, 2019

work page arXiv 1907
[14]

Low-light face recognition for mo- bile robots,

C. Baek, J. W. Song, and K. Kong, “Low-light face recognition for mo- bile robots,” inProc. Int. Tech. Conf. Circuits/Syst., Comput., Commun. (ITC-CSCC), 2025, pp. 1–5

work page 2025
[15]

Ensuring privacy in face recognition: a survey on data generation, inference and storage,

Z. Sun and Z. Liu, “Ensuring privacy in face recognition: a survey on data generation, inference and storage,”Discov. Appl. Sci., vol. 7, no. 5, p. 441, 2025

work page 2025
[16]

Controllable inversion of black-box face recognition models via diffusion,

M. Kansy, A. Ra ¨el, G. Mignone, J. Naruniec, C. Schroers, M. Gross, and R. M. Weber, “Controllable inversion of black-box face recognition models via diffusion,” inProc. Int. Conf. Comput. Vis., 2023, pp. 3167– 3177

work page 2023
[17]

Vec2face: Unveil human faces from their blackbox features in face recognition,

C. N. Duong, T.-D. Truong, K. Luu, K. G. Quach, H. Bui, and K. Roy, “Vec2face: Unveil human faces from their blackbox features in face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6132–6141

work page 2020
[18]

Neuromorphic facial analysis with cross-modal supervision,

F. Becattini, L. Cultrera, L. Berlincioni, C. Ferrari, A. Leonardo, and A. Del Bimbo, “Neuromorphic facial analysis with cross-modal supervision,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 205–223

work page 2024
[19]

Lora: Low-rank adaptation of large language models

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chenet al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

work page 2022
[20]

Deep face recognition: A survey,

M. Wang and W. Deng, “Deep face recognition: A survey,”Neurocom- puting, vol. 429, pp. 215–244, 2021

work page 2021
[21]

Deep learning face representation from predicting 10,000 classes,

Y . Sun, X. Wang, and X. Tang, “Deep learning face representation from predicting 10,000 classes,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1891–1898

work page 2014
[22]

Deepface: Closing the gap to human-level performance in face verification,

Y . Taigman, M. Yang, M. Ranzato, and L. Wolf, “Deepface: Closing the gap to human-level performance in face verification,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1701–1708

work page 2014
[23]

Transface: Calibrating transformer training for face recognition from a data-centric perspective,

J. Dan, Y . Liu, H. Xie, J. Deng, H. Xie, X. Xie, and B. Sun, “Transface: Calibrating transformer training for face recognition from a data-centric perspective,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 20 642–20 653

work page 2023
[24]

Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,

J. Dan, Y . Liu, B. Sun, J. Deng, and S. Luo, “Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 48, no. 2, pp. 1243–1261, Feb. 2026

work page 2026
[25]

Ms-celeb-1m: A dataset and benchmark for large-scale face recognition,

Y . Guo, L. Zhang, Y . Hu, X. He, and J. Gao, “Ms-celeb-1m: A dataset and benchmark for large-scale face recognition,” inProc. Eur. Conf. Comput. Vis., 2016, pp. 87–102

work page 2016
[26]

Vggface2: A dataset for recognising faces across pose and age,

Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, “Vggface2: A dataset for recognising faces across pose and age,” inProc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit., 2018, pp. 67–74

work page 2018
[27]

The megaface benchmark: 1 million faces for recognition at scale,

I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, and E. Brossard, “The megaface benchmark: 1 million faces for recognition at scale,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4873– 4882

work page 2016
[28]

Deep learning face representa- tion by joint identification-verification,

Y . Sun, Y . Chen, X. Wang, and X. Tang, “Deep learning face representa- tion by joint identification-verification,”Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014

work page 2014
[29]

Sphereface: Deep hypersphere embedding for face recognition,

W. Liu, Y . Wen, Z. Yu, M. Li, B. Raj, and L. Song, “Sphereface: Deep hypersphere embedding for face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 212–220

work page 2017
[30]

Cosface: Large margin cosine loss for deep face recognition,

H. Wang, Y . Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, “Cosface: Large margin cosine loss for deep face recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5265–5274

work page 2018
[31]

Low-facenet: face recognition-driven low-light image enhancement,

Y . Fan, Y . Wang, D. Liang, Y . Chen, H. Xie, F. L. Wang, J. Li, and M. Wei, “Low-facenet: face recognition-driven low-light image enhancement,”IEEE Trans. Instrum. Meas., vol. 73, pp. 1–13, Mar. 2024

work page 2024
[32]

On the reconstruction of face images from deep face templates,

G. Mai, K. Cao, P. C. Yuen, and A. K. Jain, “On the reconstruction of face images from deep face templates,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 5, pp. 1188–1202, May 2019

work page 2019
[33]

IDFace: Face Template Protection for Efficient and Secure Identification,

S. Kim, S. Paik, C. Hwang, D. Kim, J. Shin, and J. H. Seo, “IDFace: Face Template Protection for Efficient and Secure Identification,” inProc. Int. Conf. Comput. Vis., 2025, pp. 13 995–14 005

work page 2025
[34]

Stable hash generation for efficient privacy-preserving face identification,

D. Osorio-Roig, C. Rathgeb, P. Drozdowski, and C. Busch, “Stable hash generation for efficient privacy-preserving face identification,”IEEE Trans. Biometrics, Behav., Identity Sci., vol. 4, no. 3, pp. 333–348, 2021

work page 2021
[35]

HEBI: Homomor- phically encrypted biometric indexing,

P. Bauspieß, M. Grimmer, C. Fougner, D. Le Vasseur, T. T. St ¨ocklin, C. Rathgeb, J. Kolberg, A. Costache, and C. Busch, “HEBI: Homomor- phically encrypted biometric indexing,” inProc. IEEE Int. Joint Conf. Biometrics, 2023, pp. 1–10

work page 2023
[36]

Privacy-preserving face recognition using trainable feature subtraction,

Y . Mi, Z. Zhong, Y . Huang, J. Ji, J. Xu, J. Wang, S. Wang, S. Ding, and S. Zhou, “Privacy-preserving face recognition using trainable feature subtraction,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 297–307

work page 2024
[37]

PRO-face: A generic framework for privacy-preserving recognizable obfuscation of face images,

L. Yuan, L. Liu, X. Pu, Z. Li, H. Li, and X. Gao, “PRO-face: A generic framework for privacy-preserving recognizable obfuscation of face images,” inProc. 30th ACM Int. Conf. Multimedia, 2022, pp. 1661– 1669

work page 2022
[38]

PRO-face C: Privacy-preserving recognition of obfuscated face via feature compensation,

L. Yuan, W. Chen, X. Pu, Y . Zhang, H. Li, Y . Zhang, X. Gao, and T. Ebrahimi, “PRO-face C: Privacy-preserving recognition of obfuscated face via feature compensation,”IEEE Trans. Inf. Forensics Security, vol. 19, pp. 4930–4944, Apr. 2024

work page 2024
[39]

Privacy-preserving adversarial facial features,

Z. Wang, H. Wang, S. Jin, W. Zhang, J. Hu, Y . Wang, P. Sun, W. Yuan, K. Liu, and K. Ren, “Privacy-preserving adversarial facial features,” in JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14 Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 8212– 8221

work page 2021
[40]

Recent event camera innovations: A survey,

B. Chakravarthi, A. A. Verma, K. Daniilidis, C. Fermuller, and Y . Yang, “Recent event camera innovations: A survey,” inProc. Eur. Conf. Comput. Vis., 2024, pp. 342–376

work page 2024
[41]

evtransfer: A transfer learning framework for event-based facial expression recognition,

R. Verschae and I. Bugueno-Cordova, “evtransfer: A transfer learning framework for event-based facial expression recognition,”arXiv preprint arXiv:2508.03609, 2025

work page arXiv 2025
[42]

Spatio-temporal transformers for action unit classification with event cameras,

L. Cultrera, F. Becattini, L. Berlincioni, C. Ferrari, and A. Del Bimbo, “Spatio-temporal transformers for action unit classification with event cameras,”Comput. Vis. Image Underst., p. 104578, 2025

[43]

N. Mastropasqua, I. Bugueno-Cordova, R. Verschae, D. Acevedo, P. Negri, and M. E. Buemi, “Exploring spatial-temporal dynamics in event-based facial micro-expression analysis,” in Proc. Int. Conf. Comput. Vis., 2025, pp. 4723–4732

[44]

S. Barchid, B. Allaert, A. Aissaoui, J. Mennesson, and C. C. Djeraba, “Spiking-FER: Spiking neural network for facial expression recognition with event cameras,” in Proc. 20th Int. Conf. Content-Based Multimedia Indexing, 2023, pp. 1–7

[45]

C. Ryan, A. Elrasad, W. Shariff, J. Lemley, P. Kielty, P. Hurney, and P. Corcoran, “Real-time multi-task facial analytics with event cameras,” IEEE Access, vol. 11, pp. 76964–76976, 2023

[46]

D. Kang, J. Kim, and D. Kang, “Event-based facial keypoint alignment via cross-modal fusion attention and self-supervised multi-event representation learning,” arXiv preprint arXiv:2509.24968, 2025

[47]

B. B. Oral, A. Çakıcı, and A. Savran, “Evaluation of convolutional networks for event camera face pose alignment,” Acad. Platform J. Eng. Smart Syst., vol. 13, no. 2, pp. 22–30, 2025

[48]

K. Iddrisu, W. Shariff, N. E. O’Connor, J. Lemley, and S. Little, “Evaluating image-based face and eye tracking with event cameras,” in Proc. Eur. Conf. Comput. Vis., 2024, pp. 224–240

[49]

P. Kielty, C. Ryan, W. Shariff, J. Lemley, and P. Corcoran, “Event-based multi-task facial landmark and blink detection,” IEEE Access, 2025

[50]

Y. Yang, L. Pan, and L. Liu, “Event camera data pre-training,” in Proc. Int. Conf. Comput. Vis., 2023, pp. 10699–10709

[51]

S. Klenk, D. Bonello, L. Koestler, N. Araslanov, and D. Cremers, “Masked event modeling: Self-supervised pretraining for event cameras,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024, pp. 2378–2388

[52]

Y. Hu, T. Delbruck, and S.-C. Liu, “Learning to exploit multiple vision modalities by using grafted networks,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 85–101

[53]

Z. Wu, X. Liu, and I. Gilitschenski, “EventCLIP: Adapting CLIP for event-based object recognition,” arXiv preprint arXiv:2306.06354, 2023

[54]

L. Chen, H. Yang, P. Shao, H. Song, X. Wang, Z. Zhao, Y. Wang, and Y. Tian, “Velora: A low-rank adaptation approach for efficient RGB-event based recognition,” arXiv preprint arXiv:2412.20064, 2024

[55]

Q. Zhan, G. Liu, X. Xie, R. Tao, M. Zhang, and H. Tang, “Spiking transfer learning from RGB image to neuromorphic event stream,” IEEE Trans. Image Process., vol. 3, pp. 4274–4287, Jul. 2024

[56]

R. Liu, T. Ohkawa, T. H. E. Tse, M. Zhang, A. Yao, and Y. Sato, “Leveraging RGB images for pre-training of event-based hand pose estimation,” arXiv preprint arXiv:2509.16949, 2025

[57]

M. Yang, J. Chen, J. Tao, Y. Zhang, J. Liu, J. Zhang, Q. Ma, H. Verma, R. Zhang, M. Zhou et al., “Low-rank adaptation for foundation models: A comprehensive review,” arXiv preprint arXiv:2501.00365, 2024

[58]

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proc. Int. Conf. Comput. Vis., 2017, pp. 618–626

[59]

Y. Duan, W. Wang, Z. Chen, X. Zhu, L. Lu, T. Lu, Y. Qiao, H. Li, J. Dai, and W. Wang, “Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures,” in Proc. Int. Conf. Learn. Representations, 2024

[60]

Z. Yang, J. Li, H. Zhang, D. Zhao, B. Wei, and Y. Xu, “Restore-RWKV: Efficient and effective medical image restoration with RWKV,” IEEE J. Biomed. Health Inform., vol. 30, no. 1, Jan. 2026

[61]

Z. Zhu, G. Huang, J. Deng, Y. Ye, J. Huang, X. Chen, J. Zhu, T. Yang, J. Lu, D. Du et al., “WebFace260M: A benchmark unveiling the power of million-scale deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 10492–10502

[62]

Q. Meng, S. Zhao, Z. Huang, and F. Zhou, “MagFace: A universal representation for face recognition and quality assessment,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 14225–14234

[63]

Y. Wen, W. Liu, A. Weller, B. Raj, and R. Singh, “SphereFace2: Binary classification is all you need for deep face recognition,” in Proc. Int. Conf. Learn. Representations, 2022

[64]

J. Zhou, X. Jia, Q. Li, L. Shen, and J. Duan, “UniFace: Unified cross-entropy loss for deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 20730–20739

[65]

X. Jia, J. Zhou, L. Shen, J. Duan et al., “UniTSFace: Unified threshold integrated sample-to-sample loss for face recognition,” Proc. Adv. Neural Inf. Process. Syst., vol. 36, pp. 32732–32747, 2023

[66]

X. Wang, S. Wang, Y. Liang, L. Gu, and Z. Lei, “RVFace: Reliable vector guided softmax loss for face recognition,” IEEE Trans. Image Process., vol. 31, pp. 2337–2351, Mar. 2022

[67]

J. Dan, Y. Liu, J. Deng, H. Xie, S. Li, B. Sun, and S. Luo, “TopoFR: A closer look at topology alignment on face recognition,” Proc. Adv. Neural Inf. Process. Syst., vol. 37, pp. 37213–37240, 2024

[68]

H. O. Shahreza and S. Marcel, “Face reconstruction from partially leaked facial embeddings,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 4930–4934

[69]

H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza, “High speed and high dynamic range video with an event camera,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 6, pp. 1964–1980, Dec. 2019

[70]

P. R. G. Cadena, Y. Qian, C. Wang, and M. Yang, “Sparse-E2VID: A sparse convolutional model for event-based video reconstruction trained with real event noise,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2023, pp. 4150–4158

Qingguo Meng received his B.Eng. degree in computer science and technology from Henan Polytechnic Universi...
