EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling
Pith reviewed 2026-05-10 18:09 UTC · model grok-4.3
The pith
Event-based face recognition reaches 94.19 percent Rank-1 identification on the EFace dataset by transferring structural priors from RGB models and explicitly modeling rigid facial motion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. The paper realizes this by applying Low-Rank Adaptation to transfer spatial priors from RGB face models, then adding a Motion Prompt Encoder that extracts temporal dynamics and a Spatiotemporal Modulator that fuses the two streams. On the constructed EFace dataset this yields 94.19 percent Rank-1 identification and a 5.35 percent equal error rate, together with improved robustness under degraded illumination and lower template reconstructability.
What carries the argument
The EventFace framework, which transfers structural facial priors via Low-Rank Adaptation, encodes temporal dynamics with a Motion Prompt Encoder, and fuses them through a Spatiotemporal Modulator.
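To make the first of these components concrete, here is a minimal sketch of a low-rank adapted linear layer in plain Python. It is not the paper's implementation: the class name `LoRALinear`, the helper `matvec`, the rank, the scaling factor `alpha`, and the toy 8×8 identity weight are all illustrative assumptions. It shows only the general LoRA idea of freezing a pretrained weight W and training a low-rank update B·A on top of it.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Count the entries of A and B, the only values that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy "pretrained" 8x8 weight (hypothetical, illustrative values only).
W = [[float(i == j) for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [0.5 * k for k in range(8)]
y = layer.forward(x)
```

Because B starts at zero, the adapted layer matches the pretrained layer before any training. In this toy setting only the 32 entries of A and B would be trained, against 64 entries in the frozen weight, which is the property that makes the approach attractive when event data are scarce.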
If this is right
- Event cameras become viable for face recognition tasks where lighting varies sharply because the model relies on geometry and motion rather than intensity values.
- Privacy improves because the learned event templates are harder to invert into recognizable images than RGB templates.
- Small-scale rigid-motion event collections suffice for training once RGB priors are adapted, lowering the data barrier for new sensing modalities.
- The same structure-plus-motion design can be tested on other event-based biometric tasks such as gait or gesture recognition.
Where Pith is reading between the lines
- The approach implies that event-based systems could operate at lower power and higher speed than frame-based cameras in always-on authentication scenarios.
- Extending the motion encoder to handle non-rigid expressions would test whether the current rigid-motion focus limits applicability to real-world conversations.
- Combining the transferred priors with raw event polarity or timestamp statistics could further tighten the spatiotemporal representation without additional labeled data.
Load-bearing premise
The argument assumes that structural priors learned from RGB face images can be transferred to event streams through low-rank adaptation, and that a small dataset of rigid-motion event recordings supplies enough variation to learn generalizable identity representations.
What would settle it
If the transferred model performed below a non-adapted event baseline on a larger dataset recorded during natural, non-rigid facial motion, the transferability premise would be falsified.
Original abstract
Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. Since dedicated datasets for event-based face recognition remain lacking, we construct EFace, a small-scale event-based face dataset captured under rigid facial motion. To learn effectively from this limited event data, we further propose EventFace, a framework for event-based face recognition that integrates spatial structure and temporal dynamics for identity modeling. Specifically, we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models to the event domain, thereby establishing a reliable spatial basis for identity modeling. Building on this foundation, we further introduce a Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them with spatial features, thereby enhancing the representation of identity-relevant event patterns. Extensive experiments demonstrate that EventFace achieves the best performance among the evaluated baselines, with a Rank-1 identification rate of 94.19% and an equal error rate (EER) of 5.35%. Results further indicate that EventFace exhibits stronger robustness under degraded illumination than the competing methods. In addition, the learned representations exhibit reduced template reconstructability.
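The two headline metrics in the abstract can be computed roughly as follows. This is a generic sketch, not the paper's evaluation protocol: the function names, the threshold scan used to approximate the EER, and the toy scores are assumptions made for illustration, and the numbers are placeholders rather than results.

```python
def rank1_rate(similarity, probe_ids, gallery_ids):
    """Fraction of probes whose most similar gallery entry has the same identity."""
    hits = 0
    for row, probe_id in zip(similarity, probe_ids):
        best = max(range(len(row)), key=row.__getitem__)
        hits += gallery_ids[best] == probe_id
    return hits / len(probe_ids)


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning a threshold over every observed score."""
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False reject: a genuine pair scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False accept: an impostor pair scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Placeholder similarity matrix: rows are probes, columns are gallery entries.
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.7, 0.6],
]
probe_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]
rank1 = rank1_rate(similarity, probe_ids, gallery_ids)

# Same-identity scores are genuine; all other scores are impostor.
genuine = [0.9, 0.8, 0.6]
impostor = [0.2, 0.1, 0.3, 0.4, 0.2, 0.7]
eer = equal_error_rate(genuine, impostor)
```

Rank-1 measures closed-set identification, while the EER is the operating point where false accepts and false rejects balance in verification. In this toy input the third probe is matched to the wrong gallery entry, so the Rank-1 rate is two out of three.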
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EventFace, a framework for event-based face recognition that transfers structural facial priors from pretrained RGB models using Low-Rank Adaptation (LoRA), then augments them with a Motion Prompt Encoder (MPE) to capture temporal dynamics and a Spatiotemporal Modulator (STM) to fuse spatial and temporal features. It also constructs the EFace dataset under rigid facial motion and reports that EventFace achieves 94.19% Rank-1 identification and 5.35% EER on this dataset while showing stronger robustness to degraded illumination than baselines; the learned representations are additionally claimed to exhibit reduced template reconstructability.
Significance. If the empirical claims hold under fuller validation, the work would provide a concrete approach to bridging the modality gap in event-based face recognition by exploiting rigid-motion structure rather than photometric appearance, which aligns with event cameras' strengths in illumination invariance and privacy. The release of EFace is a constructive contribution, and the privacy angle (reduced reconstructability) is a useful secondary finding. However, the small scale and rigid-motion constraint of the dataset limit the strength of any generalization argument.
major comments (3)
- [Experimental section] Experimental section (performance claims and EFace dataset description): The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.
- [Framework description] Framework description (LoRA transfer and MPE/STM modules): The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.
- [Results on robustness] Results on robustness: The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.
minor comments (2)
- [Abstract] The abstract mentions 'reduced template reconstructability' as an additional benefit but provides no quantitative metric or comparison; a brief privacy evaluation would strengthen the claim.
- [Method] Notation for the MPE and STM modules could be clarified with explicit equations or pseudocode to make the fusion mechanism reproducible.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We agree that additional details and analyses will strengthen the paper and will incorporate revisions to address the concerns about experimental reporting, ablations, and robustness evaluation. Our point-by-point responses follow.
- Referee: [Experimental section] Experimental section (performance claims and EFace dataset description): The central superiority claim (94.19% Rank-1, 5.35% EER, and illumination robustness) is presented without any reported details on the number of subjects, total sequences, train/test split sizes, subject demographics, or how illumination degradation was controlled and quantified. These omissions are load-bearing because the entire evaluation rests on a newly collected, small-scale, rigid-motion dataset; without them it is impossible to determine whether the metrics reflect genuine spatiotemporal identity modeling or capture-specific artifacts.
  Authors: We acknowledge that the current manuscript provides insufficient explicit details on the EFace dataset and experimental protocol, making it difficult to fully evaluate the results. In the revised version, we will expand Section 4 with a complete description including the exact number of subjects, total sequences, train/test split sizes and ratios, subject demographics, and a step-by-step account of how illumination degradation was controlled, simulated, and quantified (including specific parameters). We will also explicitly discuss the small-scale and rigid-motion nature of the dataset as a limitation.
  Revision: yes
- Referee: [Framework description] Framework description (LoRA transfer and MPE/STM modules): The argument that LoRA reliably transfers RGB structural priors to event streams, which then serve as the foundation for MPE and STM, is asserted without ablation studies isolating the contribution of the transferred priors versus the new modules, or any analysis of the residual modality gap. This is central to the paper's novelty claim that structure-driven modeling compensates for data scarcity.
  Authors: We agree that dedicated ablations are needed to substantiate the role of the LoRA-transferred priors and the added modules. Although the manuscript compares the full model against external baselines, it lacks internal ablations. In the revision, we will add experiments ablating LoRA (comparing with and without transferred priors), MPE, and STM individually, along with an analysis of the residual modality gap via performance metrics on event-only inputs. These additions will directly support the novelty argument regarding structure-driven compensation for limited event data.
  Revision: yes
- Referee: [Results on robustness] Results on robustness: The statement that EventFace exhibits stronger robustness under degraded illumination than competing methods lacks quantitative tables or figures showing per-condition metrics, the exact illumination levels tested, or statistical significance tests; this directly underpins the illumination-robustness advantage highlighted in the abstract.
  Authors: We recognize that the robustness claim requires more granular evidence than currently provided. The manuscript summarizes the advantage but omits breakdowns. In the revised version, we will include a new table or figure with per-condition Rank-1 and EER metrics across the tested illumination levels, specify the exact degradation parameters used, and report statistical significance tests (e.g., paired t-tests or Wilcoxon tests) comparing EventFace to baselines. This will provide quantitative support for the illumination-robustness advantage.
  Revision: yes
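As an illustration of what a paired significance test over per-condition metrics could look like, the sketch below runs an exact paired sign-flip permutation test. This is a swapped-in technique chosen because it needs only the standard library; it is not the paired t-test or Wilcoxon test the authors mention, and it is not taken from the paper. The function name and the per-condition scores are hypothetical placeholders, not reported results.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely to
    # carry either sign, so enumerate all 2^n sign assignments.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Placeholder Rank-1 rates for two methods across five illumination levels.
method_a = [0.90, 0.88, 0.85, 0.80, 0.75]
method_b = [0.85, 0.80, 0.78, 0.70, 0.60]
p_value = paired_sign_flip_pvalue(method_a, method_b)
```

With these placeholder values one method wins in every condition, and the test returns 2/32 = 0.0625, the smallest two-sided p-value this test can produce from five pairs. That is a practical point for any revision: a handful of illumination levels cannot by themselves support a claim at the conventional 0.05 level, so the comparison would need more conditions or per-subject pairing.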
Circularity Check
No circularity: empirical framework with independent dataset and modules
The paper constructs a new small-scale EFace dataset under rigid motion and proposes EventFace with LoRA transfer from RGB priors plus MPE and STM modules. All reported results (94.19% Rank-1, 5.35% EER, illumination robustness) are direct empirical measurements on this dataset against baselines. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters or self-citations by construction. The central claims rest on experimental comparison rather than any self-definitional or load-bearing self-referential step.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Pretrained RGB face models contain transferable structural priors for facial identity that can be adapted to event data via LoRA.
- domain assumption: Rigid facial motion produces event patterns that are identity-discriminative when combined with spatial structure.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models... Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery and 8-tick period · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "T=4 consecutive frames... accumulation interval ΔT=50 ms... total temporal receptive field of 200 ms"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
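The temporal settings quoted in the second passage above (T=4 consecutive frames, an accumulation interval ΔT=50 ms, and a 200 ms receptive field) are consistent with each other, since 4 × 50 ms = 200 ms. The sketch below shows one way such frames could be built from a raw event stream. It is an assumption-laden illustration, not the paper's preprocessing: the function name, the `(t_us, x, y, polarity)` event layout, microsecond timestamps, the signed-count representation, and the tiny 4×4 sensor are all hypothetical.

```python
def accumulate_events(events, t_start_us, num_frames=4, interval_us=50_000,
                      height=4, width=4):
    """Bin (t_us, x, y, polarity) events into num_frames signed count frames."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    window_us = num_frames * interval_us  # 4 * 50 ms = 200 ms by default
    for t_us, x, y, polarity in events:
        offset = t_us - t_start_us
        if not 0 <= offset < window_us:
            continue  # event falls outside the temporal receptive field
        index = offset // interval_us
        # Assumed representation: +1 for positive events, -1 for negative ones.
        frames[index][y][x] += 1 if polarity > 0 else -1
    return frames


# Hypothetical events; timestamps are in microseconds.
events = [
    (10_000, 1, 2, 1),
    (49_999, 1, 2, 1),
    (50_000, 0, 0, -1),
    (199_999, 3, 3, 1),
    (200_000, 3, 3, 1),  # just past the 200 ms window, so it is dropped
]
frames = accumulate_events(events, t_start_us=0)
```

Each frame covers a half-open 50 ms interval, so an event at exactly 50 000 µs lands in the second frame and one at 200 000 µs is excluded. Other representations, such as separate polarity channels or time surfaces, would fit the quoted settings equally well.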