pith. machine review for the scientific record.

arxiv: 2604.09127 · v1 · submitted 2026-04-10 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

FaceLiVTv2: An Improved Hybrid Architecture for Efficient Mobile Face Recognition

Authors on Pith: no claims yet.

Pith reviewed 2026-05-10 17:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords lightweight face recognition · hybrid CNN-Transformer · mobile inference · Lite MHLA · RepMix block · global token interaction · edge devices

The pith

FaceLiVTv2 refines a hybrid CNN-Transformer design to deliver better accuracy at lower latency for mobile face recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FaceLiVTv2 as an updated hybrid architecture aimed at running face recognition reliably on phones and edge devices with tight limits on speed, memory, and power. It replaces the earlier multi-layer attention with Lite MHLA, which handles global token interactions using only linear projections per head plus a simple affine rescale, cutting redundancy while keeping head diversity. These changes sit inside a RepMix block that also adds global depthwise convolution for adaptive spatial aggregation. Tests across LFW, CA-LFW, CP-LFW, CFP-FP, AgeDB-30, and IJB benchmarks show the model keeps or raises accuracy while cutting mobile inference time by 22 percent versus the first version and beating several other lightweight nets on speed. Readers would care because the work targets the exact constraints that decide whether real-time face recognition can run on ordinary hardware without cloud help.

Core claim

FaceLiVTv2 achieves improved accuracy-efficiency trade-offs by introducing Lite MHLA, which reduces redundancy in global token interactions through multi-head linear projections and affine rescale transformations, and embedding it within a RepMix block that unifies local-global feature coordination with global depthwise convolution.
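
The RepMix internals are not spelled out above (Figure 4 sketches its structural reparameterization), so the following is a minimal PyTorch sketch of the generic RepVGG-style pattern such a block follows, not the paper's actual module: train with parallel depthwise 3×3, depthwise 1×1, and identity branches, then fold all three into one depthwise 3×3 convolution for inference. The class name and branch set are assumptions, and batch-norm folding is omitted for brevity.

```python
import torch
import torch.nn as nn

class RepDWBlock(nn.Module):
    """Hypothetical RepMix-style depthwise block with a mergeable
    multi-branch training path (3x3 + 1x1 + identity)."""
    def __init__(self, channels: int):
        super().__init__()
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.dw1 = nn.Conv2d(channels, channels, 1, groups=channels)
        self.merged = None  # populated by reparameterize()

    def forward(self, x):
        if self.merged is not None:           # inference: single fused conv
            return self.merged(x)
        return self.dw3(x) + self.dw1(x) + x  # training: multi-branch sum

    @torch.no_grad()
    def reparameterize(self):
        """Fold the 1x1 and identity branches into the 3x3 kernel; valid
        because convolution is linear and all branches are depthwise."""
        c = self.dw3.out_channels
        k = self.dw3.weight.clone()            # (C, 1, 3, 3)
        k[:, :, 1:2, 1:2] += self.dw1.weight   # 1x1 branch -> center tap
        k[:, 0, 1, 1] += 1.0                   # identity branch -> center tap
        self.merged = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.merged.weight.copy_(k)
        self.merged.bias.copy_(self.dw3.bias + self.dw1.bias)
```

After training, calling reparameterize() turns the forward pass into one fused convolution; collapsing branches this way is where this family of designs buys its inference latency back.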

What carries the argument

Lite MHLA, a lightweight global token interaction module that replaces multi-layer attention with multi-head linear token projections and affine rescale transformations while preserving head diversity.
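
The exact Lite MHLA code lives in the paper (Figure 5 reproduces its PyTorch-style listing). As an illustration only, here is a sketch of the two described ingredients, per-head linear token projections and a per-head affine rescale, under the added assumption of a fixed token count (as in MLP-Mixer-style mixing); the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class LiteTokenMixer(nn.Module):
    """Illustrative stand-in for Lite MHLA, not the paper's module:
    each head mixes tokens with a static learned linear map, then
    applies a per-head affine rescale (scale and shift)."""
    def __init__(self, dim: int, num_tokens: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        # One token-mixing matrix per head, initialized to identity: (H, N, N).
        self.mix = nn.Parameter(torch.eye(num_tokens).repeat(heads, 1, 1))
        # Affine rescale per head and channel; broadcasts over batch and tokens.
        self.scale = nn.Parameter(torch.ones(1, heads, 1, dim // heads))
        self.shift = nn.Parameter(torch.zeros(1, heads, 1, dim // heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (B, N, D)
        B, N, D = x.shape
        h = x.view(B, N, self.heads, D // self.heads).transpose(1, 2)
        h = torch.einsum("hmn,bhnd->bhmd", self.mix, h)  # linear token mixing
        h = h * self.scale + self.shift                  # affine rescale
        return self.proj(h.transpose(1, 2).reshape(B, N, D))
```

Unlike attention, the mixing weights here are static, so no query-key products are computed at run time; that is one plausible reading of how linear projections per head can cut cost while separate heads preserve diversity.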

Load-bearing premise

The measured gains in latency and accuracy come primarily from the Lite MHLA and RepMix architectural changes rather than from unreported differences in training data, optimization settings, or hardware tuning.

What would settle it

Re-training FaceLiVTv2 and the compared models from scratch with identical data, optimizer settings, and hardware, then observing no latency reduction or accuracy gain, would show that the architectural claims do not hold.
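
A minimal sketch of the matched-measurement half of that test: every model gets the same input shape, device, thread budget, warmup, and run count, and medians are compared. The shapes and counts below are illustrative defaults, not the paper's protocol.

```python
import time
import torch

@torch.no_grad()
def median_latency_ms(model, input_shape=(1, 3, 112, 112),
                      warmup=20, runs=100):
    """Median single-image CPU latency under fixed conditions."""
    model.eval()
    x = torch.randn(*input_shape)
    for _ in range(warmup):                  # absorb lazy init / cache effects
        model(x)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        model(x)
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]

# torch.set_num_threads(1)  # pin threads so both models see equal budgets
# print(median_latency_ms(model_v2), median_latency_ms(model_v1))
```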

Figures

Figures reproduced from arXiv: 2604.09127 by Chi-Chia Sun, Jun-Wei Hsieh, Mao-Hsiu Hsu, Novendra Setyawan, Wen-Kai Kuo.

Figure 1. Comparison of our proposed FaceLiVTv2 with other …
Figure 2. FaceLiVTv2 architecture with Lite MHLA and structural reparameterization. Stages 1 and 2 use RepMix and the last …
Figure 3. Architecture comparison of (a) Lite MHLA in FaceLiVTv2, (b) MHLA in FaceLiVTv1, (c) FaceLiVTv2 block, and …
Figure 4. Structural reparameterization of the RepMix block.
Figure 5. PyTorch-style code of Lite MHLA.

TABLE 3: Architecture details of FaceLiVTv2 variants (channels for XS / S / M / L).

Stage    Layer         Res.   Kernel  n  XS   S    M    L
Input    Image         112²   –       1  –    –    –    –
Stem     Conv          56²    3×3     1  16   24   28   32
         Conv          28²    3×3     1  32   48   56   64
Stage-1  RepMix Enc.   28²    3×3     3  32   48   56   64
         Downsampling  14²    3×3     1  64   96   112  128
Stage-2  RepMix Enc.   14²    3×3     3  64   96   112  128
         Downsampling  7²     3×3     1  128  192  224  256
Stage-3  Lite…
Figure 6. Model in CoreML runtime breakdown. Operations …
Figure 7. Comparison of our proposed FaceLiVTv2 with other …
Figure 8. Performance evolution for FaceLiVTv2-S across dif…
Figure 9. Receiver operating characteristic of FaceLiVTv2 on …
Figure 10. Latency test of (1) FaceLiVTv2-XS, (2) FaceLiVTv2-…
Figure 11. Layer distribution provided by Xcode. Compute-bound operations (tensor MatMul, convolution, pooling, and activation) are shown in blue and orange tones, while memory-bound operations (elementwise, normalization, copy/reshape) are displayed in lighter colors for contrast. The final figure summarizes the proportion of compute versus memory operations for each model, allowing direct comparison of architectural efficiency across Transformer-based and convolutional designs.

ONNX latency evaluation setup: the inference latency of all FaceLiVTv2 variants was measured using an RTX-5090 GPU, Intel i5…
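
The compute-versus-memory split in Figure 11 is, mechanically, a tally over per-operation timings. A minimal sketch, assuming a pre-extracted list of (kind, milliseconds) pairs and using the category sets from the caption above; the profiling step that would produce such a list is omitted, and the kind labels are hypothetical.

```python
# Bucket profiled op timings into the two groups Figure 11 contrasts
# and report each group's share of total runtime.
COMPUTE_BOUND = {"matmul", "conv", "pool", "activation"}
MEMORY_BOUND = {"elementwise", "norm", "copy", "reshape"}

def op_shares(op_times):
    """op_times: iterable of (kind, milliseconds) pairs."""
    compute = sum(ms for kind, ms in op_times if kind in COMPUTE_BOUND)
    memory = sum(ms for kind, ms in op_times if kind in MEMORY_BOUND)
    total = compute + memory
    return (compute / total, memory / total) if total else (0.0, 0.0)

# Example: op_shares([("conv", 3.2), ("reshape", 0.4), ("norm", 0.6)])
# -> (~0.76 compute share, ~0.24 memory share)
```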
Original abstract

Lightweight face recognition is increasingly important for deployment on edge and mobile devices, where strict constraints on latency, memory, and energy consumption must be met alongside reliable accuracy. Although recent hybrid CNN-Transformer architectures have advanced global context modeling, striking an effective balance between recognition performance and computational efficiency remains an open challenge. In this work, we present FaceLiVTv2, an improved version of our FaceLiVT hybrid architecture designed for efficient global-local feature interaction in mobile face recognition. At its core is Lite MHLA, a lightweight global token interaction module that replaces the original multi-layer attention design with multi-head linear token projections and affine rescale transformations, reducing redundancy while preserving representational diversity across heads. We further integrate Lite MHLA into a unified RepMix block that coordinates local and global feature interactions and adopts global depthwise convolution for adaptive spatial aggregation in the embedding stage. Under our experimental setup, results on LFW, CA-LFW, CP-LFW, CFP-FP, AgeDB-30, and IJB show that FaceLiVTv2 consistently improves the accuracy-efficiency trade-off over existing lightweight methods. Notably, FaceLiVTv2 reduces mobile inference latency by 22% relative to FaceLiVTv1, achieves speedups of up to 30.8% over GhostFaceNets on mobile devices, and delivers 20-41% latency improvements over EdgeFace and KANFace across platforms while maintaining higher recognition accuracy. These results demonstrate that FaceLiVTv2 offers a practical and deployable solution for real-time face recognition. Code is available at https://github.com/novendrastywn/FaceLiVT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces FaceLiVTv2, an improved hybrid CNN-Transformer architecture for efficient mobile face recognition. It replaces the original multi-layer attention with Lite MHLA (multi-head linear token projections plus affine rescale) and integrates it into a RepMix block that adds global depthwise convolution for spatial aggregation. Under the authors' experimental setup, FaceLiVTv2 is reported to improve the accuracy-efficiency trade-off on LFW, CA-LFW, CP-LFW, CFP-FP, AgeDB-30, and IJB benchmarks, with concrete gains of 22% lower mobile latency versus FaceLiVTv1, up to 30.8% speedup versus GhostFaceNets, and 20-41% latency reductions versus EdgeFace and KANFace while preserving higher accuracy. Code is released.

Significance. If the accuracy and latency deltas are shown to arise from the Lite MHLA and RepMix substitutions under matched training and measurement conditions, the work would provide a practical, deployable advance for real-time face recognition on edge devices. The public code release is a positive factor for reproducibility.

major comments (1)
  1. [Abstract] Abstract: the central claim that FaceLiVTv2 improves the accuracy-efficiency trade-off 'under our experimental setup' rests on direct comparisons to FaceLiVTv1, GhostFaceNets, EdgeFace, and KANFace, yet no information is supplied on whether those baselines were retrained with identical data, augmentations, loss, optimizer schedule, input resolution, or hardware. Face-recognition accuracy is known to vary several percent from such factors alone; without this control the attribution of the reported 22%, 30.8%, and 20-41% deltas specifically to Lite MHLA and RepMix cannot be verified and is load-bearing for the paper's conclusion.
minor comments (1)
  1. The abstract refers to 'multi-layer attention design' in the original FaceLiVT without a concise recap of the differences; a short comparison table or paragraph in the introduction would help readers assess the incremental contribution of Lite MHLA.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the concern regarding baseline comparisons point by point below and will revise the manuscript to improve clarity on the experimental controls.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that FaceLiVTv2 improves the accuracy-efficiency trade-off 'under our experimental setup' rests on direct comparisons to FaceLiVTv1, GhostFaceNets, EdgeFace, and KANFace, yet no information is supplied on whether those baselines were retrained with identical data, augmentations, loss, optimizer schedule, input resolution, or hardware. Face-recognition accuracy is known to vary several percent from such factors alone; without this control the attribution of the reported 22%, 30.8%, and 20-41% deltas specifically to Lite MHLA and RepMix cannot be verified and is load-bearing for the paper's conclusion.

    Authors: We agree that the manuscript should explicitly document the training and measurement protocols to allow readers to verify the source of the reported deltas. In the revised manuscript we will add a new subsection (4.1.1) titled 'Baseline Implementation and Training Protocol' that details: (i) which baselines were re-implemented and trained from scratch using the exact same dataset splits, augmentations, loss (ArcFace with identical margin and scale), optimizer (SGD with the same momentum and weight decay), learning-rate schedule, and input resolution (112×112) as FaceLiVTv2; (ii) which baselines used official pre-trained weights (with citations to their original papers) and the hardware/platform used for latency measurement (same mobile device and batch size=1); and (iii) a table summarizing these controls for every compared method. Latency numbers were all obtained on the same device under identical conditions. These additions will make the attribution to Lite MHLA and RepMix verifiable while preserving the original experimental results. revision: yes
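
For reference, the loss the rebuttal holds fixed across baselines is ArcFace's additive angular margin [17]; below is a minimal sketch of such a head. The defaults s = 64 and m = 0.5 are the common values from the ArcFace paper, not settings confirmed for FaceLiVTv2, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """ArcFace-style head: cosine logits with an additive angular
    margin on the target class, scaled before cross-entropy."""
    def __init__(self, dim: int, num_classes: int,
                 s: float = 64.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine similarity between L2-normalized embeddings and weights.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = torch.cos(theta + self.m)        # margin on the true class
        one_hot = F.one_hot(labels, cos.size(1)).to(cos.dtype)
        logits = self.s * (one_hot * target + (1 - one_hot) * cos)
        return F.cross_entropy(logits, labels)
```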

Circularity Check

0 steps flagged

No circularity: architecture design and empirical claims are independent of self-referential inputs

Full rationale

The paper introduces Lite MHLA (multi-head linear projections + affine rescale) and RepMix (with global depthwise conv) as explicit design choices for the FaceLiVTv2 hybrid block, then reports measured accuracy and latency on standard face-recognition benchmarks against external baselines (GhostFaceNets, EdgeFace, KANFace, prior FaceLiVTv1). No equations, fitted parameters, or predictions appear in the provided text; performance deltas are direct experimental outcomes rather than quantities forced by construction from the same inputs. Self-reference to FaceLiVTv1 is versioning only and not invoked as a uniqueness theorem or load-bearing premise. The derivation chain consists of engineering substitutions validated externally, satisfying the self-contained benchmark criterion for score 0.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

Based on abstract only; the central claims rest on standard deep learning training assumptions and the effectiveness of newly introduced architectural components demonstrated through empirical comparison.

free parameters (1)
  • model hyperparameters and training settings
    Typical in neural network papers; specific values not provided in abstract but required for replication.
axioms (1)
  • domain assumption: standard assumptions of i.i.d. training data and convergence of gradient-based optimization
    Implicit in all empirical deep learning work; invoked when reporting benchmark results.
invented entities (2)
  • Lite MHLA · no independent evidence
    purpose: lightweight global token interaction module replacing multi-layer attention
    New module introduced to reduce redundancy while preserving head diversity.
  • RepMix block · no independent evidence
    purpose: unified block coordinating local and global feature interactions
    New architectural component integrating Lite MHLA with local features.

pith-pipeline@v0.9.0 · 5619 in / 1342 out tokens · 40685 ms · 2026-05-10T17:57:01.131116+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    A comprehensive review of face recognition techniques, trends, and challenges,

    H. L. Gururaj, B. C. Soundarya, S. Priya, J. Shreyas, and F. Flammini, “A comprehensive review of face recognition techniques, trends, and challenges,” IEEE Access, vol. 12, pp. 107903–107926, 2024

  2. [2]

    Detect faces efficiently: A survey and evaluations,

    Y. Feng, S. Yu, H. Peng, Y.-R. Li, and J. Zhang, “Detect faces efficiently: A survey and evaluations,” IEEE Trans. Biom. Behav. Identity Sci., vol. 4, no. 1, pp. 1–18, 2022

  3. [3]

    Edgeface: Efficient face recognition model for edge devices,

    A. George, C. Ecabert, H. O. Shahreza, K. Kotwal, and S. Marcel, “Edgeface: Efficient face recognition model for edge devices,” IEEE Trans. Biom. Behav. Identity Sci., 2024

  4. [4]

    Swiftfaceformer: An efficient and lightweight hybrid architecture for accurate face recognition applications,

    L. S. Luevano, Y. Martínez-Díaz, H. Méndez-Vázquez, M. González-Mendoza, and D. Frey, “Swiftfaceformer: An efficient and lightweight hybrid architecture for accurate face recognition applications,” in Int. Conf. Pattern Recognit. (ICPR), pp. 244–258, Springer, 2024

  5. [5]

    Kanface: A novel approach to face recognition using kolmogorov-arnold networks,

    H. Pham, B. Dao, P. N. Tran, T. Nguyen, and C. Do, “Kanface: A novel approach to face recognition using kolmogorov-arnold networks,” in Int. Symp. Neural Netw. (ISNN), pp. 149–160, Springer, 2025

  6. [6]

    Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices,

    S. Chen, Y. Liu, X. Gao, and Z. Han, “Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices,” in Chin. Conf. Biometric Recognit. (CCBR), pp. 428–438, Springer, 2018

  7. [7]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017

  8. [8]

    Ghostfacenets: Lightweight face recognition model from cheap operations,

    M. Alansari, O. A. Hay, S. Javed, A. Shoufan, Y. Zweiri, and N. Werghi, “Ghostfacenets: Lightweight face recognition model from cheap operations,” IEEE Access, vol. 11, pp. 35429–35446, 2023

  9. [9]

    Ghostnet: More features from cheap operations,

    K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “Ghostnet: More features from cheap operations,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1580–1589, 2020

  10. [10]

    Transface: Calibrating transformer training for face recognition from a data-centric perspective,

    J. Dan, Y. Liu, H. Xie, J. Deng, H. Xie, X. Xie, and B. Sun, “Transface: Calibrating transformer training for face recognition from a data-centric perspective,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pp. 20642–20653, 2023

  11. [11]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” Int. Conf. Learn. Represent. (ICLR), 2021

  12. [12]

    Edgenext: efficiently amalgamated cnn-transformer architecture for mobile vision applications,

    M. Maaz, A. Shaker, H. Cholakkal, S. Khan, S. W. Zamir, R. M. Anwer, and F. Shahbaz Khan, “Edgenext: efficiently amalgamated cnn-transformer architecture for mobile vision applications,” in Eur. Conf. Comput. Vis. (ECCV), pp. 3–20, Springer, 2022

  13. [13]

    Kan: Kolmogorov-arnold networks,

    Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, “Kan: Kolmogorov-arnold networks,” Int. Conf. Learn. Represent. (ICLR), vol. 1, no. 2, p. 3, 2025

  14. [14]

    Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications,

    A. Shaker, M. Maaz, H. Rasheed, S. Khan, M.-H. Yang, and F. S. Khan, “Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pp. 17425–17436, 2023

  15. [15]

    Facelivt: Face recognition using linear vision transformer with structural reparameterization for mobile device,

    N. Setyawan, C.-C. Sun, M.-H. Hsu, W.-K. Kuo, and J.-W. Hsieh, “Facelivt: Face recognition using linear vision transformer with structural reparameterization for mobile device,” in Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 1720–1725, 2025

  16. [16]

    Facelivt: Energy efficient face recognition with linear vision transformer for limited resource device,

    N. Setyawan, J.-X. Gu, C.-C. Sun, M.-H. Hsu, W.-K. Kuo, C.-A. Shen, and J.-W. Hsieh, “Facelivt: Energy efficient face recognition with linear vision transformer for limited resource device,” in Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS), pp. 1–5, IEEE, 2025

  17. [17]

    Arcface: Additive angular margin loss for deep face recognition,

    J. Deng, J. Guo, N. Xue, and S. Zafeiriou, “Arcface: Additive angular margin loss for deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 4690–4699, 2019

  18. [18]

    Texture-guided transfer learning for low-quality face recognition,

    M. Zhang, R. Liu, D. Deguchi, and H. Murase, “Texture-guided transfer learning for low-quality face recognition,” IEEE Trans. Image Process., vol. 33, pp. 95–107, 2023

  19. [19]

    Benchmarking lightweight face architectures on specific face recognition scenarios,

    Y. Martínez-Díaz et al., “Benchmarking lightweight face architectures on specific face recognition scenarios,” Artif. Intell. Rev., vol. 54, no. 8, pp. 6201–6244, 2021

  20. [20]

    Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition,

    M. Yan, M. Zhao, Z. Xu, Q. Zhang, G. Wang, and Z. Su, “Vargfacenet: An efficient variable group convolutional neural network for lightweight face recognition,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), pp. 0–0, 2019

  21. [21]

    Mixfacenets: Extremely efficient face recognition networks,

    F. Boutros, N. Damer, M. Fang, F. Kirchbuchner, and A. Kuijper, “Mixfacenets: Extremely efficient face recognition networks,” in IEEE Int. Joint Conf. Biometrics (IJCB), pp. 1–8, IEEE, 2021

  22. [22]

    Face transformer for recognition,

    Y. Zhong and W. Deng, “Face transformer for recognition,” arXiv preprint arXiv:2103.14803, 2021

  23. [23]

    Cosface: Large margin cosine loss for deep face recognition,

    H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu, “Cosface: Large margin cosine loss for deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5265–5274, 2018

  24. [24]

    Magface: A universal representation for face recognition and quality assessment,

    Q. Meng, S. Zhao, Z. Huang, and F. Zhou, “Magface: A universal representation for face recognition and quality assessment,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 14225–14234, 2021

  25. [25]

    Elasticface: Elastic margin loss for deep face recognition,

    F. Boutros, N. Damer, F. Kirchbuchner, and A. Kuijper, “Elasticface: Elastic margin loss for deep face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1578–1587, 2022

  26. [26]

    Adaface: Quality adaptive margin for face recognition,

    M. Kim, A. K. Jain, and X. Liu, “Adaface: Quality adaptive margin for face recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 18750–18759, 2022

  27. [27]

    Topofr: A closer look at topology alignment on face recognition,

    J. Dan, Y. Liu, J. Deng, H. Xie, S. Li, B. Sun, and S. Luo, “Topofr: A closer look at topology alignment on face recognition,” Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 37, pp. 37213–37240, 2024

  28. [28]

    Arface: Attention-aware and regularization for face recognition with reinforcement learning,

    L. Zhang, L. Sun, L. Yu, X. Dong, J. Chen, W. Cai, C. Wang, and X. Ning, “Arface: Attention-aware and regularization for face recognition with reinforcement learning,” IEEE Trans. Biom. Behav. Identity Sci., vol. 4, no. 1, pp. 30–42, 2022

  29. [29]

    Catface: Cross-attribute-guided transformer with self-attention distillation for low-quality face recognition,

    N. Alipour Talemi, H. Kashiani, and N. M. Nasrabadi, “Catface: Cross-attribute-guided transformer with self-attention distillation for low-quality face recognition,” IEEE Trans. Biom. Behav. Identity Sci., vol. 6, no. 1, pp. 132–146, 2024

  30. [30]

    Resolution invariant face recognition using a distillation approach,

    S. S. Khalid, M. Awais, Z.-H. Feng, C.-H. Chan, A. Farooq, A. Akbari, and J. Kittler, “Resolution invariant face recognition using a distillation approach,” IEEE Trans. Biom. Behav. Identity Sci., vol. 2, no. 4, pp. 410–420, 2020

  31. [31]

    Labeled faces in the wild: A database for studying face recognition in unconstrained environments,

    G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” in Workshop Faces Real-Life Images: Detection, Alignment, Recognit., 2008

  32. [32]

    Agedb: the first manually collected, in-the-wild age database,

    S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou, “Agedb: the first manually collected, in-the-wild age database,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), pp. 51–59, 2017

  33. [33]

    Shufflefacenet: A lightweight face architecture for efficient and highly-accurate face recognition,

    Y. Martínez-Díaz, L. S. Luevano, H. Méndez-Vázquez, M. Nicolás-Díaz, L. Chang, and M. González-Mendoza, “Shufflefacenet: A lightweight face architecture for efficient and highly-accurate face recognition,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), pp. 0–0, 2019

  34. [34]

    Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,

    J. Dan, Y. Liu, B. Sun, J. Deng, and S. Luo, “Transface++: Rethinking the face recognition paradigm with a focus on accuracy, efficiency, and security,” IEEE Trans. Pattern Anal. Mach. Intell., 2025

  35. [35]

    Mobilefaceformer: a lightweight face recognition model against face variations,

    J. Li, L. Zhou, and J. Chen, “Mobilefaceformer: a lightweight face recognition model against face variations,” Multimed. Tools Appl., vol. 83, no. 5, pp. 12669–12685, 2024

  36. [36]

    Efficientvit: Memory efficient vision transformer with cascaded group attention,

    X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, and Y. Yuan, “Efficientvit: Memory efficient vision transformer with cascaded group attention,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 14420–14430, 2023

  37. [37]

    Edgevits: Competing light-weight cnns on mobile devices with vision transformers,

    J. Pan, A. Bulat, F. Tan, X. Zhu, L. Dudziak, H. Li, G. Tzimiropoulos, and B. Martinez, “Edgevits: Competing light-weight cnns on mobile devices with vision transformers,” in Eur. Conf. Comput. Vis. (ECCV), pp. 294–311, Springer, 2022

  38. [38]

    Resmlp: Feedforward networks for image classification with data-efficient training,

    H. Touvron, P. Bojanowski, M. Caron, M. Cord, A. El-Nouby, E. Grave, G. Izacard, A. Joulin, G. Synnaeve, J. Verbeek, et al., “Resmlp: Feedforward networks for image classification with data-efficient training,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 5314–5321, 2022

  39. [39]

    Pocketnet: Extreme lightweight face recognition network using neural architecture search and multistep knowledge distillation,

    F. Boutros, P. Siebke, M. Klemt, N. Damer, F. Kirchbuchner, and A. Kuijper, “Pocketnet: Extreme lightweight face recognition network using neural architecture search and multistep knowledge distillation,” IEEE Access, vol. 10, pp. 46823–46833, 2022

  40. [40]

    Partial fc: Training 10 million identities on a single machine,

    X. An et al., “Partial fc: Training 10 million identities on a single machine,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), pp. 1445–1449, 2021

  41. [41]

    Low-resolution face recognition,

    Z. Cheng, X. Zhu, and S. Gong, “Low-resolution face recognition,” in Asian Conf. Comput. Vis. (ACCV), pp. 605–621, Springer, 2018

  42. [42]

    Cross-Age LFW: A Database for Studying Cross-Age Face Recognition in Unconstrained Environments

    T. Zheng, W. Deng, and J. Hu, “Cross-age lfw: A database for studying cross-age face recognition in unconstrained environments,” arXiv preprint arXiv:1708.08197, 2017

  43. [43]

    Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments,

    T. Zheng and W. Deng, “Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments,” Beijing Univ. Posts Telecommun., Tech. Rep., vol. 5, no. 7, p. 5, 2018

  44. [44]

    Frontal to profile face verification in the wild,

    S. Sengupta, J.-C. Chen, C. Castillo, V. M. Patel, R. Chellappa, and D. W. Jacobs, “Frontal to profile face verification in the wild,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), pp. 1–9, IEEE, 2016

  45. [45]

    Iarpa janus benchmark-b face dataset,

    C. Whitelam et al., “Iarpa janus benchmark-b face dataset,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), pp. 90–98, 2017

  46. [46]

    Iarpa janus benchmark-c: Face dataset and protocol,

    B. Maze, J. Adams, J. A. Duncan, N. Kalka, T. Miller, C. Otto, A. K. Jain, W. T. Niggel, J. Anderson, J. Cheney, et al., “Iarpa janus benchmark-c: Face dataset and protocol,” in Int. Conf. Biometrics (ICB), pp. 158–165, IEEE, 2018

  47. [47]

    Transport-based single frame super resolution of very low resolution face images,

    S. Kolouri and G. K. Rohde, “Transport-based single frame super resolution of very low resolution face images,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 4876–4884, 2015


  49. [49]

    Training setup (extracted passage)

    Trained on Glint360K [40]. The AdamW optimizer and a polynomial-decay learning-rate schedule are utilized. The starting learning rate is 6 × 10⁻³ with a minimum of 1 × 10⁻⁵. The batch size is 342 on each of 3× NVIDIA RTX-A6000 GPUs, and the weight decay is 1 × 10⁻⁴. The model is trained for 50 epochs at a pre-processing resolution of 112 × 112. TABLE 14: FaceLi…

  50. [50]

    Accuracy gap (extracted passage)

    FaceLiVTv2-L demonstrates the most compelling accuracy-efficiency trade-off among the evaluated lightweight models. Against ResNet200-TopoFR, FaceLiVTv2-L narrows the mean accuracy gap to only 0.69% while requiring 13.9× fewer parameters and 76.1× fewer FLOPs. Remarkably, FaceLiVTv2-L achieves 96.59% …