Flow Augmentation and Knowledge Distillation for Lightweight Face Presentation Attack Detection
Pith reviewed 2026-05-14 19:50 UTC · model grok-4.3
The pith
A lightweight RGB-only student learns motion cues through distillation from a flow-augmented teacher and reaches detection accuracy comparable to the teacher's without computing optical flow at inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A dual-branch teacher that fuses RGB frames with colorwheel-encoded optical flow produces motion-aware representations that can be transferred through logit distillation to an RGB-only student; the student thereby learns to detect presentation attacks by implicitly modeling motion and temporal consistency without any flow computation or extra blocks at inference time.
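To make "colorwheel-encoded optical flow" concrete, the sketch below maps each flow vector to a color, with hue encoding direction and saturation encoding speed. It is only an illustration under stated assumptions: the paper's exact colorwheel is not specified on this page, so a plain HSV mapping stands in for it, and the names `flow_to_rgb`, `encode_flow_field`, and `max_magnitude` are hypothetical.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector (u, v) to an 8-bit RGB triple.

    Assumption: simple HSV encoding (hue = direction, saturation = speed,
    value fixed at 1), used here as a stand-in for the paper's colorwheel.
    """
    magnitude = math.hypot(u, v)
    angle = math.atan2(v, u)                       # direction in [-pi, pi]
    hue = ((angle + math.pi) / (2 * math.pi)) % 1.0
    if max_magnitude > 0:
        saturation = min(magnitude / max_magnitude, 1.0)
    else:
        saturation = 0.0                           # no motion anywhere
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def encode_flow_field(flow, max_magnitude=None):
    """Encode a 2D grid of (u, v) vectors as a grid of RGB pixels."""
    if max_magnitude is None:
        # Normalize by the largest displacement in this field.
        max_magnitude = max(math.hypot(u, v) for row in flow for (u, v) in row)
    return [[flow_to_rgb(u, v, max_magnitude) for (u, v) in row] for row in flow]
```

Under this encoding a static pixel becomes white, so the flow image fed to the teacher's motion branch is an ordinary three-channel image and could, in principle, be processed by the same kind of backbone as the RGB branch.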
What carries the argument
Logit distillation that transfers motion-aware knowledge from the flow-augmented dual-branch teacher to the lightweight RGB-only student.
If this is right
- The student reaches 0.0 percent HTER on both Replay-Attack and Replay-Mobile datasets.
- It records 0.94 percent HTER on ROSE-Youtu and 0.42 percent ACER on OULU-NPU.
- Parameter and FLOP counts drop sharply compared with the teacher while inference speed reaches 52 FPS on an NVIDIA Jetson Orin Nano.
- The same RGB-only architecture handles 2D print, replay, 3D mask, makeup, and occlusion attacks under varied capture conditions.
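For readers unfamiliar with the two metrics quoted above, the following sketch shows how they are commonly computed from binary decisions. It assumes the usual conventions: HTER averages the false acceptance and false rejection rates, and ACER averages BPCER with the worst-case APCER over attack types. The paper's exact evaluation protocol and thresholds are not given on this page, and the function names are illustrative.

```python
def hter(labels, predictions):
    """Half Total Error Rate: mean of FAR and FRR.

    labels and predictions use 1 for bona fide and 0 for attack.
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    bona_preds = [p for y, p in zip(labels, predictions) if y == 1]
    far = sum(p == 1 for p in attack_preds) / len(attack_preds)  # attacks accepted
    frr = sum(p == 0 for p in bona_preds) / len(bona_preds)      # bona fide rejected
    return (far + frr) / 2


def acer(labels, predictions, attack_types):
    """Average Classification Error Rate: mean of worst-case APCER and BPCER.

    attack_types[i] names the attack type of sample i (ignored for bona fide).
    """
    bona_preds = [p for y, p in zip(labels, predictions) if y == 1]
    bpcer = sum(p == 0 for p in bona_preds) / len(bona_preds)
    per_type = {}
    for y, p, t in zip(labels, predictions, attack_types):
        if y == 0:
            per_type.setdefault(t, []).append(p)
    # APCER is computed per attack type; the worst type is reported.
    apcer = max(sum(p == 1 for p in preds) / len(preds) for preds in per_type.values())
    return (apcer + bpcer) / 2
```

Read this way, a reported 0.0 percent HTER means that no attack was accepted and no bona fide sample was rejected at the chosen operating threshold on that test set.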
Where Pith is reading between the lines
- The same training-time flow augmentation plus distillation pattern could be tested on other video tasks where motion matters but real-time RGB inference is required.
- Extending the student to accept occasional depth or infrared frames at training time might further improve robustness without changing the inference footprint.
- If the distillation generalizes, similar teacher-student pairs could reduce the cost of temporal modeling in surveillance or medical video analysis.
Load-bearing premise
That the teacher's logit outputs alone carry enough motion-discriminative signal for the student to replicate accurate behavior from RGB inputs only.
What would settle it
A large rise in the student's HTER relative to the teacher, on a replay-attack dataset that emphasizes subtle motion differences, would show that distillation failed to transfer the necessary motion cues.
Original abstract
Face presentation attack detection (FacePAD) remains challenging under diverse spoofing representations, including 2D print and replay, 3D mask-based spoofing, makeup-induced appearance manipulation, and physical occlusions, as well as under varying capture conditions. Motion cues are highly discriminative for FacePAD but typically require explicit optical flow estimation, which introduces substantial computational overhead and limits real-time deployment. In this work, we leverage optical flow to enhance motion representation during training while eliminating the need for flow computation at inference. We propose a dual-branch teacher model that fuses appearance cues from RGB frames with motion cues derived from colorwheel-encoded optical flow, enabling effective modeling of micro-motions and temporal consistency. To enable efficient deployment, we introduce a knowledge distillation framework that transfers motion-aware knowledge from the flow-augmented teacher to a lightweight RGB-only student via logit distillation. As a result, the student implicitly learns motion-sensitive representations without requiring explicit flow estimation or additional feature extraction blocks at inference. Extensive experiments demonstrate strong performance across multiple benchmarks, achieving 0.0% HTER on Replay-Attack and Replay-Mobile, 0.94% HTER on ROSE-Youtu, 5.65% HTER on SiW-Mv2, and 0.42% ACER on OULU-NPU. The distilled student achieves performance comparable to or better than the teacher while significantly reducing parameters and FLOPs, achieving 52 FPS on an NVIDIA Jetson Orin Nano, indicating its suitability for real-time and resource-constrained FacePAD deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a dual-branch teacher model that fuses RGB appearance features with motion cues from colorwheel-encoded optical flow to improve FacePAD. It then applies logit distillation to transfer this knowledge to a lightweight RGB-only student, allowing the student to implicitly capture motion-sensitive representations without explicit flow computation or extra blocks at inference. The student reportedly achieves 0.0% HTER on Replay-Attack and Replay-Mobile, 0.94% HTER on ROSE-Youtu, 5.65% HTER on SiW-Mv2, and 0.42% ACER on OULU-NPU, while running at 52 FPS on an NVIDIA Jetson Orin Nano with reduced parameters and FLOPs compared to the teacher.
Significance. If the logit distillation successfully equips the RGB student with motion-discriminative capability, the approach would provide a practical route to real-time, resource-efficient FacePAD on edge devices by eliminating flow overhead at inference while matching or exceeding teacher performance. The reported benchmark numbers indicate potential utility for deployment under diverse spoofing conditions, though the significance depends on confirming that the gains stem from the proposed flow-augmented transfer rather than the student architecture or data alone.
major comments (2)
- [Experiments] Experiments section: the manuscript does not include an ablation comparing the distilled RGB-only student against an identical student architecture trained with standard supervision on RGB frames alone (no teacher logits). This comparison is load-bearing for the central claim, as the 0.0% HTER on Replay-Attack/Replay-Mobile could otherwise arise from the student's own temporal modeling of RGB sequences or dataset biases rather than successful transfer of motion cues from the flow-augmented teacher.
- [Method] Method and Experiments sections: training protocols, data splits, hyperparameter settings, and exact distillation loss formulation are not detailed sufficiently to allow reproduction or to verify that the student truly acquires implicit motion sensitivity beyond what a plain RGB baseline would achieve.
minor comments (2)
- [Abstract] Abstract: the abstract omits any mention of training protocols, data splits, or ablation studies, which would better contextualize the strong reported HTER values.
- [Figures] Figures: ensure the dual-branch teacher architecture and the distillation pipeline diagrams include explicit labels for all input branches, fusion points, and loss terms to improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight important aspects for strengthening the central claims of our work on flow-augmented distillation for lightweight FacePAD. We address each major comment below and will revise the manuscript accordingly to improve clarity and reproducibility.
- Referee: [Experiments] Experiments section: the manuscript does not include an ablation comparing the distilled RGB-only student against an identical student architecture trained with standard supervision on RGB frames alone (no teacher logits). This comparison is load-bearing for the central claim, as the 0.0% HTER on Replay-Attack/Replay-Mobile could otherwise arise from the student's own temporal modeling of RGB sequences or dataset biases rather than successful transfer of motion cues from the flow-augmented teacher.
  Authors: We agree that this ablation is essential to isolate the contribution of the logit distillation. In the revised manuscript, we will add a direct comparison of the identical RGB-only student architecture trained under standard supervision (cross-entropy loss on RGB frames) versus the distilled version using teacher logits. The updated Experiments section will report the resulting HTER/ACER metrics on the same benchmarks, demonstrating that distillation yields measurable gains attributable to implicit motion sensitivity transferred from the flow-augmented teacher.
  Revision: yes
- Referee: [Method] Method and Experiments sections: training protocols, data splits, hyperparameter settings, and exact distillation loss formulation are not detailed sufficiently to allow reproduction or to verify that the student truly acquires implicit motion sensitivity beyond what a plain RGB baseline would achieve.
  Authors: We acknowledge the need for greater detail to support reproducibility. The revised manuscript will expand both the Method and Experiments sections to include: (i) the precise distillation loss formulation (temperature-scaled KL divergence between teacher and student logits, combined with any auxiliary terms), (ii) full training protocols including optimizer, learning rate schedule, batch size, and number of epochs, (iii) exact data splits and preprocessing for each benchmark dataset, and (iv) all hyperparameter values. These additions will enable verification that performance improvements stem from the proposed knowledge transfer rather than baseline RGB training alone.
  Revision: yes
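Since the rebuttal names a temperature-scaled KL divergence between teacher and student logits, a minimal sketch of that kind of loss may help fix ideas. It follows the widely used Hinton-style formulation and is not the authors' confirmed loss: the weighting `alpha`, the default `temperature`, and the cross-entropy auxiliary term are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy.

    alpha and temperature are hypothetical hyperparameters, not values
    reported by the paper.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-target gradient scale comparable across temperatures.
    kd = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd + (1 - alpha) * ce
```

In the ablation the referee requests, the plain RGB baseline would correspond to setting `alpha=0`, which leaves only the cross-entropy term and removes any influence of the teacher.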
Circularity Check
No significant circularity; purely empirical claims on external benchmarks
The paper contains no equations, derivations, or parameter-fitting steps that could reduce to inputs by construction. All performance numbers (HTER, ACER, FPS) are reported as direct experimental outcomes on public datasets (Replay-Attack, Replay-Mobile, ROSE-Youtu, SiW-Mv2, OULU-NPU). The knowledge-distillation procedure is described at the level of training protocol rather than any closed-form identity or self-referential prediction. No self-citation is invoked to justify uniqueness or to substitute for an independent result. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "dual-branch teacher model that fuses appearance cues from RGB frames with motion cues derived from colorwheel-encoded optical flow... logit-based knowledge distillation"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "achieving 52 FPS on an NVIDIA Jetson Orin Nano"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.