Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

Baojin Huang; Chao Liang; Dengpan Ye; Jikang Cheng; Yuhong Yang; Zhanhe Lei; Zhen Han; Zhongyuan Wang

arxiv: 2603.24139 · v2 · pith:ERMQVHD3new · submitted 2026-03-25 · 💻 cs.CV · cs.LG

Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

Zhanhe Lei , Zhongyuan Wang , Jikang Cheng , Baojin Huang , Yuhong Yang , Zhen Han , Chao Liang , Dengpan Ye This is my paper

Pith reviewed 2026-05-21 09:46 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords deepfake detectionreinforcement learningcurriculum learningtutor-student frameworkgeneralizationadaptive weightingPPO agent

0 comments

The pith

A reinforcement learning tutor dynamically weights training samples to improve deepfake detector generalization to unseen manipulation techniques.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard supervised training for deepfake detectors assigns equal importance to every sample, which can leave the model weak against new forgery methods. This paper models the training process itself as a Markov Decision Process in which a Tutor agent learns a policy for re-weighting each sample's contribution to the loss. The Tutor, built as a PPO agent, sees not only the current image but also the student's recent history such as exponential moving average loss and how often the sample has been forgotten. It receives reward only when the student's prediction on that sample flips from wrong to right, encouraging the Tutor to surface hard-yet-learnable examples at the right moment. The resulting adaptive curriculum produces a detector whose performance on manipulation techniques absent from training exceeds that of models trained with uniform sample weighting.

Core claim

The central claim is that a PPO-based Tutor observing a state that combines visual features with historical learning signals (EMA loss and forgetting counts) and assigning continuous loss weights between 0 and 1, when rewarded strictly for immediate incorrect-to-correct transitions in the Student, learns a curriculum policy that yields measurably higher generalization of the deepfake detector on manipulation techniques never encountered during training.

What carries the argument

The Tutor agent, implemented as a Proximal Policy Optimization (PPO) policy that maps each training sample's state (visual features plus EMA loss and forgetting counts) to a continuous loss weight in [0,1] and is rewarded only for immediate Student performance gains.

If this is right

The Student detector exhibits higher accuracy on manipulation techniques absent from the training distribution.
Training focuses computational effort on hard-but-learnable samples instead of treating every example equally.
The Tutor learns to de-emphasize samples that produce no immediate performance change.
The overall process yields more generalizable features without requiring additional data or model capacity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tutor-student loop could be applied to other detection or classification tasks where sample difficulty varies across domains.
Over longer training runs the method might reduce the total number of epochs needed to reach a target robustness level.
Combining the dynamic weighting with existing data-augmentation pipelines could produce further gains on cross-dataset benchmarks.

Load-bearing premise

Rewarding the tutor solely for immediate incorrect-to-correct transitions on individual samples produces a stable curriculum policy rather than short-term overfitting or unstable reinforcement learning dynamics.

What would settle it

Train two identical deepfake detectors on the same data, one with the proposed Tutor weighting and one with uniform loss weights, then evaluate both on a test set containing only manipulation techniques completely absent from training; if the Tutor-trained detector shows no accuracy or AUC improvement, the central claim is falsified.

Figures

Figures reproduced from arXiv: 2603.24139 by Baojin Huang, Chao Liang, Dengpan Ye, Jikang Cheng, Yuhong Yang, Zhanhe Lei, Zhen Han, Zhongyuan Wang.

**Figure 2.** Figure 2: A simplified overview of our proposed Tutor-Student Reinforcement Learning (TSRL) framework. The Tutor (RL Agent) learns a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: The Tutor-Student Reinforcement Learning (TSRL) Framework [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Visual comparison of the average AUC (on DF40) for the [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: UMAP visualization of feature spaces. We present two comparative visualizations. (a) Fake vs Real: Visualization by class (Green: Real, Red: Fake). The Baseline model (left) exhibits a single manifold with heavy class overlap, indicating a confused feature space. In contrast, our TSRL framework (right) learns a perfectly disentangled representation, cleanly separating all Real samples (green arc) from all … view at source ↗

read the original abstract

Standard supervised training for deepfake detection treats all samples with uniform importance, which can be suboptimal for learning robust and generalizable features. In this work, we propose a novel Tutor-Student Reinforcement Learning (TSRL) framework to dynamically optimize the training curriculum. Our method models the training process as a Markov Decision Process where a ``Tutor'' agent learns to guide a ``Student'' (the deepfake detector). The Tutor, implemented as a Proximal Policy Optimization (PPO) agent, observes a rich state representation for each training sample, encapsulating not only its visual features but also its historical learning dynamics, such as EMA loss and forgetting counts. Based on this state, the Tutor takes an action by assigning a continuous weight (0-1) to the sample's loss, thereby dynamically re-weighting the training batch. The Tutor is rewarded based on the Student's immediate performance change, specifically rewarding transitions from incorrect to correct predictions. This strategy encourages the Tutor to learn a curriculum that prioritizes high-value samples, such as hard-but-learnable examples, leading to a more efficient and effective training process. We demonstrate that this adaptive curriculum improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods. Code is available at https://github.com/wannac1/TSRL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TSRL adds a PPO tutor that reweights deepfake samples using visual features plus EMA loss and forgetting counts, with reward only on immediate wrong-to-right flips, but the short-horizon signal leaves the generalization claim under-supported.

read the letter

The core of this paper is a tutor-student loop where a PPO agent picks continuous weights for each training sample. The state combines the image features with two history signals: an exponential moving average of the loss and a count of how often the sample was forgotten. The reward is strictly whether the student model flips from incorrect to correct on that sample after the weighted step. That combination of state and reward is not a routine extension of prior curriculum or RL work in detection tasks, and the released code makes the setup reproducible on its face.

Referee Report

2 major / 1 minor

Summary. The paper proposes a Tutor-Student Reinforcement Learning (TSRL) framework for deepfake detection training. A PPO-based Tutor agent observes per-sample states consisting of visual features, EMA loss, and forgetting counts, then assigns continuous weights (0-1) to re-weight the student's loss. The tutor receives reward only for immediate student prediction flips from incorrect to correct. The central claim is that this produces an adaptive curriculum yielding better generalization to unseen manipulation techniques than standard uniform supervised training.

Significance. If the empirical claims hold, the work would contribute a concrete RL-driven curriculum mechanism that incorporates learning-history features into sample weighting for deepfake detectors. The open code link is a positive factor for reproducibility. However, the complete absence of any reported results, baselines, datasets, or ablations makes it impossible to gauge actual significance or whether the approach outperforms existing curriculum or hard-example mining methods.

major comments (2)

[Abstract] Abstract: the central claim that the adaptive curriculum 'improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods' is asserted with no quantitative results, baselines, dataset details, or ablation studies supplied, leaving the primary contribution unevaluated.
[Method (Tutor reward and state)] Reward definition (as described in the abstract and method outline): the tutor reward is defined exclusively on immediate incorrect-to-correct prediction transitions after a single weighted update. This short-horizon signal, paired with a state vector that contains no manipulation-type or cross-domain statistics, creates a risk that the PPO policy will overfit to training-distribution boundary samples rather than learning a curriculum that builds invariance to unseen techniques; no ablation replacing the immediate reward with a delayed or validation-based signal is described.

minor comments (1)

[Abstract] Abstract: the description of the state representation and action space is clear, but a short statement of the evaluation protocol (e.g., which unseen manipulation families are held out) would help readers assess the generalization claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the TSRL framework. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the adaptive curriculum 'improves the Student's generalization capabilities against unseen manipulation techniques compared to traditional training methods' is asserted with no quantitative results, baselines, dataset details, or ablation studies supplied, leaving the primary contribution unevaluated.

Authors: The referee correctly notes that the submitted manuscript does not contain quantitative results, baselines, dataset details, or ablations to support the generalization claim. The abstract phrasing reflects preliminary internal experiments that were not reported in this version. In the revised manuscript we will add a complete Experiments section reporting results on standard deepfake benchmarks (e.g., FaceForensics++ and cross-manipulation splits), comparisons against uniform training and existing curriculum/hard-example methods, and ablations on state components and reward design. We will also revise the abstract to accurately describe the evaluated contributions rather than assert unevaluated claims. revision: yes
Referee: [Method (Tutor reward and state)] Reward definition (as described in the abstract and method outline): the tutor reward is defined exclusively on immediate incorrect-to-correct prediction transitions after a single weighted update. This short-horizon signal, paired with a state vector that contains no manipulation-type or cross-domain statistics, creates a risk that the PPO policy will overfit to training-distribution boundary samples rather than learning a curriculum that builds invariance to unseen techniques; no ablation replacing the immediate reward with a delayed or validation-based signal is described.

Authors: The immediate reward was selected to supply a dense, per-update signal that lets the PPO tutor rapidly adjust sample weights based on observable student progress. The state already incorporates historical dynamics through EMA loss and forgetting counts in addition to visual features. We acknowledge that the absence of explicit manipulation-type or cross-domain statistics in the state, together with the short reward horizon, could encourage overfitting to training-distribution patterns rather than learning invariance to unseen manipulations. We will add an ablation that replaces the immediate reward with a delayed signal based on validation-set accuracy and report the resulting generalization performance on held-out manipulation techniques. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical curriculum gains rest on external validation, not definitional reduction.

full rationale

The TSRL setup defines the tutor reward explicitly from the student's immediate prediction flip (incorrect-to-correct) after a weighted update, with state features (visual + EMA loss + forgetting counts) independent of the tutor's policy parameters. The central claim—that this produces better generalization on unseen manipulations—is presented as an experimental outcome rather than a mathematical identity or fitted-input prediction. No equations reduce the reported performance lift to the reward definition by construction, and no self-citation chain is invoked to justify uniqueness or the ansatz. The derivation therefore remains self-contained against the training distribution and held-out test results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that the training process forms a Markov Decision Process whose state can be adequately captured by visual features plus EMA loss and forgetting counts; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption The training process can be modeled as a Markov Decision Process where the tutor observes a state that includes historical learning dynamics.
Stated in the description of the tutor agent's observation and action space.

pith-pipeline@v0.9.0 · 5782 in / 1077 out tokens · 49671 ms · 2026-05-21T09:46:43.872812+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1]

Mesonet: a compact facial video forgery detection network

Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. Mesonet: a compact facial video forgery detection network. In2018 IEEE international workshop on informa- tion forensics and security (WIFS), pages 1–7. IEEE, 2018. 2

work page 2018
[2]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. InProceedings of the 26th annual international conference on machine learning, pages 41–48, 2009. 2, 3

work page 2009
[3]

Google AI Blog. DFD. https://ai.googleblog. com / 2019 / 09 / contributing - data - to - deepfake - detection . html, 2020. Accessed: 2021-04-24. 5

work page 2019
[4]

End-to-end reconstruction- classification learning for face forgery detection

Junyi Cao, Chao Ma, Taiping Yao, Shen Chen, Shouhong Ding, and Xiaokang Yang. End-to-end reconstruction- classification learning for face forgery detection. InCVPR, pages 4113–4122, 2022. 1, 2, 6

work page 2022
[5]

Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection

Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, and Jue Wang. Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection. In CVPR, pages 18710–18719, 2022. 2, 6

work page 2022
[6]

Can we leave deepfake data behind in training deepfake detector?Advances in Neu- ral Information Processing Systems, 37:21979–21998, 2024

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, and Chen Li. Can we leave deepfake data behind in training deepfake detector?Advances in Neu- ral Information Processing Systems, 37:21979–21998, 2024. 2, 5, 6

work page 2024
[7]

Stacking brick by brick: Aligned feature isolation for incremental face forgery detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, and Zhongyuan Wang. Stacking brick by brick: Aligned feature isolation for incremental face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13927–13936, 2025

work page 2025
[8]

Ed4: Explicit data- level debiasing for deepfake detection.IEEE Transactions on Image Processing, 34:4618–4630, 2025

Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, and Chen Li. Ed4: Explicit data- level debiasing for deepfake detection.IEEE Transactions on Image Processing, 34:4618–4630, 2025. 1

work page 2025
[9]

The deepfake detection chal- lenge (DFDC) preview dataset

Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton Ferrer. The deepfake detection challenge (dfdc) preview dataset.arXiv preprint arXiv:1910.08854,

work page arXiv 1910
[10]

Generative adversarial nets.Advances in neural information processing systems, 27, 2014

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014. 2

work page 2014
[11]

Implicit identity driven deepfake face swapping detection

Baojin Huang, Zhongyuan Wang, Jifan Yang, Jiaxin Ai, Qin Zou, Qian Wang, and Dengpan Ye. Implicit identity driven deepfake face swapping detection. InCVPR, pages 4490– 4499, 2023. 1, 2, 5, 6

work page 2023
[12]

Deepfake detection challenge

Kaggle. Deepfake detection challenge. https : / / www . kaggle . com / c / deepfake - detection - challenge, 2020. Accessed: 2021-04-24. 5

work page 2020
[13]

Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection

Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, and Yongdong Zhang. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6458–6467,

work page
[14]

Face x-ray for more general face forgery detection

Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face x-ray for more general face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5001–5010, 2020. 2

work page 2020
[15]

Raidx: A retrieval-augmented generation and grpo reinforcement learning framework for explainable deepfake detection

Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Shuchang Lyu, Baoyuan Wu, and Guangliang Cheng. Raidx: A retrieval-augmented generation and grpo reinforcement learning framework for explainable deepfake detection. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 11746–11755, 2025. 3

work page 2025
[16]

Celeb-df: A large-scale challenging dataset for deepfake forensics

Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deepfake forensics. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3207–3216,

work page
[17]

Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection

Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, and Qiushi Li. Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection. InEuropean conference on computer vision, pages 104–122. Springer, 2024. 1, 2, 3, 5, 6

work page 2024
[18]

Spatial- phase shallow learning: rethinking face forgery detection in frequency domain

Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. Spatial- phase shallow learning: rethinking face forgery detection in frequency domain. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 772–781, 2021. 2, 6

work page 2021
[19]

General- izing face forgery detection with high-frequency features

Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. General- izing face forgery detection with high-frequency features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16317–16326, 2021. 1, 2, 6

work page 2021
[20]

Momina Masood, Mariam Nawaz, Khalid Mahmood Malik, Ali Javed, Aun Irtaza, and Hafiz Malik. Deepfakes generation and detection: state-of-the-art, open challenges, countermea- sures, and way forward: Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward.Applied intelligence, 53(4):3974–4026, 2023. 1, 2

work page 2023
[21]

On improving cross-dataset generalization of deepfake detectors

Aakash Varma Nadimpalli and Ajita Rattani. On improving cross-dataset generalization of deepfake detectors. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 91–99, 2022. 3

work page 2022
[22]

Core: Consistent repre- sentation learning for face forgery detection

Yunsheng Ni, Depu Meng, Changqian Yu, Chengbin Quan, Dongchun Ren, and Youjian Zhao. Core: Consistent repre- sentation learning for face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 12–21, 2022. 1, 2, 5, 6, 7

work page 2022
[23]

Thinking in frequency: Face forgery detection by mining frequency-aware clues

Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. InEuropean conference on computer vision, pages 86–103. Springer, 2020. 6

work page 2020
[24]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763, 2021. 1, 2, 5, 6

work page 2021
[25]

Faceforen- sics++: Learning to detect manipulated facial images

Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Chris- tian Riess, Justus Thies, and Matthias Nießner. Faceforen- sics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1–11, 2019. 5, 6

work page 2019
[26]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 2, 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

Detecting deep- fakes with self-blended images

Kaede Shiohara and Toshihiko Yamasaki. Detecting deep- fakes with self-blended images. InCVPR, pages 18720– 18729, 2022. 1, 2, 6

work page 2022
[28]

Towards generic deepfake detection with dynamic curriculum

Wentang Song, Yuzhen Lin, and Bin Li. Towards generic deepfake detection with dynamic curriculum. InICASSP, pages 4500–4504. IEEE, 2024. 3

work page 2024
[29]

A quality-centric framework for generic deepfake detection.arXiv preprint arXiv:2411.05335, 2024

Wentang Song, Zhiyuan Yan, Yuzhen Lin, Taiping Yao, Changsheng Chen, Shen Chen, Yandan Zhao, Shouhong Ding, and Bin Li. A quality-centric framework for generic deepfake detection.arXiv preprint arXiv:2411.05335, 2024. 1, 2

work page arXiv 2024
[30]

Dfbench: Benchmarking deepfake image detection capability of large multimodal models

Jiarui Wang, Huiyu Duan, Juntong Wang, Ziheng Jia, Woo Yi Yang, Xiaorong Zhu, Yu Zhao, Jiaying Qian, Yuke Xing, Guangtao Zhai, et al. Dfbench: Benchmarking deepfake image detection capability of large multimodal models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12666–12673, 2025. 3

work page 2025
[31]

Gan-generated faces detection: A survey and new perspectives.ECAI 2023, pages 2533–2542, 2023

Xin Wang, Hui Guo, Shu Hu, Ming-Ching Chang, and Si- wei Lyu. Gan-generated faces detection: A survey and new perspectives.ECAI 2023, pages 2533–2542, 2023. 1, 2

work page 2023
[32]

Are high-quality ai-generated images more difficult for models to detect? InForty-second International Conference on Machine Learning, 2025

Yao Xiao, Binbin Yang, Weiyan Chen, Jiahao Chen, Zijie Cao, Ziyi Dong, Xiangyang Ji, Liang Lin, Wei Ke, and Pengxu Wei. Are high-quality ai-generated images more difficult for models to detect? InForty-second International Conference on Machine Learning, 2025. 1, 2

work page 2025
[33]

Tall: Thumbnail layout for deepfake video detection

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, and Ran He. Tall: Thumbnail layout for deepfake video detection. InICCV, pages 22658–22668, 2023. 1, 2, 6

work page 2023
[34]

Ucf: Uncovering common features for generalizable deepfake detection

Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. Ucf: Uncovering common features for generalizable deepfake detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 22412–22423, 2023. 1, 2, 5, 6

work page 2023
[35]

Deepfakebench: A comprehensive benchmark of deepfake detection

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. Deepfakebench: A comprehensive benchmark of deepfake detection. InAdvances in Neural Information Processing Systems, pages 4534–4565, 2023. 5, 6

work page 2023
[36]

Transcending forgery specificity with latent space augmentation for generalizable deepfake detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In CVPR, pages 8984–8994, 2024. 1, 2, 6

work page 2024
[37]

Df40: Toward next-generation deepfake detection.Advances in Neural Information Process- ing Systems, 37:29387–29434, 2024

Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, et al. Df40: Toward next-generation deepfake detection.Advances in Neural Information Process- ing Systems, 37:29387–29434, 2024. 5, 6, 7

work page 2024
[38]

Orthogonal subspace decom- position for generalizable ai-generated image detection

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decom- position for generalizable ai-generated image detection. In International Conference on Machine Learning, pages 70268– 70288. PMLR, 2025. 5, 6

work page 2025
[39]

Learning self-consistency for deepfake detection

Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia. Learning self-consistency for deepfake detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 15023–15033, 2021. 2

work page 2021
[40]

Face forgery detection by 3d decomposition

Xiangyu Zhu, Hao Wang, Hongyan Fei, Zhen Lei, and Stan Z Li. Face forgery detection by 3d decomposition. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2929–2939, 2021. 2

work page 2021

[1] [1]

Mesonet: a compact facial video forgery detection network

Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. Mesonet: a compact facial video forgery detection network. In2018 IEEE international workshop on informa- tion forensics and security (WIFS), pages 1–7. IEEE, 2018. 2

work page 2018

[2] [2]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. InProceedings of the 26th annual international conference on machine learning, pages 41–48, 2009. 2, 3

work page 2009

[3] [3]

Google AI Blog. DFD. https://ai.googleblog. com / 2019 / 09 / contributing - data - to - deepfake - detection . html, 2020. Accessed: 2021-04-24. 5

work page 2019

[4] [4]

End-to-end reconstruction- classification learning for face forgery detection

Junyi Cao, Chao Ma, Taiping Yao, Shen Chen, Shouhong Ding, and Xiaokang Yang. End-to-end reconstruction- classification learning for face forgery detection. InCVPR, pages 4113–4122, 2022. 1, 2, 6

work page 2022

[5] [5]

Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection

Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, and Jue Wang. Self-supervised learning of adversarial example: Towards good generalizations for deepfake detection. In CVPR, pages 18710–18719, 2022. 2, 6

work page 2022

[6] [6]

Can we leave deepfake data behind in training deepfake detector?Advances in Neu- ral Information Processing Systems, 37:21979–21998, 2024

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, and Chen Li. Can we leave deepfake data behind in training deepfake detector?Advances in Neu- ral Information Processing Systems, 37:21979–21998, 2024. 2, 5, 6

work page 2024

[7] [7]

Stacking brick by brick: Aligned feature isolation for incremental face forgery detection

Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, and Zhongyuan Wang. Stacking brick by brick: Aligned feature isolation for incremental face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13927–13936, 2025

work page 2025

[8] [8]

Ed4: Explicit data- level debiasing for deepfake detection.IEEE Transactions on Image Processing, 34:4618–4630, 2025

Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, and Chen Li. Ed4: Explicit data- level debiasing for deepfake detection.IEEE Transactions on Image Processing, 34:4618–4630, 2025. 1

work page 2025

[9] [9]

The deepfake detection chal- lenge (DFDC) preview dataset

Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton Ferrer. The deepfake detection challenge (dfdc) preview dataset.arXiv preprint arXiv:1910.08854,

work page arXiv 1910

[10] [10]

Generative adversarial nets.Advances in neural information processing systems, 27, 2014

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.Advances in neural information processing systems, 27, 2014. 2

work page 2014

[11] [11]

Implicit identity driven deepfake face swapping detection

Baojin Huang, Zhongyuan Wang, Jifan Yang, Jiaxin Ai, Qin Zou, Qian Wang, and Dengpan Ye. Implicit identity driven deepfake face swapping detection. InCVPR, pages 4490– 4499, 2023. 1, 2, 5, 6

work page 2023

[12] [12]

Deepfake detection challenge

Kaggle. Deepfake detection challenge. https : / / www . kaggle . com / c / deepfake - detection - challenge, 2020. Accessed: 2021-04-24. 5

work page 2020

[13] [13]

Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection

Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, and Yongdong Zhang. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6458–6467,

work page

[14] [14]

Face x-ray for more general face forgery detection

Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face x-ray for more general face forgery detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5001–5010, 2020. 2

work page 2020

[15] [15]

Raidx: A retrieval-augmented generation and grpo reinforcement learning framework for explainable deepfake detection

Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Shuchang Lyu, Baoyuan Wu, and Guangliang Cheng. Raidx: A retrieval-augmented generation and grpo reinforcement learning framework for explainable deepfake detection. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 11746–11755, 2025. 3

work page 2025

[16] [16]

Celeb-df: A large-scale challenging dataset for deepfake forensics

Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deepfake forensics. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3207–3216,

work page

[17] [17]

Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection

Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, and Qiushi Li. Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection. InEuropean conference on computer vision, pages 104–122. Springer, 2024. 1, 2, 3, 5, 6

work page 2024

[18] [18]

Spatial- phase shallow learning: rethinking face forgery detection in frequency domain

Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. Spatial- phase shallow learning: rethinking face forgery detection in frequency domain. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 772–781, 2021. 2, 6

work page 2021

[19] [19]

General- izing face forgery detection with high-frequency features

Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. General- izing face forgery detection with high-frequency features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16317–16326, 2021. 1, 2, 6

work page 2021

[20] [20]

Momina Masood, Mariam Nawaz, Khalid Mahmood Malik, Ali Javed, Aun Irtaza, and Hafiz Malik. Deepfakes generation and detection: state-of-the-art, open challenges, countermea- sures, and way forward: Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward.Applied intelligence, 53(4):3974–4026, 2023. 1, 2

work page 2023

[21] [21]

On improving cross-dataset generalization of deepfake detectors

Aakash Varma Nadimpalli and Ajita Rattani. On improving cross-dataset generalization of deepfake detectors. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 91–99, 2022. 3

work page 2022

[22] [22]

Core: Consistent repre- sentation learning for face forgery detection

Yunsheng Ni, Depu Meng, Changqian Yu, Chengbin Quan, Dongchun Ren, and Youjian Zhao. Core: Consistent repre- sentation learning for face forgery detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 12–21, 2022. 1, 2, 5, 6, 7

work page 2022

[23] [23]

Thinking in frequency: Face forgery detection by mining frequency-aware clues

Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. InEuropean conference on computer vision, pages 86–103. Springer, 2020. 6

work page 2020

[24] [24]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763, 2021. 1, 2, 5, 6

work page 2021

[25] [25]

Faceforen- sics++: Learning to detect manipulated facial images

Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Chris- tian Riess, Justus Thies, and Matthias Nießner. Faceforen- sics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1–11, 2019. 5, 6

work page 2019

[26] [26]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 2, 3, 5

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

Detecting deep- fakes with self-blended images

Kaede Shiohara and Toshihiko Yamasaki. Detecting deep- fakes with self-blended images. InCVPR, pages 18720– 18729, 2022. 1, 2, 6

work page 2022

[28] [28]

Towards generic deepfake detection with dynamic curriculum

Wentang Song, Yuzhen Lin, and Bin Li. Towards generic deepfake detection with dynamic curriculum. InICASSP, pages 4500–4504. IEEE, 2024. 3

work page 2024

[29] [29]

A quality-centric framework for generic deepfake detection.arXiv preprint arXiv:2411.05335, 2024

Wentang Song, Zhiyuan Yan, Yuzhen Lin, Taiping Yao, Changsheng Chen, Shen Chen, Yandan Zhao, Shouhong Ding, and Bin Li. A quality-centric framework for generic deepfake detection.arXiv preprint arXiv:2411.05335, 2024. 1, 2

work page arXiv 2024

[30] [30]

Dfbench: Benchmarking deepfake image detection capability of large multimodal models

Jiarui Wang, Huiyu Duan, Juntong Wang, Ziheng Jia, Woo Yi Yang, Xiaorong Zhu, Yu Zhao, Jiaying Qian, Yuke Xing, Guangtao Zhai, et al. Dfbench: Benchmarking deepfake image detection capability of large multimodal models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12666–12673, 2025. 3

work page 2025

[31] [31]

Gan-generated faces detection: A survey and new perspectives.ECAI 2023, pages 2533–2542, 2023

Xin Wang, Hui Guo, Shu Hu, Ming-Ching Chang, and Si- wei Lyu. Gan-generated faces detection: A survey and new perspectives.ECAI 2023, pages 2533–2542, 2023. 1, 2

work page 2023

[32] [32]

Are high-quality ai-generated images more difficult for models to detect? InForty-second International Conference on Machine Learning, 2025

Yao Xiao, Binbin Yang, Weiyan Chen, Jiahao Chen, Zijie Cao, Ziyi Dong, Xiangyang Ji, Liang Lin, Wei Ke, and Pengxu Wei. Are high-quality ai-generated images more difficult for models to detect? InForty-second International Conference on Machine Learning, 2025. 1, 2

work page 2025

[33] [33]

Tall: Thumbnail layout for deepfake video detection

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, and Ran He. Tall: Thumbnail layout for deepfake video detection. InICCV, pages 22658–22668, 2023. 1, 2, 6

work page 2023

[34] [34]

Ucf: Uncovering common features for generalizable deepfake detection

Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. Ucf: Uncovering common features for generalizable deepfake detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 22412–22423, 2023. 1, 2, 5, 6

work page 2023

[35] [35]

Deepfakebench: A comprehensive benchmark of deepfake detection

Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, and Baoyuan Wu. Deepfakebench: A comprehensive benchmark of deepfake detection. InAdvances in Neural Information Processing Systems, pages 4534–4565, 2023. 5, 6

work page 2023

[36] [36]

Transcending forgery specificity with latent space augmentation for generalizable deepfake detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In CVPR, pages 8984–8994, 2024. 1, 2, 6

work page 2024

[37] [37]

Df40: Toward next-generation deepfake detection.Advances in Neural Information Process- ing Systems, 37:29387–29434, 2024

Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, et al. Df40: Toward next-generation deepfake detection.Advances in Neural Information Process- ing Systems, 37:29387–29434, 2024. 5, 6, 7

work page 2024

[38] [38]

Orthogonal subspace decom- position for generalizable ai-generated image detection

Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decom- position for generalizable ai-generated image detection. In International Conference on Machine Learning, pages 70268– 70288. PMLR, 2025. 5, 6

work page 2025

[39] [39]

Learning self-consistency for deepfake detection

Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia. Learning self-consistency for deepfake detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 15023–15033, 2021. 2

work page 2021

[40] [40]

Face forgery detection by 3d decomposition

Xiangyu Zhu, Hao Wang, Hongyan Fei, Zhen Lei, and Stan Z Li. Face forgery detection by 3d decomposition. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2929–2939, 2021. 2

work page 2021