Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection

Alejandro Cobo; José Miguel Buenaposada; Luis Baumela; Roberto Valle

arxiv: 2512.04175 · v2 · submitted 2025-12-03 · 💻 cs.CV

Pith reviewed 2026-05-17 01:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords deepfake detection · kinematic inconsistencies · video manipulation · generalization · facial landmarks · motion bases · synthetic artifacts

The pith

Manipulating motion bases in facial landmarks creates training data that teaches detectors to spot kinematic flaws in unseen deepfakes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the generalization problem in deepfake video detection by shifting focus from frame-to-frame flicker to violations of natural motion dependencies across facial regions. It introduces a method to generate synthetic training videos from pristine footage: an autoencoder breaks landmark trajectories into motion bases, selected bases are altered to disrupt realistic movement correlations, and the resulting inconsistencies are inserted back into the video through face morphing. A detector trained on these examples learns to recognize the biomechanical artifacts. If the approach holds, it would let models trained only on manipulated real videos perform well against entirely new deepfake generation techniques.

Core claim

We propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.

What carries the argument

Autoencoder decomposition of facial landmark configurations into motion bases that are selectively manipulated to violate natural movement correlations.
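
The decompose-then-manipulate idea can be illustrated with a minimal sketch. It rests on loud assumptions: a linear decomposition (SVD/PCA) stands in for the paper's autoencoder, the toy trajectories are synthetic, and the function names, the `intensity` blend, and the time-shift perturbation are hypothetical choices made for illustration rather than the authors' procedure. The face-morphing step that writes the perturbed landmarks back into video frames is omitted.

```python
import numpy as np

def fit_motion_bases(landmarks, num_bases):
    """Fit linear motion bases to landmark trajectories.

    landmarks: (T, D) array, one flattened landmark configuration per frame.
    Returns the mean configuration, the (K, D) bases, and the (T, K) coefficients.
    NOTE: SVD is a linear stand-in for the paper's autoencoder (assumption).
    """
    mean = landmarks.mean(axis=0)
    centered = landmarks - mean
    # Rows of vt are orthonormal directions in landmark space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:num_bases]
    coeffs = centered @ bases.T
    return mean, bases, coeffs

def perturb_basis(coeffs, basis_index, intensity, shift):
    """Desynchronize one basis by blending in a time-shifted copy of its track.

    intensity in [0, 1] and shift (in frames) are hypothetical knobs.
    np.roll wraps around at the sequence boundary, which is fine for a toy example.
    """
    out = coeffs.copy()
    delayed = np.roll(coeffs[:, basis_index], shift)
    out[:, basis_index] = (1.0 - intensity) * coeffs[:, basis_index] + intensity * delayed
    return out

def reconstruct(mean, bases, coeffs):
    """Map coefficients back to landmark configurations."""
    return mean + coeffs @ bases

# Toy data: 200 frames, 4 landmarks (8 coordinates) driven by two latent motions.
rng = np.random.default_rng(0)
t = np.arange(200)
latent = np.stack([np.sin(2 * np.pi * t / 40), np.sin(2 * np.pi * t / 25 + 0.5)], axis=1)
mixing = rng.normal(size=(2, 8))
landmarks = latent @ mixing + rng.normal(size=8)

mean, bases, coeffs = fit_motion_bases(landmarks, num_bases=2)
clean = reconstruct(mean, bases, coeffs)
broken = reconstruct(mean, bases, perturb_basis(coeffs, basis_index=0, intensity=1.0, shift=5))
```

In this toy setting `clean` reproduces the input trajectories, while `broken` keeps every individual basis plausible but shifts one of them out of step with the other, which is the kind of cross-region inconsistency the summary describes.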

If this is right

Detectors can be trained without any real deepfake examples yet still generalize across manipulation methods.
The key signal shifts from low-level temporal flicker to higher-order violations of facial motion dependencies.
Any collection of pristine videos can be turned into useful training data by applying the motion-base manipulation process.
State-of-the-art cross-dataset results are reported on multiple standard deepfake benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real deepfake pipelines may systematically fail to reproduce the statistical dependencies that govern natural facial motion.
The same decomposition-and-disruption technique could be tested on full-body or scene-level motion for broader video forgery detection.
Combining kinematic inconsistency detection with existing artifact-based methods might yield still stronger generalization.

Load-bearing premise

The kinematic inconsistencies created by manipulating motion bases in pristine videos accurately simulate the motion artifacts present in real deepfake videos produced by diverse unseen methods.

What would settle it

Train the detector on the synthetic data, then test it on a set of deepfake videos engineered to preserve natural motion-base correlations while still using novel manipulation pipelines; a sharp drop in detection accuracy would falsify the claim.
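
One way to quantify such a drop is to compare a ranking metric on the two test sets. The sketch below assumes AUC as the metric (the text above says only "detection accuracy"), and the score lists are placeholder values chosen to exercise the function; they are not results from the paper or from any detector.

```python
import numpy as np

def auc(scores_real, scores_fake):
    """Probability that a random fake scores higher than a random real video.

    Ties count as half. Equivalent to the area under the ROC curve.
    """
    real = np.asarray(scores_real, dtype=float)
    fake = np.asarray(scores_fake, dtype=float)
    diff = fake[:, None] - real[None, :]  # every fake/real pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Placeholder detector scores (higher = more likely fake). Illustrative only.
real_scores = [0.1, 0.2, 0.3]
standard_fakes = [0.7, 0.8, 0.9]              # hypothetical ordinary deepfakes
motion_preserving_fakes = [0.25, 0.15, 0.9]   # hypothetical fakes with natural kinematics

auc_standard = auc(real_scores, standard_fakes)
auc_motion_preserving = auc(real_scores, motion_preserving_fakes)
auc_drop = auc_standard - auc_motion_preserving
```

A large `auc_drop` on real data would indicate that the detector depends on broken motion correlations; a small one would suggest it relies on other cues.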

Figures

Figures reproduced from arXiv: 2512.04175 by Alejandro Cobo, José Miguel Buenaposada, Luis Baumela, Roberto Valle.

**Figure 2.** Overview of our method. We leverage a pretrained Land…

**Figure 3.** Overview of the Landmark Perturbation Network. The encoder…

**Figure 4.** Visualization of subtle temporal artifacts introduced by our method (bottom row) compared to the facial movement of the…

**Figure 5.** Correlation matrices of the temporal artifacts caused by different landmarks extracted from deepfake videos (a) and temporal…

Original abstract

Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. While effective for static images, extending this to the video domain is an open issue. Existing methods model temporal artifacts as frame-to-frame instabilities, overlooking a key vulnerability: the violation of natural motion dependencies between different facial regions. In this paper, we propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's new angle is synthesizing deepfake training videos by breaking natural facial motion correlations via autoencoder motion bases, but the evidence that these match real unseen artifacts is still thin.

The paper's core move is to generate synthetic training videos that contain kinematic inconsistencies. They train an autoencoder on facial landmark sequences to extract motion bases, then selectively corrupt those bases to break natural correlations between facial regions, and finally insert the changes into clean videos with face morphing. A detector trained on the resulting data is meant to learn to spot these flaws and generalize to real deepfakes from unseen methods.

This is a clear step beyond simple frame-to-frame jitter or static hand-crafted artifacts. The framing around motion dependencies between different facial parts is a reasonable observation about what current video methods tend to miss. If the full experiments hold up with proper controls, the synthesis pipeline itself could be a useful addition to the toolkit for creating more targeted training data.

The main soft spot is the leap from synthetic inconsistencies to real generalization. The abstract claims state-of-the-art results on benchmarks, but without seeing the numbers, ablations on manipulation strength or number of bases, and any direct comparison of the introduced artifacts against those produced by actual deepfake generators, it is hard to judge whether the detector is learning the intended biomechanical signals or something tied to the morphing process. The assumption that 2D landmark base manipulation will reliably simulate the temporal or 3D inconsistencies from diverse end-to-end synthesis methods is plausible but unproven in the summary.

This work is aimed at people already working on robust video deepfake detection who are looking for better ways to create synthetic training sets. A reader focused on generalization techniques would get something concrete to try or critique. It is worth sending to peer review so the experiments can be examined in detail and the artifact-matching question can be tested properly.

Referee Report

2 major / 2 minor

Summary. The paper proposes a synthetic video generation pipeline to improve generalization in deepfake detection. An autoencoder decomposes facial landmark configurations from pristine videos into motion bases; these bases are manipulated to break natural kinematic correlations between facial regions, and the resulting inconsistencies are inserted into real videos via face morphing. A detector trained on the resulting data is claimed to learn to detect sophisticated biomechanical flaws and achieves state-of-the-art generalization on several popular benchmarks.

Significance. If the synthetic kinematic inconsistencies are shown to be representative of motion artifacts arising in real unseen deepfake generators, the approach would offer a scalable, manipulation-agnostic route to training generalizable detectors that focuses on violation of natural motion dependencies rather than frame-level flicker. The method supplies a concrete, falsifiable mechanism for creating training signals and could be reproduced given the described autoencoder and morphing steps.

major comments (2)

[§3.2] (Synthetic Data Generation): The claim that selectively breaking correlations via manipulation of autoencoder-derived motion bases produces flaws representative of those in real deepfakes from diverse unseen methods is load-bearing for the generalization result, yet the manuscript provides no distributional comparison (e.g., motion correlation statistics or feature-space distance) between the synthetic artifacts and the actual inconsistencies present in the benchmark deepfakes.
[Table 2] (Generalization benchmarks): The reported SOTA cross-dataset numbers rest on the assumption that the introduced kinematic flaws are the operative cues; without an ablation that removes or varies the motion-base manipulation while keeping other factors fixed, it is unclear whether the performance gain is attributable to the kinematic focus or to other aspects of the synthetic pipeline.

minor comments (2)

[Abstract] The abstract states 'state-of-the-art generalization results' without quoting the precise metrics or the size of the improvement over the strongest baseline.
[§3] Notation for the number of motion bases and the manipulation intensity parameter should be introduced with explicit symbols and ranges in the method section to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below and will revise the manuscript to incorporate the suggested analyses.

Referee: [§3.2] (Synthetic Data Generation): The claim that selectively breaking correlations via manipulation of autoencoder-derived motion bases produces flaws representative of those in real deepfakes from diverse unseen methods is load-bearing for the generalization result, yet the manuscript provides no distributional comparison (e.g., motion correlation statistics or feature-space distance) between the synthetic artifacts and the actual inconsistencies present in the benchmark deepfakes.

Authors: We agree that a direct distributional comparison would strengthen the claim that the synthetic kinematic inconsistencies are representative of those arising in real unseen deepfakes. In the revised manuscript, we will add a quantitative analysis (e.g., in an expanded Section 3.2) comparing motion correlation statistics (specifically, pairwise velocity correlations across facial regions) between our synthetically manipulated videos and the deepfake videos in the benchmark datasets. We will report metrics such as the average absolute difference in correlation matrices and feature-space distances to demonstrate that the introduced artifacts align with real biomechanical inconsistencies. revision: yes

Referee: [Table 2] (Generalization benchmarks): The reported SOTA cross-dataset numbers rest on the assumption that the introduced kinematic flaws are the operative cues; without an ablation that removes or varies the motion-base manipulation while keeping other factors fixed, it is unclear whether the performance gain is attributable to the kinematic focus or to other aspects of the synthetic pipeline.

Authors: We concur that an ablation isolating the effect of the motion-base manipulation is necessary to attribute the generalization gains specifically to the kinematic inconsistencies. In the revised manuscript, we will add an ablation study comparing the full pipeline against a control variant in which the autoencoder-derived bases are either left unmanipulated or subjected to random perturbations that do not target natural correlations, while holding the face-morphing step and all other pipeline components fixed. Generalization results on the benchmarks will be reported to confirm that the targeted correlation-breaking step is the primary driver of the observed improvements. revision: yes
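
The correlation comparison proposed in the first response (pairwise velocity correlations across facial regions, summarized by the average absolute difference between correlation matrices) could be sketched as follows. This is a minimal illustration under assumptions: the region names and landmark indices are hypothetical, per-region mean speed is one of several reasonable velocity summaries, and the toy trajectories are synthetic rather than drawn from any benchmark.

```python
import numpy as np

def region_velocity_correlation(landmarks, regions):
    """Pearson correlation of per-region mean landmark speed over time.

    landmarks: (T, N, 2) array of 2D landmark positions per frame.
    regions: dict mapping a region name to a list of landmark indices.
    Returns an (R, R) matrix. A region with constant speed yields NaN entries.
    """
    velocity = np.diff(landmarks, axis=0)        # (T-1, N, 2) frame-to-frame displacement
    speed = np.linalg.norm(velocity, axis=-1)    # (T-1, N)
    per_region = np.stack([speed[:, idx].mean(axis=1) for idx in regions.values()])
    return np.corrcoef(per_region)               # rows are treated as variables

def mean_abs_corr_difference(corr_a, corr_b):
    """Average absolute difference over off-diagonal correlation entries."""
    off_diag = ~np.eye(corr_a.shape[0], dtype=bool)
    return float(np.abs(corr_a - corr_b)[off_diag].mean())

# Hypothetical regions over 4 toy landmarks.
regions = {"mouth": [0, 1], "jaw": [2, 3]}
t = np.arange(100)
driver = np.sin(2 * np.pi * t / 20)
amplitudes = np.array([1.0, 1.2, 0.8, 0.9])

# "Natural" motion: both regions follow the same driver signal.
natural = np.zeros((100, 4, 2))
natural[:, :, 0] = np.arange(4)                  # fixed x positions
natural[:, :, 1] = driver[:, None] * amplitudes

# "Inconsistent" motion: the jaw follows a copy delayed by a quarter period.
inconsistent = natural.copy()
inconsistent[:, 2:, 1] = np.roll(driver, 5)[:, None] * amplitudes[2:]

natural_corr = region_velocity_correlation(natural, regions)
inconsistent_corr = region_velocity_correlation(inconsistent, regions)
gap = mean_abs_corr_difference(natural_corr, inconsistent_corr)
```

Applied to real data, the same two functions would be run on pristine, synthetically manipulated, and benchmark deepfake videos, and the resulting gaps compared.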

Circularity Check

0 steps flagged

No circularity: synthetic data pipeline and generalization claim are independent of fitted inputs

The paper describes a self-contained pipeline: an autoencoder is trained on facial landmarks from pristine videos to extract motion bases; these bases are then manipulated to break natural correlations and the resulting inconsistencies are inserted into pristine videos via face morphing to create synthetic training data; a detector is trained on that data and evaluated for generalization on external benchmarks. None of the claimed results (e.g., state-of-the-art generalization) reduce by construction to the inputs via self-definition, renaming of fitted quantities, or load-bearing self-citations. The central assumption that the synthetic kinematic flaws are representative is an empirical hypothesis, not a tautological equivalence, and the derivation chain remains independent of the target deepfake distributions.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim depends on the assumption that facial motions have decomposable bases with learnable natural correlations that, when broken, produce detectable flaws generalizable to real deepfakes. No free parameters or invented entities are explicitly quantified in the abstract.

free parameters (2)

number of motion bases: Dimensionality chosen for the autoencoder decomposition of landmark trajectories.
manipulation intensity: Degree to which natural correlations are violated when altering the bases.

axioms (1)

domain assumption: Facial movements exhibit consistent natural correlations across regions that can be captured by linear or low-dimensional bases.
Invoked when training the autoencoder on pristine landmark configurations to enable selective breaking of dependencies.
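
Under the linear reading of this assumption, the two free parameters can be given explicit symbols. The notation below is hypothetical and introduced here only for illustration: $K$ (number of motion bases), $\alpha$ (manipulation intensity), $\mathcal{S}$ (the set of manipulated bases), and $a'_{k,t}$ (a replacement coefficient track, for example a time-shifted copy) are not taken from the paper, and the paper's autoencoder may well be nonlinear.

```latex
\mathbf{x}_t \approx \boldsymbol{\mu} + \sum_{k=1}^{K} a_{k,t}\,\mathbf{b}_k,
\qquad
\tilde{a}_{k,t} =
\begin{cases}
(1-\alpha)\,a_{k,t} + \alpha\,a'_{k,t}, & k \in \mathcal{S},\\
a_{k,t}, & k \notin \mathcal{S},
\end{cases}
\qquad \alpha \in [0, 1].
```

Here $\mathbf{x}_t$ is the landmark configuration at frame $t$, $\boldsymbol{\mu}$ the mean configuration, and $\mathbf{b}_k$ the $k$-th motion basis. Setting $\alpha = 0$ leaves the motion untouched, while larger $\alpha$ moves the selected bases further from their natural relationship with the rest.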

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Deepshield: Forti- fying deepfake video detection with local and global forgery analysis

Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, and Guanbin Li. Deepshield: Forti- fying deepfake video detection with local and global forgery analysis. InICCV, pages 12524–12534, 2025

work page 2025

[2] [2]

MARLIN: masked autoencoder for facial video rep- resentation learning

Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, and Munawar Hayat. MARLIN: masked autoencoder for facial video rep- resentation learning. InCVPR, pages 1493–1504, 2023

work page 2023

[3] [3]

End-to-end reconstruction- classification learning for face forgery detection

Junyi Cao, Chao Ma, Taiping Yao, Shen Chen, Shouhong Ding, and Xiaokang Yang. End-to-end reconstruction- classification learning for face forgery detection. InCVPR, pages 4103–4112, 2022

work page 2022

[4] [4]

Self-supervised learning of adversarial exam- ple: Towards good generalizations for deepfake detection

Liang Chen, Yong Zhang, Yibing Song, Lingqiao Liu, and Jue Wang. Self-supervised learning of adversarial exam- ple: Towards good generalizations for deepfake detection. InCVPR, pages 18689–18698, 2022

work page 2022

[5] [5]

Buenaposada, and Luis Baumela

Alejandro Cobo, Roberto Valle, Jos ´e M. Buenaposada, and Luis Baumela. Spatiotemporal face alignment for generaliz- able deepfake detection. InIEEE FG, pages 1–6, 2025

work page 2025

[6] [6]

Cootes, Gareth J

Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor. Active appearance models.IEEE TPAMI, 23(6):681– 685, 2001

work page 2001

[7] [7]

Retinaface: Single-shot multi-level face localisation in the wild

Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. Retinaface: Single-shot multi-level face localisation in the wild. InCVPR, pages 5202–5211, 2020

work page 2020

[8] [8]

Brian Dolhansky, Russell Howes, Ben Pflaum, Niv Baram, and Cristian Canton Ferrer

Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, and Cristian Canton-Ferrer. The deepfake detec- tion challenge (DFDC) preview dataset.arXiv preprint, abs/1910.08854, 2019

work page arXiv 1910

[9] [9]

Multi-pie.Image and Vis

Ralph Gross, Iain Matthews, Jeffrey Cohn, Takeo Kanade, and Simon Baker. Multi-pie.Image and Vis. Comput., 28(5): 807–813, 2010

work page 2010

[10] [10]

Controllable guide-space for generalizable face forgery detection

Ying Guo, Cheng Zhen, and Pengfei Yan. Controllable guide-space for generalizable face forgery detection. In ICCV, pages 20761–20770, 2023

work page 2023

[11] [11]

Leveraging real talking faces via self- supervision for robust forgery detection

Alexandros Haliassos, Rodrigo Mira, Stavros Petridis, and Maja Pantic. Leveraging real talking faces via self- supervision for robust forgery detection. InCVPR, pages 14930–14942, 2022

work page 2022

[12] [12]

Towards more general video-based deepfake detection through facial component guided adaptation for foundation model

Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, and Jun- Cheng Chen. Towards more general video-based deepfake detection through facial component guided adaptation for foundation model. InCVPR, pages 22995–23005, 2025

work page 2025

[13] [13]

Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection

Liming Jiang, Ren Li, Wayne Wu, Chen Qian, and Chen Change Loy. Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. InCVPR, pages 2886–2895, 2020

work page 2020

[14] [14]

Beyond spatial frequency: Pixel-wise temporal frequency- based deepfake video detection.ICCV, 2025

Taehoon Kim, Jongwook Choi, Yonghyun Jeong, Haeun Noh, Jaejun Yoo, Seungryul Baek, and Jongwon Choi. Beyond spatial frequency: Pixel-wise temporal frequency- based deepfake video detection.ICCV, 2025

[15] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015

[16] Nicolas Larue, Ngoc-Son Vu, Vitomir Struc, Peter Peer, and Vassilis Christophides. Seeable: Soft discrepancies and bounded contrastive learning for exposing deepfakes. In ICCV, pages 20954–20964, 2023

[17] Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo. Face x-ray for more general face forgery detection. In CVPR, pages 5000–5009, 2020

[18] Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. Celeb-df: A large-scale challenging dataset for deepfake forensics. In CVPR, pages 3204–3213, 2020

[19] Zhaolun Li, Jichang Li, Yinqi Cai, Junye Chen, Xiaonan Luo, Guanbin Li, and Rushi Lan. Fakeradar: Probing forgery outliers to detect unknown deepfake videos. In ICCV, 2025

[20] Yuzhen Lin, Wentang Song, Bin Li, Yuezun Li, Jiangqun Ni, Han Chen, and Qiushi Li. Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection. In ECCV, pages 104–122, 2024

[21] Honggu Liu, Xiaodan Li, Wenbo Zhou, Yuefeng Chen, Yuan He, Hui Xue, Weiming Zhang, and Nenghai Yu. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In CVPR, pages 772–781, 2021

[22] Yuchen Luo, Yong Zhang, Junchi Yan, and Wei Liu. Generalizing face forgery detection with high-frequency features. In CVPR, pages 16317–16326, 2021

[23] Dat Nguyen, Marcella Astrid, Anis Kacem, Enjie Ghorbel, and Djamila Aouada. Vulnerability-aware spatio-temporal learning for generalizable and interpretable deepfake video detection. In ICCV, 2025

[24] Andrés Prados-Torreblanca, José Miguel Buenaposada, and Luis Baumela. Shape preserving facial landmarks with graph attention networks. In BMVC, 2022

[25] Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In ECCV, pages 86–103, 2020

[26] Google Research. Contributing data to deepfake detection research. https://research.google/blog/contributing-data-to-deepfake-detection-research/, 2019. Accessed: 2025-10-04

[27] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. Faceforensics++: Learning to detect manipulated facial images. In ICCV, pages 1–11, 2019

[28] Rui Shao, Tianxing Wu, Liqiang Nie, and Ziwei Liu. Deepfake-adapter: Dual-level adapter for deepfake detection. IJCV, 133(6):3613–3628, 2025

[29] Kaede Shiohara and Toshihiko Yamasaki. Detecting deepfakes with self-blended images. In CVPR, pages 18699–18708, 2022

[30] Justus Thies, Michael Zollhöfer, and Matthias Nießner. Deferred neural rendering: image synthesis using neural textures. ACM TOG, 38(4):66:1–66:12, 2019

[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, pages 5998–6008, 2017

[32] Gaojian Wang, Feng Lin, Tong Wu, Zhenguang Liu, Zhongjie Ba, and Kui Ren. FSFM: A generalizable face security foundation model via self-supervised facial representation learning. In CVPR, pages 24364–24376, 2025

[33] Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, and Houqiang Li. Altfreezing for more general video face forgery detection. In CVPR, pages 4129–4138, 2023

[34] Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, and Ran He. TALL: thumbnail layout for deepfake video detection. In ICCV, pages 22601–22611, 2023

[35] Zhiyuan Yan, Yong Zhang, Yanbo Fan, and Baoyuan Wu. UCF: uncovering common features for generalizable deepfake detection. In ICCV, pages 22355–22366, 2023

[36] Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, and Baoyuan Wu. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection. In CVPR, pages 8984–8994, 2024

[37] Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, and Li Yuan. DF40: toward next-generation deepfake detection. In NeurIPS, 2024

[38] Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposition for generalizable AI-generated image detection. In ICML, 2025

[39] Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, and Li Yuan. Generalizing deepfake video detection with plug-and-play: Video-level blending and spatiotemporal adapter tuning. In CVPR, pages 12615–12625, 2025

[40] Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, and Nenghai Yu. Multi-attentional deepfake detection. In CVPR, pages 2185–2194, 2021

[41] Yinglin Zheng, Jianmin Bao, Dong Chen, Ming Zeng, and Fang Wen. Exploring temporal coherence for more general video face forgery detection. In ICCV, pages 15024–15034, 2021

[42] Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, and Chen Change Loy. Celebv-hq: A large-scale video facial attributes dataset. In ECCV, pages 650–667, 2022

[43] Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, and Yu-Gang Jiang. Wilddeepfake: A challenging real-world dataset for deepfake detection. In ACM MM, pages 2382–2390, 2020
