arxiv: 2508.06248 · v4 · submitted 2025-08-08 · 💻 cs.CV

Deepfake Detection that Generalizes Across Benchmarks

Andrii Yermakov, Jan Cech, Jiri Matas, Mario Fritz


keywords: generalization, datasets, method, approaches, deepfake, detection, foundational, GenD


abstract

The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through parameter-efficient adaptation of a single pre-trained foundation vision encoder. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of all parameters) and improves generalization by constraining features to a hyperspherical manifold through L2 normalization and applying metric learning on that manifold. We conducted an extensive evaluation on 14 benchmark datasets spanning 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real and fake data derived from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, as models trained on older, diverse datasets show strong generalization. This work delivers a computationally efficient and reproducible method, showing that state-of-the-art generalization is attainable through targeted, minimal changes to a pre-trained foundation image encoder. The code is available at: https://github.com/yermandy/GenD
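The abstract names three ingredients: LayerNorm-only fine-tuning, L2 normalization onto a hypersphere, and metric learning on that hypersphere. The sketch below illustrates only the last two, in plain Python on toy 2-D vectors. The abstract does not name the metric-learning objective, so the alignment and uniformity losses of Wang and Isola (2020) are swapped in here as an assumption, with "positive pairs" taken to be samples sharing a real/fake label (also an assumption). All function names are hypothetical; this is not the GenD implementation from the linked repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Project a feature vector onto the unit hypersphere (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; for unit vectors it equals 2 - 2 * cos(a, b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(feats: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean squared distance over same-label pairs: pulls each class together.

    Assumption: positives are defined by a shared label (real/fake), a
    supervised variant; Wang & Isola define positives as augmented views.
    """
    total, count = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if labels[i] == labels[j]:
                total += sq_dist(feats[i], feats[j])
                count += 1
    return total / count if count else 0.0


def uniformity_loss(feats: Sequence[Vector], t: float = 2.0) -> float:
    """Log of the mean Gaussian potential exp(-t * ||x - y||^2) over all pairs.

    Lower values mean the features are spread more evenly over the sphere.
    """
    if len(feats) < 2:
        raise ValueError("uniformity needs at least two features")
    potentials = [
        math.exp(-t * sq_dist(feats[i], feats[j]))
        for i in range(len(feats))
        for j in range(i + 1, len(feats))
    ]
    return math.log(sum(potentials) / len(potentials))


if __name__ == "__main__":
    # Hypothetical toy batch: 2-D "encoder outputs", label 0 = real, 1 = fake.
    raw = [[3.0, 4.0], [0.6, 0.8], [-4.0, 3.0], [-0.8, 0.6]]
    labels = [0, 0, 1, 1]
    feats = [l2_normalize(v) for v in raw]
    loss = alignment_loss(feats, labels) + uniformity_loss(feats)
    print(f"alignment + uniformity = {loss:.4f}")
```

Normalizing first makes the squared distance a monotone function of cosine similarity, so both losses act only on feature direction, not magnitude. In a real PyTorch setup, the LayerNorm-only part would typically amount to setting `requires_grad = False` on every parameter except those belonging to `torch.nn.LayerNorm` modules, then computing losses like these on the normalized encoder embeddings of each batch.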



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Energy-Based Constraint Networks: Learning Structural Coherence Across Modalities
cs.CV · 2026-05 · unverdicted · novelty 7.0

Energy-based constraint networks learn structural coherence from contrastive pairs using frozen encoders, achieving 93.4% accuracy on text corruptions and 0.959 AUC on deepfake detection with composable branches that ...

Aletheia: Physics-Conditioned Localized Artifact Attention (PhyLAA-X) for End-to-End Generalizable and Robust Deepfake Video Detection
cs.CV · 2026-04 · unverdicted · novelty 6.0

PhyLAA-X embeds physics-derived feature volumes into localized artifact attention for improved cross-generator generalization and adversarial robustness in deepfake detection.

Fractal Characterization of Low-Correlation Signals in AI-Generated Image Detection
cs.CV · 2026-04 · unverdicted · novelty 4.0

Fractal characterization of low-correlation signals distinguishes AI-generated images from real ones with claimed robustness and superior performance.