FaceForensics: A large-scale video dataset for forgery detection in human faces

Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner · 2018 · cs.CV · arXiv 1803.09179

5 Pith papers cite this work.

abstract

With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as it often happens on social networks. Research on the detection of face manipulations has been seriously hampered by the lack of adequate datasets. To this end, we introduce a novel face manipulation dataset of about half a million edited images (from over 1000 videos). The manipulations have been generated with a state-of-the-art face editing approach. It exceeds all existing video manipulation datasets by at least an order of magnitude. Using our new dataset, we introduce benchmarks for classical image forensic tasks, including classification and segmentation, considering videos compressed at various quality levels. In addition, we introduce a benchmark evaluation for creating indistinguishable forgeries with known ground truth; for instance with generative refinement models.

representative citing papers

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation

cs.CV · 2026-03-10 · unverdicted · novelty 7.0

FrameDiT proposes Matrix Attention for DiTs to achieve SOTA video generation with improved temporal coherence and efficiency comparable to local factorized attention.

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · novelty 7.0

DeepSpeak provides over 100 hours of consented, identity-matched real and modern deepfake audiovisual content focused on talking heads, with evaluations showing existing detectors fail to generalize without retraining.

Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations

cs.CV · 2019-06-21 · unverdicted · novelty 6.0

Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.

We Need No Pixels: Video Manipulation Detection Using Stream Descriptors

cs.LG · 2019-06-20 · unverdicted · novelty 6.0

Video forgeries are detectable via binary classification on multimedia stream descriptors without pixel analysis.

Latte: Latent Diffusion Transformer for Video Generation

cs.CV · 2024-01-05 · unverdicted · novelty 6.0

Latte achieves state-of-the-art video generation on FaceForensics, SkyTimelapse, UCF101, and Taichi-HD by using a latent diffusion transformer with four efficient spatial-temporal decomposition variants and best-practice training choices.

citing papers explorer

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation

cs.CV · 2026-03-10 · unverdicted · none · ref 39 · internal anchor

The DeepSpeak Dataset

cs.CV · 2024-08-09 · unverdicted · none · ref 43 · internal anchor

Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations

cs.CV · 2019-06-21 · unverdicted · none · ref 24 · internal anchor

We Need No Pixels: Video Manipulation Detection Using Stream Descriptors

cs.LG · 2019-06-20 · unverdicted · none · ref 37 · internal anchor

Latte: Latent Diffusion Transformer for Video Generation

cs.CV · 2024-01-05 · unverdicted · none · ref 12
