FakeAVCeleb: A novel audio-video multimodal deepfake dataset. arXiv preprint arXiv:2108.05080
14 Pith papers cite this work.
roles: dataset 3
citing papers explorer
- LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization
  LAVA is a layered audio-visual watermarking system using cross-modal fusion and calibration-aware alignment to achieve robust deepfake tamper detection and localization under compression and asynchrony.
- VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?
  VideoASMR-Bench shows state-of-the-art VLMs fail to reliably detect AI-generated ASMR videos from real ones, though humans can still identify the fakes relatively easily.
- MVAD: A Benchmark Dataset for Multimodal AI-Generated Video-Audio Detection
  MVAD is the first comprehensive benchmark dataset for AI-generated multimodal video-audio detection, with three realistic forgery patterns, high-quality outputs from state-of-the-art models, and diversity across visual styles and content categories.
- The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection
  Deepfake detectors act as alpha blending searchers; training solely on self-blended real images yields top cross-dataset generalization on 15 datasets without using synthetic deepfakes.
- Enhancing Self-Supervised Talking Head Forgery Detection via a Training-Free Dual-System Framework
  A training-free dual-system framework refines anomaly score ordering on uncertain samples from self-supervised talking head forgery detectors to improve detection performance.
- Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge
  The paper introduces semantic mismatch between authentic audio and video as a new DeepFake detection challenge via the RARV-SMM class and demonstrates that a semantic reinforcement strategy with ImageBind embeddings improves detection on FakeAVCeleb and LAV-DF.
- AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
  AIFIND stabilizes incremental face forgery detection by aligning volatile features to invariant semantic anchors from low-level artifacts using attention and harmonization modules.
- Generalizing Video DeepFake Detection by Self-generated Audio-Visual Pseudo-Fakes
  AVPF generates self-created audio-visual pseudo-fakes from real samples to train deepfake detectors that generalize better, with reported average gains up to 7.4%.
- Deepfake Detection that Generalizes Across Benchmarks
  GenD achieves state-of-the-art average cross-dataset AUROC in deepfake detection by parameter-efficient adaptation of a foundational vision encoder with hyperspherical manifold enforcement via L2 normalization and metric learning.
- Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
  Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.
- The Deepfakes We Missed: We Built Detectors for a Threat That Didn't Arrive
  Deepfake research prepared for a public-figure catastrophe that did not occur, leaving dominant real harms like NCII and voice scams under-defended.
- Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection
  Omni-Fake delivers a unified multimodal deepfake benchmark dataset and RL-driven detector that reports gains in accuracy, cross-modal generalization, and explainability over prior baselines.
- EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection
  Emo-Boost augments low-level deepfake detectors with intra- and inter-modal emotion consistency checks to raise cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.
- BioLip: Language-Generalizable Lip-Sync Deepfake Detection via Biomechanical Constraint Violation Modeling