Vulnerability-aware spatio-temporal learning for generalizable deepfake video detection

2025 · arXiv 2501.01184

3 Pith papers cite this work.

representative citing papers

CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection

cs.CV · 2026-05-01 · unverdicted · novelty 7.0 · ref 21

CMTA detects AI-generated videos by capturing unnatural temporal stability in visual-textual semantic alignment via joint embeddings and multi-grained temporal modeling, outperforming prior methods in cross-generator tests.

CAM-VFD: Cross-Attention Multimodal Video Forgery Detection

cs.CV · 2026-05-16 · unverdicted · novelty 6.0 · ref 24

CAM-VFD detects video forgeries by using cross-attention to identify contradictions among CLIP appearance, VideoMAE motion, and MiDaS depth features.

Deepfake Detection in Social Media: A Temporal Artifact Analysis Using 3D Convolutional Neural Networks

cs.CV · 2026-05-17 · unverdicted · novelty 4.0 · ref 2

A 3D CNN detector with a temporal consistency regularizer reaches 92.8% accuracy on DeepfakeTIMIT and 76.4% cross-dataset accuracy on FaceForensics++ without fine-tuning.
