Vulnerability-aware spatio-temporal learning for generalizable deepfake video detection
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
years
2026 3
verdicts
UNVERDICTED 3
representative citing papers
CAM-VFD detects video forgeries by using cross-attention to identify contradictions between CLIP appearance, VideoMAE motion, and MiDaS depth features.
3D CNN detector with temporal consistency regularizer reaches 92.8% accuracy on DeepfakeTIMIT and 76.4% cross-dataset on FaceForensics++ without fine-tuning.
citing papers explorer
- CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection
  CMTA detects AI-generated videos by capturing unnatural temporal stability in visual-textual semantic alignment via joint embeddings and multi-grained temporal modeling, outperforming prior methods in cross-generator tests.
- CAM-VFD: Cross-Attention Multimodal Video Forgery Detection
  CAM-VFD detects video forgeries by using cross-attention to identify contradictions between CLIP appearance, VideoMAE motion, and MiDaS depth features.
- Deepfake Detection in Social Media: A Temporal Artifact Analysis Using 3D Convolutional Neural Networks
  3D CNN detector with temporal consistency regularizer reaches 92.8% accuracy on DeepfakeTIMIT and 76.4% cross-dataset on FaceForensics++ without fine-tuning.