Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

Bo Han; Daiyuan Li; Feng Liu; Guoxuan Pang; Jiahao Yang; Mingkui Tan; Shuhai Zhang; Shutao Li; ZiHao Lian

arxiv: 2510.08073 · v2 · pith:642QPUQDnew · submitted 2025-10-09 · 💻 cs.CV · cs.LG

Shuhai Zhang, ZiHao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, Mingkui Tan

keywords: detection, videos, NSG-VD, video, AI-generated, modeling, propose, spatiotemporal

Abstract

AI-generated videos have achieved near-perfect visual realism (e.g., Sora), making reliable detection mechanisms urgently necessary. However, detecting such videos is challenging: it requires modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose the first physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called the Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradient approximation and motion-aware temporal modeling, which avoids complex motion decomposition while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between the NSG features of test and real videos as a detection metric. Finally, we derive an upper bound on NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score. The source code is available at https://github.com/ZSHsh98/NSG-VD.
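
The detection metric described above, an MMD between feature sets of test and real videos, can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth value, the biased estimator, and the synthetic stand-in features (`real_feats`, `test_same`, `test_shifted`) are all assumptions chosen for illustration, and the function names are hypothetical. Real NSG features would come from the diffusion-model-based estimator, which is not reproduced here.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample(rng, n, dim, mean):
    """Synthetic stand-in for feature vectors; NOT real NSG features."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
real_feats = sample(rng, 60, 8, 0.0)    # reference set standing in for real videos
test_same = sample(rng, 60, 8, 0.0)     # test set drawn from the same distribution
test_shifted = sample(rng, 60, 8, 1.0)  # test set with a shifted distribution

score_same = mmd2_biased(real_feats, test_same, bandwidth=3.0)
score_shifted = mmd2_biased(real_feats, test_shifted, bandwidth=3.0)
print(f"same distribution:    {score_same:.4f}")
print(f"shifted distribution: {score_shifted:.4f}")
```

In this toy setup the shifted set yields a larger score than the matched set, which is the behavior a detector of this kind would threshold on. How the threshold and kernel are chosen in NSG-VD is not stated in the abstract.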

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection
cs.CV 2025-12 conditional novelty 8.0

RobustSora benchmark demonstrates that current AI video detectors rely heavily on visible watermarks, with average accuracy drops of 6.6 percentage points when watermarks are erased and increased false alarms when wat...

CAM-VFD: Cross-Attention Multimodal Video Forgery Detection
cs.CV 2026-05 unverdicted novelty 6.0

CAM-VFD detects video forgeries by using cross-attention to identify contradictions between CLIP appearance, VideoMAE motion, and MiDaS depth features.

Beyond Semantics: Uncovering the Physics of Fakes via Universal Physical Descriptors for Cross-Modal Synthetic Detection
cs.CV 2026-04 unverdicted novelty 6.0

Five universal physical descriptors including Laplacian variance, Sobel statistics, and residual noise variance, when integrated as text encodings with CLIP, achieve up to 99.8% accuracy detecting synthetic images acr...

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
cs.CV 2025-12 unverdicted novelty 6.0

Skyra is an MLLM that detects AI-generated videos by identifying and reasoning over grounded visual artifacts, supported by a new annotated dataset and benchmark.

Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts
cs.CV 2026-05 unverdicted novelty 5.0

MDMF detects AI-generated images by learning patch-level forensic signatures and quantifying their distributional discrepancies with MMD, yielding larger separation than global methods when micro-defects are present.