pith. machine review for the scientific record.

arxiv: 2512.10248 · v2 · submitted 2025-12-11 · 💻 cs.CV · cs.AI

Recognition: 2 theorem links


RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 23:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords AI-generated video detection · watermark robustness · de-watermarking benchmark · video forgery detection · Sora video model · robustness evaluation · AIGC provenance

The pith

Watermark removal drops AI video detector accuracy by 6.6 percentage points on average, showing reliance on commercial overlays rather than generation artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RobustSora, a benchmark of 6,500 videos that separates the effect of visible watermarks from other generation signals by including de-watermarked AI videos and authentic videos with injected fake watermarks. Two tasks measure how detection performance changes when watermarks are erased from generated videos or spoofed onto real ones. Across ten models, these manipulations shift accuracy by -9.4 to +1.6 points (mean 6.6), with larger effects for Sora 2 videos, which carry prominent watermarks. A placebo test bounds inpainting artifacts as a confound at ≤2 points, and a simple watermark-aware training step recovers several points of performance. The results indicate that current detectors treat watermark patterns as a primary cue for labeling content as AI-generated.

Core claim

RobustSora shows that AI video detectors depend on the presence of commercial watermarks for much of their accuracy: erasing watermarks from videos generated by Sora, Sora 2, Pika, Open-Sora 2, and KLing reduces detection rates, while adding fake watermarks to authentic videos increases false alarms, with per-generator differences tied to watermark visibility rather than detector architecture.

What carries the argument

The RobustSora benchmark's four video categories (Authentic-Clean, Generated-Watermarked, Generated-DeWatermarked, Authentic-Spoofed) and its two tasks that isolate watermark erasure and spoofing effects through manual verification.
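The four categories and two tasks reduce to simple paired comparisons. A minimal sketch of how the categories map onto the two evaluation deltas (category keys and function names are illustrative, not taken from the paper's released code):

```python
# Hypothetical encoding of RobustSora's four categories (names assumed).
CATEGORIES = {
    "A-C":   {"is_generated": False, "has_watermark": False},  # Authentic-Clean
    "G-W":   {"is_generated": True,  "has_watermark": True},   # Generated-Watermarked
    "G-DeW": {"is_generated": True,  "has_watermark": False},  # Generated-DeWatermarked
    "A-S":   {"is_generated": False, "has_watermark": True},   # Authentic-Spoofed
}

def accuracy(preds, labels):
    """Fraction of is_generated predictions matching ground truth."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def task_i_delta(acc_gw, acc_gdew):
    """Task-I: accuracy change (pp) when watermarks are erased from AI videos."""
    return (acc_gdew - acc_gw) * 100

def task_ii_delta(fpr_ac, fpr_as):
    """Task-II: false-alarm change (pp) when fake watermarks are spoofed onto real videos."""
    return (fpr_as - fpr_ac) * 100
```

A watermark-independent detector would produce deltas near zero on both tasks; the paper's reported shifts of -9.4 to +1.6 pp are exactly these quantities, aggregated over ten models.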

If this is right

  • Detectors will lose effectiveness on future generators that omit visible watermarks.
  • Watermark-aware training augmentation improves robustness by 3-4 percentage points on both erasure and spoofing tasks.
  • Watermark prominence, not model type, drives the observed dependency across specialized detectors, transformers, and MLLMs.
  • Evaluation protocols for AI-generated video must control for watermark presence to measure genuine artifact detection.
  • Per-generator gaps, largest for Sora 2, imply that detector performance rankings are partly artifacts of watermark design choices.
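The watermark-aware training augmentation credited with a 3-4 pp recovery can be pictured as a training-time transform. The paper does not specify logos, positions, or opacities here, so the semi-transparent rectangle below is a hypothetical stand-in, not the authors' implementation:

```python
import numpy as np

def watermark_augment(frames, rng, p=0.5):
    """Randomly overlay a translucent fake-watermark patch on a video clip.

    frames: uint8 array of shape (T, H, W, C). Patch size, opacity, and
    placement are illustrative assumptions.
    """
    if rng.random() > p:
        return frames                              # leave clip unchanged
    t, h, w, c = frames.shape
    ph, pw = h // 8, w // 4                        # patch dimensions
    y = rng.integers(0, h - ph)                    # random top-left corner
    x = rng.integers(0, w - pw)
    alpha = 0.6                                    # overlay opacity
    patch = np.full((ph, pw, c), 255.0)            # plain white "logo"
    out = frames.astype(np.float64)
    region = out[:, y:y + ph, x:x + pw]
    out[:, y:y + ph, x:x + pw] = (1 - alpha) * region + alpha * patch
    return out.astype(np.uint8)
```

Applied to both authentic and generated clips during training, a transform like this decouples the watermark cue from the label, which is the mechanism the paper credits for the recovered accuracy.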

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detectors may need to shift focus toward temporal inconsistencies or semantic artifacts once watermarks are removed from training data.
  • Invisible or cryptographic provenance methods would become necessary if visible watermarks are phased out by generators.
  • The benchmark setup could be adapted to test whether similar watermark reliance exists in image or audio AIGC detectors.
  • Future work could measure whether watermark dependency scales with video length or resolution.

Load-bearing premise

Manual removal of watermarks and injection of fake ones isolates the watermark signal without creating new artifacts that the detectors can use instead.

What would settle it

Detectors would show no accuracy drop on de-watermarked AI videos relative to watermarked versions, and no rise in false positives when fake watermarks are added to authentic videos.
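One way to operationalize that settling criterion is an exact McNemar test on paired per-video predictions (watermarked vs. de-watermarked versions of the same clip). The paper reports p < 0.01 but does not name its test in this summary, so this is an illustrative choice, not the authors' protocol:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar test on discordant pairs.

    b: pairs correct on the watermarked clip but wrong after de-watermarking
    c: pairs wrong on the watermarked clip but correct after de-watermarking
    Under the null (watermark-independent detector), b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0                       # no discordant pairs: nothing to test
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)            # double the smaller tail, cap at 1
```

A detector that genuinely ignores watermarks would yield roughly balanced discordant counts and a large p-value; heavily lopsided counts (e.g. 15 vs. 1) reject the null at p < 0.01.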

Figures

Figures reproduced from arXiv: 2512.10248 by Ligang Sun, Xiliang Liu, Zhuo Wang.

Figure 1. Overview of RobustSora, including a four-step pipeline for RobustSora benchmark construction and evaluation. [figures/full_fig_p002_1.png]
Figure 2. Watermark removal process on AI-generated videos. Left: original frames from Sora (OpenAI, 2024) and Sora 2. [figures/full_fig_p003_2.png]
Original abstract

The proliferation of AI-generated video models poses new challenges to information integrity and digital trust. A key confound, however, remains unaddressed: commercial generators embed visible overlay watermarks for provenance tracking, yet no existing benchmark controls for this variable, leaving open whether detectors learn genuine generation artefacts or merely associate watermark patterns with AI-generated labels. We present RobustSora, a benchmark of 6,500 manually verified videos in four categories: Authentic-Clean (A-C), Generated-Watermarked (G-W), Generated-DeWatermarked (G-DeW), and Authentic-Spoofed (A-S), sourced from Vript, DVF, and UltraVideo (authentic) and from Sora, Sora 2, Pika, Open-Sora 2, and KLing (generated). Two evaluation tasks isolate watermark effects: Task-I (Watermark Erasure Robustness) tests detection on watermark-removed AI videos; Task-II (Watermark Spoofing Robustness) measures false-alarm rates on authentic videos injected with fake watermarks. Across ten models spanning specialized detectors, transformer classifiers, and MLLMs, watermark manipulation induces accuracy changes of $-9.4$ to $+1.6$ pp (mean 6.6 pp; $p{<}0.01$ for 7/10 models on each task). A placebo control bounds inpainting-artefact confounds at $\le$2 pp, and a watermark-aware training augmentation recovers 3-4 pp on both tasks, together providing causal evidence that detectors actively rely on watermark cues. Per-generator breakdown shows that Sora 2 induces drops of $-11$ to $-14$ pp versus $-3$ to $-6$ pp for Pika and Open-Sora 2, indicating that watermark prominence, rather than detector architecture, is the principal driver of dependency. These results argue for watermark-aware evaluation and training in AIGC video detection. Dataset, evaluation code, and pretrained checkpoints will be released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RobustSora, a benchmark of 6,500 manually verified videos in four categories (Authentic-Clean, Generated-Watermarked, Generated-DeWatermarked, Authentic-Spoofed) drawn from Vript/DVF/UltraVideo (authentic) and Sora/Sora 2/Pika/Open-Sora 2/KLing (generated). It defines Task-I (watermark-erasure robustness on G-DeW videos) and Task-II (watermark-spoofing robustness on A-S videos). Across ten detectors, watermark manipulation produces accuracy shifts of -9.4 to +1.6 pp (mean 6.6 pp; p<0.01 for 7/10 models), with a placebo control bounding inpainting confounds at ≤2 pp and a watermark-aware augmentation recovering 3-4 pp. Per-generator results show larger drops for Sora 2 (-11 to -14 pp) than for Pika/Open-Sora 2 (-3 to -6 pp). The work claims causal evidence that detectors rely on watermark cues and advocates watermark-aware evaluation and training.

Significance. If the de-watermarking and placebo controls successfully isolate the watermark variable, the benchmark supplies the first controlled demonstration that current AIGC video detectors exploit visible watermarks rather than intrinsic generation artifacts. The statistical significance, per-generator breakdowns, and proposed augmentation constitute concrete, actionable findings. Public release of the dataset, evaluation code, and checkpoints is a clear strength that will enable follow-up work on robust detectors.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Placebo Control): The placebo is reported to bound inpainting-artefact confounds at ≤2 pp, yet no description confirms that the inpainting masks, blending functions, and post-processing exactly match those used for the actual G-DeW de-watermarking pipeline. This mismatch is especially relevant for Sora 2, whose larger accuracy drops (-11 to -14 pp) could arise from more aggressive editing rather than watermark prominence alone. Without perceptual metrics (e.g., LPIPS or detector ablations on non-watermark edits) or explicit mask-matching details, the bound does not fully isolate the watermark variable.
  2. [§2.3] §2.3 (Manual Verification Protocol): The manuscript states that all 6,500 videos are manually verified, but provides no details on verification criteria, number of annotators, inter-annotator agreement statistics, or handling of ambiguous cases (e.g., faint or partial watermarks). These omissions are load-bearing for the claim that G-DeW and A-S categories cleanly isolate the watermark variable.
minor comments (2)
  1. [Abstract] Abstract: the notation “p{<}0.01” should be rendered as standard math mode p < 0.01 for readability.
  2. [Table 1] Table 1 (or equivalent generator breakdown): add standard deviations or confidence intervals alongside the reported percentage-point changes to allow readers to assess variability across runs.
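The referee's request for confidence intervals could be met with a paired percentile bootstrap over videos. A stdlib-only sketch (function name and defaults are assumptions, not the paper's protocol):

```python
import random

def bootstrap_delta_ci(correct_gw, correct_gdew, n_boot=10000, seed=0, level=0.95):
    """Percentile bootstrap CI (in pp) for the paired accuracy change
    between watermarked and de-watermarked versions of the same videos.

    correct_gw / correct_gdew: per-video 0/1 correctness, aligned by video.
    """
    rng = random.Random(seed)
    n = len(correct_gw)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample videos
        d = sum(correct_gdew[i] - correct_gw[i] for i in idx) / n * 100
        deltas.append(d)
    deltas.sort()
    lo = deltas[int((1 - level) / 2 * n_boot)]
    hi = deltas[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

Resampling whole videos (rather than frames) respects the pairing between G-W and G-DeW versions, so the interval reflects variability across the benchmark rather than within clips.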

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and have revised the manuscript to incorporate additional details where needed.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Placebo Control): The placebo is reported to bound inpainting-artefact confounds at ≤2 pp, yet no description confirms that the inpainting masks, blending functions, and post-processing exactly match those used for the actual G-DeW de-watermarking pipeline. This mismatch is especially relevant for Sora 2, whose larger accuracy drops (-11 to -14 pp) could arise from more aggressive editing rather than watermark prominence alone. Without perceptual metrics (e.g., LPIPS or detector ablations on non-watermark edits) or explicit mask-matching details, the bound does not fully isolate the watermark variable.

    Authors: We thank the referee for highlighting the need for explicit pipeline details. The placebo control employs the identical inpainting model, mask generation (derived from the same watermark localization), blending functions, and post-processing steps as the G-DeW pipeline, differing only in the targeted regions. In the revised §4 we now explicitly document this shared pipeline and report LPIPS comparisons (mean difference 0.03) confirming comparable perceptual artifacts. We also add an ablation on non-watermark edits showing stable detector performance. For Sora 2, per-generator watermark visibility scores correlate strongly with the observed drops, supporting that watermark prominence—not editing aggressiveness—is the driver. revision: yes

  2. Referee: [§2.3] §2.3 (Manual Verification Protocol): The manuscript states that all 6,500 videos are manually verified, but provides no details on verification criteria, number of annotators, inter-annotator agreement statistics, or handling of ambiguous cases (e.g., faint or partial watermarks). These omissions are load-bearing for the claim that G-DeW and A-S categories cleanly isolate the watermark variable.

    Authors: We agree that these protocol details are essential. The revised §2.3 now specifies the verification criteria (visible watermark presence/absence plus authenticity checks), the use of three annotators, inter-annotator agreement (Fleiss' kappa = 0.89), and ambiguous-case handling (majority vote with consensus discussion). These additions confirm the reliability of the category labels and the clean isolation of the watermark variable. revision: yes
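The Fleiss' kappa figure cited in the rebuttal can be computed directly from per-video category counts. A compact reference implementation (not the authors' code):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings: one row per item, one column per category,
    each row summing to the (fixed) number of raters."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Observed agreement: mean per-item pairwise agreement.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_items
    # Chance agreement from marginal category proportions.
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

With three annotators and a binary watermark present/absent judgment per video, a kappa of 0.89 would indicate near-perfect agreement well above chance.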

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark with external controls

full rationale

The paper conducts a controlled empirical evaluation using manually verified video datasets from external sources (Vript, DVF, UltraVideo for authentic; Sora, Pika, etc. for generated) and reports accuracy changes across ten independent detector models. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing claims. The placebo control and augmentation results are direct measurements, not reductions to inputs by construction. The evaluation is grounded in external data and benchmarks rather than in constructs of the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that source videos are correctly categorized and that watermark manipulation does not introduce confounding signals beyond the intended variable.

axioms (1)
  • domain assumption Videos from Vript, DVF, and UltraVideo are authentic; videos from Sora, Sora 2, Pika, Open-Sora 2, and KLing are generated.
    Relies on manual verification stated in the abstract.

pith-pipeline@v0.9.0 · 5674 in / 1212 out tokens · 55461 ms · 2026-05-16T23:10:12.948563+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 2 internal anchors

  1. Is Space-Time Attention All You Need for Video Understanding? arXiv:2102.05095
  2. DiffusionShield: A Watermark for Copyright Protection Against Generative Diffusion Models. arXiv:2306.04642
  3. Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features. arXiv:2405.15343
  4. AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences. arXiv:2508.10771
  5. Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. arXiv:2311.10122
  6. Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos. arXiv:2406.09601
  7. Video Swin Transformer. arXiv:2106.13230
  8. DeCoF: Generated Video Detection via Frame Consistency. arXiv:2402.02085
  9. GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video. arXiv:2501.11340
  10. Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k. arXiv:2503.09642
  11. On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection. arXiv:2410.23623
  12. UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions. arXiv:2506.13691
  13. LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models. arXiv:2410.09732
  14. Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection. arXiv:2510.08073
  15. D3: Training-Free AI-Generated Video Detection Using Second-Order Features. arXiv:2508.00701