Compositional Adversarial Training for Robust Visual Watermarking

Andrew Xu; Anirudh Satheesh; Furong Huang; Georgios Milis; Heng Huang; Michael-Andrei Panaitescu-Liess; Zikui Cai

arXiv: 2605.16720 · v1 · pith:MMZPTCDLnew · submitted 2026-05-16 · cs.CV · cs.LG

Anirudh Satheesh, Michael-Andrei Panaitescu-Liess, Andrew Xu, Georgios Milis, Heng Huang, Zikui Cai, Furong Huang

Pith reviewed 2026-05-19 21:49 UTC · model grok-4.3

classification: cs.CV · cs.LG

keywords: robust visual watermarking · compositional adversarial training · Gumbel-Softmax · min-max optimization · attack composition · watermark robustness · image watermarking · video watermarking

The pith

Training visual watermarks against learned sequences of attacks produces higher robustness than random augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard training for visual watermarks relies on random post-processing augmentations, but these rarely sample the combinations of attacks that actually remove the embedded message. The paper instead casts robustness as a min-max problem over structured compositional transformations and solves it with a learned sequential adversary. The adversary watches the current watermarked image and picks attack families step by step to maximize detection failure, using straight-through Gumbel-Softmax selection plus entropy regularization so gradients flow end-to-end. When this adversary is used during training, the resulting watermarks retain more message capacity under both single-step and multi-step attacks and generalize better to images and videos outside the training distribution.
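
One way to write the min-max objective described above is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $E_\theta$ and $D_\theta$ stand for the embedder and detector, $\pi_\phi$ for the adversary's policy over attack families, $a_t$ for the attack applied at step $t$, $\mathcal{L}$ for a message-recovery loss, and $\lambda$ for the entropy weight.

```latex
\[
\min_{\theta}\,\max_{\phi}\;
\mathbb{E}_{(x,m)}\,\mathbb{E}_{a_{0:T-1}\sim\pi_\phi}
\Big[\mathcal{L}\big(D_\theta(x_T),\,m\big)
+ \lambda \sum_{t=0}^{T-1}\mathcal{H}\big(\pi_\phi(\cdot\mid x_t)\big)\Big],
\qquad
x_0 = E_\theta(x,m),\quad x_{t+1} = a_t(x_t),\quad a_t\sim\pi_\phi(\cdot\mid x_t).
\]
```

In this form the watermark model minimizes over $\theta$ while the adversary maximizes over $\phi$. The entropy term does not depend on $\theta$, so it only shapes the adversary's side of the game.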

Core claim

Formulating watermark robustness as a min-max problem over a structured space of compositional transformations and solving it via a sequential differentiable adversary selected with Gumbel-Softmax yields watermarks whose capacity improves by up to 63.5 percent in single-step attack settings and 13.0 percent in the compositional setting, with the largest gains on hard composed attacks and out-of-distribution evaluations.
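
The summary does not say how the paper computes capacity. One common proxy, assumed here purely for illustration, treats each payload bit as passing through a binary symmetric channel whose error rate is one minus the bit accuracy. Under that assumption a small gain in bit accuracy can translate into a larger relative gain in capacity, which would be qualitatively in line with the Figure 1 caption reporting a 2.2% bit-accuracy gain alongside a 17.0% capacity gain. The function names below are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity_bits(bit_accuracy, payload_bits):
    """Capacity of `payload_bits` uses of a binary symmetric channel.

    ASSUMPTION: capacity is modeled as payload * (1 - H(error_rate)).
    The paper may define capacity differently.
    """
    error_rate = 1.0 - bit_accuracy
    return payload_bits * (1.0 - binary_entropy(error_rate))


def relative_gain(before, after):
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


# Illustrative accuracies only, not values from the paper.
low = bsc_capacity_bits(0.95, payload_bits=64)
high = bsc_capacity_bits(0.97, payload_bits=64)
capacity_gain = relative_gain(low, high)
accuracy_gain = relative_gain(0.95, 0.97)
```

With these placeholder inputs, `capacity_gain` exceeds `accuracy_gain`, which shows why the two metrics can move by very different percentages.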

What carries the argument

Compositional Adversarial Training (CAT), which trains a sequential adversary to select and compose attack families step-by-step using straight-through Gumbel-Softmax for end-to-end differentiability and entropy regularization to prevent mode collapse.
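
The selection step can be sketched in plain Python as follows. This is a minimal illustration of Gumbel-Softmax sampling and the entropy term, not the authors' implementation: the attack-family names, logits, and temperature are hypothetical, and the straight-through gradient itself needs an autodiff framework, so it appears only as a comment.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_select(logits, temperature, rng):
    """Return (hard one-hot choice, soft relaxed probabilities)."""
    noisy = [l + sample_gumbel(rng) for l in logits]
    soft = softmax(noisy, temperature)
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    # Straight-through trick (in an autodiff framework): use
    # hard + soft - stop_gradient(soft), so the forward pass sees the
    # one-hot choice while gradients flow through the soft sample.
    return hard, soft


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


rng = random.Random(0)
families = ["blur", "jpeg", "crop", "rotate"]  # hypothetical attack families
logits = [0.2, 1.5, -0.3, 0.7]                 # hypothetical controller output
hard, soft = gumbel_softmax_select(logits, temperature=0.5, rng=rng)
chosen = families[hard.index(1.0)]
# Entropy of the noise-free policy; adding it to the adversary's objective
# rewards spreading probability across families.
policy_entropy = entropy(softmax(logits))
```

Lower temperatures push `soft` toward the one-hot `hard` vector, while the entropy bonus pulls the underlying policy toward uniform; the balance between the two is what the entropy coefficient controls.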

If this is right

Watermark capacity rises by up to 63.5 percent under single-step attacks and 13.0 percent under compositional attacks.
The largest gains appear on hard composed attacks and on out-of-distribution image and video benchmarks.
In the autoregressive setting, true-positive rate at one-percent false-positive rate improves by 12 percent on average on difficult geometric transformations.
The framework acts as a plug-in that improves existing models such as VideoSeal and PixelSeal without changing their architecture.
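
For readers unfamiliar with the detection metric in the list above, the following sketch shows one conventional way to compute TPR at a fixed FPR from detector scores. The function name and the strict-inequality thresholding rule are assumptions for illustration; the paper's evaluation code may differ.

```python
def tpr_at_fpr(negative_scores, positive_scores, target_fpr=0.01):
    """TPR at the loosest threshold whose empirical FPR stays <= target_fpr.

    A sample is flagged as watermarked when its score is strictly
    greater than the threshold.
    """
    if not negative_scores or not positive_scores:
        raise ValueError("both score lists must be non-empty")
    neg = sorted(negative_scores, reverse=True)
    # Number of clean samples we may flag without exceeding the target.
    allowed = int(target_fpr * len(neg))
    # Scores strictly above the (allowed + 1)-th largest clean score are flagged.
    threshold = neg[allowed] if allowed < len(neg) else float("-inf")
    fpr = sum(s > threshold for s in neg) / len(neg)
    tpr = sum(s > threshold for s in positive_scores) / len(positive_scores)
    return tpr, fpr, threshold


# Synthetic integer scores, chosen only to make the arithmetic easy to follow.
clean_scores = list(range(100))            # unwatermarked samples
watermarked_scores = list(range(50, 150))  # watermarked samples
tpr, fpr, threshold = tpr_at_fpr(clean_scores, watermarked_scores)
```

Because the threshold is set on clean samples only, the metric is sensitive to how many clean samples are available: with 100 negatives, a 1% FPR allows exactly one false alarm.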

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sequential-adversary idea could be tested on other media such as audio or text watermarks where attack compositions also matter.
Extending the attack sequence length or adding more attack families may produce further gains provided the entropy term continues to discourage collapse.
The results suggest that any robustness training problem currently solved with random sampling might benefit from replacing it with a learned compositional adversary.

Load-bearing premise

The learned sequential adversary with Gumbel-Softmax selection can reliably cover the combinatorial space of realistic attack pipelines without missing critical compositions or collapsing to one attack mode.

What would settle it

Evaluate both CAT-trained and random-augmentation watermarks on a fixed test suite of rare composed attack pipelines never seen in training. The claim would be undercut if the CAT version showed no capacity gain over the baseline, or lower detection rates.
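
A held-out suite of this kind could be built along the following lines. This is a sketch under stated assumptions: the family names are hypothetical, pipelines are taken to be ordered sequences of families (repeats allowed), and the split is a simple seeded shuffle rather than anything the paper specifies.

```python
import itertools
import random


def build_composed_suites(families, steps=2, holdout_fraction=0.2, seed=0):
    """Split all ordered length-`steps` attack pipelines into (train, held_out)."""
    pipelines = list(itertools.product(families, repeat=steps))
    rng = random.Random(seed)  # fixed seed keeps the test suite reproducible
    rng.shuffle(pipelines)
    n_holdout = max(1, round(holdout_fraction * len(pipelines)))
    return pipelines[n_holdout:], pipelines[:n_holdout]


families = ["blur", "jpeg", "crop", "rotate", "noise"]  # hypothetical names
train_pipelines, heldout_pipelines = build_composed_suites(families)
```

For the comparison to be fair, both the learned adversary and the random-augmentation baseline would have to be prevented from producing the held-out pipelines during training, for example by masking those compositions.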

Figures

Figures reproduced from arXiv: 2605.16720 by Andrew Xu, Anirudh Satheesh, Furong Huang, Georgios Milis, Heng Huang, Michael-Andrei Panaitescu-Liess, Zikui Cai.

**Figure 1.** Overview of Compositional Adversarial Training (CAT) for visual watermarking. CAT improves the overall bit accuracy by 2.2% and capacity by 17.0% for single-step and compositional attacks.

Corresponding author: Anirudh Satheesh (anirudhsatheesh.com; anirudhs@terpmail.umd.edu).

**Figure 2.** Random augmentation creates unstable training due to inefficient augmentation allocations, whereas …

**Figure 3.** Conceptual overview of the proposed training pipeline. The embedder writes message m into image x to produce the watermarked image x_0. The adversary then repeatedly observes the current image x_t, uses a recurrent controller to produce logits, selects an attack family via straight-through Gumbel-Softmax, and applies differentiable attacks to obtain x_{t+1}. After T steps, the final attacked image x_T is passed…

**Figure 4.** CAT substantially accelerates convergence for both PixelSeal and VideoSeal, and this advantage …

**Figure 5.** CAT improves PixelSeal training across payload sizes. Validation bit error rate over training is shown for 32-, 64-, 128-, and 256-bit payloads under no augmentation (gray), random augmentation (blue), and CAT (orange). CAT consistently drives the error lower than random augmentation and remains effective as payload increases, whereas random augmentation plateaus at substantially higher error.

**Figure 6.** Continuous attack sweeps for autoregressive watermark robustness.

**Figure 7.** ROC curves for clean autoregressive watermark detection.

**Figure 8.** Token-match histograms under clean evaluation.

**Figure 9.** Forward transfer from single-step training to two-step attack compositions.

**Figure 10.** Backward transfer from compositional training to single-step attacks.

**Figure 11.** Additional qualitative results. Figures 11–15 present additional qualitative examples for the image watermarking experiments discussed in the main paper.

**Figure 12.** Additional qualitative results.

**Figure 13.** Additional qualitative results.

**Figure 14.** Additional qualitative results.

**Figure 15.** Additional qualitative results.

Original abstract

Robust watermarking is typically trained with random post-processing augmentation, but random sampling under-covers the combinatorial space of realistic attack pipelines and rarely encounters the rare compositions that actually break detection. This leads to unstable training and poor sample efficiency. We instead formulate watermark robustness as a min-max problem over a structured space of compositional transformations. We propose Compositional Adversarial Training (CAT), a plug-in framework that learns a sequential differentiable adversary that observes the current watermarked image and selects an attack family at each step to maximally disrupt message recovery. CAT combines a straight-through Gumbel-Softmax attack selection with entropy regularization, allowing the backward pass to be end-to-end differentiable and aggregate gradient information across attack families, yielding faster, smoother convergence without collapsing to a single attack mode. We evaluate CAT on post-generation watermarks VideoSeal 0.0, VideoSeal 1.0, and PixelSeal and in-generation WMAR under both single-step and two-step attack suites, on in-distribution and multiple out-of-distribution image and video benchmarks. CAT consistently outperforms random-augmentation baselines trained with the same augmentation budget, with the largest gains on hard composed attacks and OOD evaluations; improving overall watermark capacity by up to $63.5\%$ in the single-step attack setting and $13.0\%$ in the compositional setting. In the autoregressive setting, CAT improves the TPR@FPR$=1\%$ by $12\%$ on average on difficult geometric transformations. These results show that robust visual watermarking benefits from training against adaptive compositional adversaries rather than independent random corruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

CAT adds a Gumbel-Softmax sequential selector with entropy regularization to train watermarks against learned attack compositions and reports gains over random baselines, but the abstract leaves the source of those gains and the coverage of the attack space unclear.

The paper's core move is to replace random post-processing augmentations with a learned sequential adversary that picks attack families step by step to maximize disruption to the watermark message. It uses straight-through Gumbel-Softmax for the discrete choices and adds entropy regularization so gradients can flow across families without immediate collapse to one mode. This turns the usual min-max robustness problem into an end-to-end differentiable setup that is meant to hit the rare compositions random sampling misses.

They plug the method into VideoSeal 0.0, VideoSeal 1.0, PixelSeal, and WMAR, then test single-step and two-step suites on both in-distribution and OOD image and video sets. The headline numbers are capacity lifts of 63.5% in the single-step case and 13% in the compositional case, plus a 12% average TPR improvement at 1% FPR on geometric transforms in the autoregressive setting. Those are the concrete results worth noting. The formulation is clean and the motivation about sample efficiency is practical for anyone who has watched random augmentations fail on composed pipelines.

The soft spots sit in the evaluation. The abstract gives no error bars, no statistical tests, and no ablations on the entropy term or on what happens when the selector is removed. Without those controls it is hard to tell whether the reported edges come from better coverage of the combinatorial space or from the adversary simply locking onto a few strong attacks that happen to work well on the test sets. The stress-test concern about mode collapse is therefore still live; if the regularization is not doing enough work, the advantage over a matched-budget random baseline shrinks.

This is for groups working on practical visual watermarking or similar embedding tasks that need to survive realistic post-processing chains. A reader who already cares about differentiable optimization over discrete attack spaces will find the technical pieces useful even if the numbers need more backing. I would send it to peer review so the authors can add the missing ablations and significance checks, but the current version is not yet ready to stand on its own.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Compositional Adversarial Training (CAT) for robust visual watermarking. It models watermark robustness as a min-max problem over compositional transformations and proposes a plug-in framework with a sequential differentiable adversary that uses straight-through Gumbel-Softmax selection combined with entropy regularization to avoid mode collapse. The approach is tested on VideoSeal 0.0, VideoSeal 1.0, PixelSeal, and WMAR under single-step and two-step attacks on in- and out-of-distribution image and video datasets, reporting capacity gains of up to 63.5% and 13.0% respectively over random-augmentation baselines with matched budget, along with improved TPR@FPR on geometric transformations.

Significance. Should the empirical findings prove robust, this work offers a promising direction for training watermark detectors against realistic, composed attack pipelines that random sampling often misses. By learning an adaptive adversary, it potentially improves sample efficiency and stability in training robust watermarks, which is relevant for protecting digital media. The consistent gains across multiple watermarking methods and settings, including OOD, highlight the practical value if the adversary's exploration of the attack space is adequately demonstrated.

major comments (2)

The central assumption that the Gumbel-Softmax sequential selection with entropy regularization reliably covers the combinatorial space of attack compositions without mode collapse is not sufficiently validated. No ablation studies on the entropy regularization strength or reports on the diversity of selected attack sequences (e.g., frequency of each family) are provided. This is critical because if collapse occurs, the method reduces to standard adversarial training against a subset of attacks, undermining the claimed superiority over random-augmentation baselines with the same budget.
The reported improvements lack error bars, statistical significance tests, or details on the exact attack compositions used in the suites. This makes it difficult to assess the reliability of the 63.5% and 13.0% capacity gains, especially given potential sensitivity to unstated implementation choices in the adversary training.

minor comments (2)

The manuscript should specify the number of attack families and the maximum number of steps in the sequential adversary to allow reproducibility.
Include figures or tables showing example attack compositions or selection probabilities over training to enhance clarity on the learned policy.
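
The diversity report requested in these comments could be as simple as a frequency table plus a normalized entropy score over the adversary's logged choices. The sketch below assumes such a log exists as a flat list of family names; the function name and the family names are hypothetical.

```python
import math
from collections import Counter


def selection_diversity(selections, families):
    """Return (per-family frequency, entropy normalized to [0, 1]).

    A normalized entropy near 1 means selections are close to uniform;
    a value near 0 indicates collapse onto a single family.
    """
    if not selections:
        raise ValueError("selections must be non-empty")
    counts = Counter(selections)
    total = len(selections)
    freqs = {f: counts.get(f, 0) / total for f in families}
    h = -sum(p * math.log(p) for p in freqs.values() if p > 0.0)
    normalized = h / math.log(len(families)) if len(families) > 1 else 0.0
    return freqs, normalized


families = ["blur", "jpeg", "crop", "rotate"]        # hypothetical names
logged = ["jpeg", "jpeg", "crop", "jpeg", "rotate"]  # hypothetical log
freqs, diversity = selection_diversity(logged, families)
```

Tracking `diversity` over training steps, once per entropy coefficient, would give a direct view of whether and when the policy collapses.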

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment below and outline revisions to strengthen the manuscript's clarity and empirical support.

Referee: The central assumption that the Gumbel-Softmax sequential selection with entropy regularization reliably covers the combinatorial space of attack compositions without mode collapse is not sufficiently validated. No ablation studies on the entropy regularization strength or reports on the diversity of selected attack sequences (e.g., frequency of each family) are provided. This is critical because if collapse occurs, the method reduces to standard adversarial training against a subset of attacks, undermining the claimed superiority over random-augmentation baselines with the same budget.

Authors: We appreciate this observation. The entropy regularization is explicitly introduced in Section 3.2 to promote diversity across attack families during sequential selection and to mitigate mode collapse, enabling gradient aggregation over the full combinatorial space. However, we agree that direct empirical validation via ablations and diversity statistics is currently absent and would better substantiate the claims. In the revised manuscript we will add (i) an ablation varying the entropy coefficient and reporting its effect on capacity and convergence, and (ii) histograms or tables showing the empirical frequency of each attack family selected by the adversary throughout training. These additions will confirm that the learned policy explores the space more effectively than random sampling under the same budget. revision: yes
Referee: The reported improvements lack error bars, statistical significance tests, or details on the exact attack compositions used in the suites. This makes it difficult to assess the reliability of the 63.5% and 13.0% capacity gains, especially given potential sensitivity to unstated implementation choices in the adversary training.

Authors: We concur that additional statistical rigor and implementation details would improve interpretability. In the revision we will (i) report mean capacity gains together with standard deviations computed over at least five independent training runs, (ii) include paired statistical significance tests (e.g., Wilcoxon signed-rank) between CAT and the matched-budget random-augmentation baselines, and (iii) provide an appendix table enumerating the precise attack families, parameters, and composition rules used in both the single-step and two-step suites. These changes will allow readers to evaluate the robustness of the reported gains. revision: yes
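
The statistical reporting promised in the second response can be sketched with the standard library alone. The block below computes mean and standard deviation per method and an exact two-sided Wilcoxon signed-rank p-value by enumerating sign assignments. The per-run numbers are synthetic placeholders, not results from the paper, and the helper names are hypothetical.

```python
import itertools
import statistics


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_wilcoxon_signed_rank(a, b):
    """Two-sided exact p-value for paired samples (zero differences dropped)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2)
    # Under the null, each difference is equally likely to be + or -.
    count = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            count += 1
    return count / 2 ** len(ranks)


# Synthetic placeholder capacities for five seeds; NOT values from the paper.
cat_runs = [0.71, 0.74, 0.69, 0.73, 0.72]
baseline_runs = [0.65, 0.66, 0.64, 0.69, 0.63]
cat_mean, cat_std = statistics.mean(cat_runs), statistics.stdev(cat_runs)
base_mean, base_std = statistics.mean(baseline_runs), statistics.stdev(baseline_runs)
p_value = exact_wilcoxon_signed_rank(cat_runs, baseline_runs)
```

One design point worth noting: with five pairs the smallest attainable two-sided exact p-value is 2/32 = 0.0625, so a five-run comparison cannot reach the conventional 0.05 level under this test even when every run favors one method. More runs, or a different test, would be needed for a significance claim at that level.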

Circularity Check

0 steps flagged

No circularity: empirical gains from CAT vs. matched-budget random baselines are externally validated

The paper defines watermark robustness as a min-max problem over compositional transformations and solves it via the CAT procedure (straight-through Gumbel-Softmax selection plus entropy regularization). Reported improvements (63.5% single-step capacity, 13% compositional, 12% TPR@FPR=1%) are measured on held-out in-distribution and OOD image/video benchmarks against random-augmentation baselines trained with identical budget. No equation, prediction, or central claim reduces by construction to a fitted input, self-definition, or self-citation chain; the evaluation remains falsifiable on external data and does not rename known results or smuggle ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields limited visibility into modeling assumptions; standard differentiability of image transformations and attack families is presupposed, with no explicit free parameters or invented entities named.

axioms (1)

domain assumption: The space of realistic attack pipelines can be structured as sequential compositions of differentiable transformation families. Invoked in the min-max formulation of watermark robustness over compositional transformations.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: formulate watermark robustness as a min-max problem over a structured space of compositional transformations... straight-through Gumbel-Softmax attack selection with entropy regularization

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: sequential differentiable adversary that observes the current watermarked image and selects an attack family at each step

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 6 internal anchors

[1]

Explaining and Harnessing Adversarial Examples

Explaining and harnessing adversarial examples , author=. arXiv preprint arXiv:1412.6572 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Towards Deep Learning Models Resistant to Adversarial Attacks

Towards deep learning models resistant to adversarial attacks , author=. arXiv preprint arXiv:1706.06083 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[3]

arXiv preprint arXiv:2210.02577 , year=

A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition , author=. arXiv preprint arXiv:2210.02577 , year=

work page arXiv
[4]

International conference on machine learning , pages=

Population based augmentation: Efficient learning of augmentation policy schedules , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019
[5]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Trivialaugment: Tuning-free yet state-of-the-art data augmentation , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

work page
[6]

arXiv preprint arXiv:2006.12655 , year=

Perceptual adversarial robustness: Defense against unseen threat models , author=. arXiv preprint arXiv:2006.12655 , year=

work page arXiv 2006
[7]

Intriguing properties of neural networks

Intriguing properties of neural networks , author=. arXiv preprint arXiv:1312.6199 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[8]

2017 ieee symposium on security and privacy (sp) , pages=

Towards evaluating the robustness of neural networks , author=. 2017 ieee symposium on security and privacy (sp) , pages=. 2017 , organization=

work page 2017
[9]

Advances in neural information processing systems , volume=

Adversarial training and robustness for multiple perturbations , author=. Advances in neural information processing systems , volume=

work page
[10]

International Conference on Machine Learning , pages=

Adversarial robustness against the union of multiple perturbation models , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[11]

International conference on machine learning , pages=

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020
[12]

Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security , pages=

Sat: Improving adversarial training via curriculum-based loss smoothing , author=. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security , pages=

work page
[13]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Towards compositional adversarial robustness: Generalizing adversarial training to composite semantic perturbations , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[14]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Autoaugment: Learning augmentation strategies from data , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[15]

European conference on computer vision , pages=

Differentiable automatic data augmentation , author=. European conference on computer vision , pages=. 2020 , organization=

work page 2020
[16]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Direct differentiable augmentation search , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

work page
[17]

arXiv preprint arXiv:1912.11188 , year=

Adversarial autoaugment , author=. arXiv preprint arXiv:1912.11188 , year=

work page arXiv 1912
[18]

arXiv preprint arXiv:2506.16349 , year=

Watermarking autoregressive image generation , author=. arXiv preprint arXiv:2506.16349 , year=

work page arXiv
[19]

arXiv preprint arXiv:2310.00076 , year=

Robustness of ai-image detectors: Fundamental limits and practical attacks , author=. arXiv preprint arXiv:2310.00076 , year=

work page arXiv
[20]

Diffusion models for adversarial purifi- cation.arXiv preprint arXiv:2205.07460,

Diffusion models for adversarial purification , author=. arXiv preprint arXiv:2205.07460 , year=

work page arXiv
[21]

Proceedings of international conference on image processing , volume=

DCT-based watermark recovering without resorting to the uncorrupted original image , author=. Proceedings of international conference on image processing , volume=. 1997 , organization=

work page 1997
[22]

Optics Express , volume=

Wavelet transform based watermark for digital images , author=. Optics Express , volume=. 1998 , publisher=

work page 1998
[23]

IEEE transactions on image processing , volume=

Improved wavelet-based watermarking through pixel-wise masking , author=. IEEE transactions on image processing , volume=. 2001 , publisher=

work page 2001
[24]

Proceedings of the European conference on computer vision (ECCV) , pages=

Hidden: Hiding data with deep networks , author=. Proceedings of the European conference on computer vision (ECCV) , pages=

work page
[25]

Expert Systems with Applications , volume=

ReDMark: Framework for residual diffusion watermarking based on deep networks , author=. Expert Systems with Applications , volume=. 2020 , publisher=

work page 2020
[26]

2022 9th International Conference on Behavioural and Social Computing (BESC) , pages=

Digital Watermarking via Inverse Gradient Attention , author=. 2022 9th International Conference on Behavioural and Social Computing (BESC) , pages=. 2022 , organization=

work page 2022
[27]

Proceedings of the 29th ACM international conference on multimedia , pages=

Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression , author=. Proceedings of the 29th ACM international conference on multimedia , pages=

work page
[28]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

The stable signature: Rooting watermarks in latent diffusion models , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[29]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Trustmark: Robust watermarking and watermark removal for arbitrary resolution images , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[30]

Tree-ring watermarks: Fingerprints for diffu- sion images that are invisible and robust

Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust , author=. arXiv preprint arXiv:2305.20030 , year=

work page arXiv
[31]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Editguard: Versatile image watermarking for tamper localization and copyright protection , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[32]

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) , pages=

InvisMark: Invisible and robust watermarking for AI-generated image provenance , author=. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) , pages=. 2025 , organization=

work page 2025
[33]

Robust watermarking using generative priors against image editing: From benchmarking to advances.arXiv preprint arXiv:2410.18775, 2024

Robust watermarking using generative priors against image editing: From benchmarking to advances , author=. arXiv preprint arXiv:2410.18775 , year=

work page arXiv
[34]

arXiv preprint arXiv:1909.01285 , year=

Robust invisible video watermarking with attention , author=. arXiv preprint arXiv:1909.01285 , year=

work page arXiv 1909
[35]

Video seal: Open and efficient video watermarking

Video seal: Open and efficient video watermarking , author=. arXiv preprint arXiv:2412.09492 , year=

work page arXiv
[36]

arXiv preprint arXiv:1910.01221 , year=

Romark: A robust watermarking system using adversarial training , author=. arXiv preprint arXiv:1910.01221 , year=

work page arXiv 1910
[37]

IEEE Transactions on Image Processing , year=

Dvmark: a deep multiscale framework for video watermarking , author=. IEEE Transactions on Image Processing , year=

work page
[38]

, author=

VStegNET: Video Steganography Network using Spatio-Temporal features and Micro-Bottleneck. , author=. BMVC , volume=

work page
[39]

Journal of Information Security and Applications , volume=

VHNet: A video hiding network with robustness to video coding , author=. Journal of Information Security and Applications , volume=. 2023 , publisher=

work page 2023
[40]

Pacific Conference on Computer Graphics and Applications (Huangshan, CN)(PG’24)

StegaVideo: Robust High-Resolution Video Steganography with Temporal and Edge Guidance , author=. Pacific Conference on Computer Graphics and Applications (Huangshan, CN)(PG’24). EG, Eindhoven, NL, Article , volume=

work page
[41]

IEEE Transactions on Circuits and Systems for Video Technology , volume=

Robust and compatible video watermarking via spatio-temporal enhancement and multiscale pyramid attention , author=. IEEE Transactions on Circuits and Systems for Video Technology , volume=. 2024 , publisher=

work page 2024
[42]

2023 International Conference on Culture-Oriented Science and Technology (CoST) , pages=

ItoV: efficiently adapting deep learning-based image watermarking to video watermarking , author=. 2023 International Conference on Culture-Oriented Science and Technology (CoST) , pages=. 2023 , organization=

work page 2023
[43]

Advances in Neural Information Processing Systems , volume=

Robin: Robust and invisible watermarks for diffusion models with adversarial optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[44]

arXiv preprint arXiv:2506.20370 , year=

InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking , author=. arXiv preprint arXiv:2506.20370 , year=

work page arXiv
[45]

Advances in neural information processing systems , volume=

Invisible image watermarks are provably removable using generative ai , author=. Advances in neural information processing systems , volume=

work page
[46]

arXiv preprint arXiv:2412.12511 , year=

Invisible Watermarks: Attacks and Robustness , author=. arXiv preprint arXiv:2412.12511 , year=

work page arXiv
[47]

International Conference on Machine Learning , pages=

WAVES: Benchmarking the Robustness of Image Watermarks , author=. International Conference on Machine Learning , pages=. 2024 , organization=

work page 2024
[48]

arXiv preprint arXiv:2512.16874 , year=

Pixel Seal: Adversarial-only training for invisible image and video watermarking , author=. arXiv preprint arXiv:2512.16874 , year=

work page arXiv
[49]

International Conference on Learning Representations , year=

Categorical Reparameterization with Gumbel-Softmax , author=. International Conference on Learning Representations , year=

work page
[50]

2024 , booktitle=

Erasing the Invisible: A Stress-Test Challenge for Image Watermarks , author=. 2024 , booktitle=

work page 2024
[51]

DINOv3

Dinov3 , author=. arXiv preprint arXiv:2508.10104 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[52]

Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning, 2018.
[53]

Segment anything. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[54]

NTIRE 2017 challenge on single image super-resolution: Dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[55]

Low bitrate image compression with discretized Gaussian mixture likelihoods. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.
[56]

A Compression Objective and a Cycle Loss for Neural Image Compression. CVPR Workshops.
[57]

Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems.
[58]

Movie Gen: A Cast of Media Foundation Models. arXiv preprint arXiv:2410.13720.
[59]

SAM 2: Segment Anything in Images and Videos. arXiv preprint arXiv:2408.00714.
[60]

Taming transformers for high-resolution image synthesis. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[61]

Randomized autoregressive visual generation. Proceedings of the IEEE/CVF International Conference on Computer Vision.
