Watermarks Attack Watermarks: Re-Watermarking as a Generic Removal Strategy

Benjamin I. P. Rubinstein; Maria Bulychev; Neil G. Marchant

arxiv: 2605.16796 · v1 · pith:E5AEMJFCnew · submitted 2026-05-16 · 💻 cs.CR · cs.CV

Pith reviewed 2026-05-19 21:17 UTC · model grok-4.3

keywords: watermarking, adversarial attacks, watermark removal, image provenance, copyright protection, re-watermarking, black-box attack

The pith

Re-watermarking an already watermarked image reliably suppresses the original signal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that re-applying a watermark to an image already protected by one suppresses the original watermark's detection. This works across 96 combinations of datasets, victim watermarks, and attack watermarks. The technique needs no gradients, surrogate models, or detection keys. A separate classifier identifies the presence and identity of an existing watermark with accuracies between 0.878 and 0.953. When used together, the steps cut original bit accuracy by 25 to 48 percent and question the strength of current watermarking schemes.

Core claim

Watermark attacks are analogous to watermarking itself because both seek imperceptible changes that trigger a detector. Re-watermarking an already watermarked image therefore reliably suppresses the original signal. Rigorous experiments over 96 dataset-victim-attack combinations confirm the effect without requiring gradients, surrogate models, or keys. A simple classifier detects watermark presence and identity at 0.878-0.953 accuracy. The combined pipeline reduces bit accuracy by at least 25 percent and up to 48 percent.
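The headline numbers are stated in terms of bit accuracy, so a minimal sketch of that metric may help. It assumes bit accuracy means the fraction of decoded message bits that match the embedded message; the paper's exact definition is not given here. The bit strings are made-up values, and `bit_accuracy` and `accuracy_drop` are illustrative names. Because the summary does not say whether "percent" means percentage points or a relative change, the sketch reports both.

```python
from typing import Dict, Sequence


def bit_accuracy(decoded: Sequence[int], embedded: Sequence[int]) -> float:
    """Fraction of decoded bits that match the embedded message."""
    if len(decoded) != len(embedded) or not embedded:
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for d, e in zip(decoded, embedded) if d == e)
    return matches / len(embedded)


def accuracy_drop(before: float, after: float) -> Dict[str, float]:
    """Report the drop as percentage points and as a relative percentage."""
    return {
        "absolute_points": (before - after) * 100.0,
        "relative_percent": (before - after) / before * 100.0,
    }


# Made-up example values, not data from the paper.
message = [1, 0, 1, 1, 0, 0, 1, 0]
decoded_before = [1, 0, 1, 1, 0, 0, 1, 0]  # victim decoder on the watermarked image
decoded_after = [1, 1, 1, 0, 0, 0, 0, 0]   # victim decoder after re-watermarking

before = bit_accuracy(decoded_before, message)
after = bit_accuracy(decoded_after, message)
drop = accuracy_drop(before, after)
print(before, after, drop)
```

A decoder that outputs uniformly random bits scores about 0.5 on a balanced message, so a drop of 48 points from near-perfect accuracy would leave the victim decoder close to chance.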

What carries the argument

The analogy that watermark attacks and watermarking both apply imperceptible perturbations to trigger detectors, allowing one watermark to interfere with another.

If this is right

Re-watermarking suppresses the original signal without gradients, surrogate models, or detection keys.
A classifier identifies existing watermarks with overall accuracies of 0.878-0.953.
Combining identification and re-watermarking reduces bit accuracy by 25 to 48 percent.
Current watermarking schemes for provenance and copyright protection face a simple generic attack.
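The combined pipeline described above (identify the existing watermark, then re-watermark) can be sketched as a small dispatch function. This is a sketch under assumptions: the source does not say how the attack watermark is chosen once the victim is identified, so the lookup table `attack_for`, the fallback `default_attack`, and the toy stand-ins below are all hypothetical and do not model any real scheme.

```python
from typing import Callable, Dict, List, Optional

Image = List[int]                               # flat list of 8-bit pixel values
Encoder = Callable[[Image], Image]              # embeds a watermark, returns a new image
Identifier = Callable[[Image], Optional[str]]   # scheme name, or None if unmarked


def identify_then_rewatermark(
    image: Image,
    identify: Identifier,
    attack_for: Dict[str, Encoder],
    default_attack: Encoder,
) -> Image:
    """Pick an attack watermark based on the identified victim, then apply it."""
    victim = identify(image)
    if victim is None:
        return list(image)                      # no watermark found, nothing to suppress
    attack = attack_for.get(victim, default_attack)
    return attack(image)                        # no gradients, surrogates, or keys involved


# Toy stand-ins so the sketch runs; they are not real watermarking schemes.
def toy_identify(image: Image) -> Optional[str]:
    return "scheme_a" if image and image[0] % 2 == 1 else None


def flip_lsb(image: Image) -> Image:
    return [p ^ 1 for p in image]


def add_one(image: Image) -> Image:
    return [min(p + 1, 255) for p in image]


marked = [201, 34, 90, 17]
unmarked = [200, 34, 90, 17]
out_marked = identify_then_rewatermark(marked, toy_identify, {"scheme_a": flip_lsb}, add_one)
out_unmarked = identify_then_rewatermark(unmarked, toy_identify, {"scheme_a": flip_lsb}, add_one)
print(out_marked, out_unmarked)
```

The only inputs are the image, a classifier, and off-the-shelf encoders, which mirrors the black-box setting the summary describes.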

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Watermark designers may need to test resistance specifically against other watermarks as a new class of interference.
The ability to first identify which watermark is present turns detection into an enabler for targeted removal.
The same re-watermarking logic could be tested on audio, video, or text detectors that rely on imperceptible triggers.
Multiple successive re-watermarkings might produce compounding suppression effects worth measuring.

Load-bearing premise

Any imperceptible change meant to trigger a detector, including a new watermark, will interfere with an existing watermark's detection independent of the schemes used.

What would settle it

An experiment on a new combination of victim and attack watermarks in which re-watermarking leaves original detection accuracy or bit accuracy essentially unchanged.
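A harness for that kind of falsification test might look like the sketch below. It loops over victim and attack combinations, measures the victim's bit accuracy before and after re-watermarking, and lists every combination where the drop stays under a tolerance. The watermark used here is a toy least-significant-bit scheme chosen only so the code runs without external models; it is an assumption for illustration, not one of the schemes evaluated in the paper, and the tolerance of 0.05 is arbitrary.

```python
import random
from typing import List, Sequence, Tuple


def embed(image: Sequence[int], bits: Sequence[int], key: int) -> List[int]:
    """Toy LSB scheme: write each bit into the LSB of a key-selected pixel."""
    positions = random.Random(key).sample(range(len(image)), len(bits))
    out = list(image)
    for pos, bit in zip(positions, bits):
        out[pos] = (out[pos] & ~1) | bit
    return out


def decode(image: Sequence[int], n_bits: int, key: int) -> List[int]:
    """Read the LSBs back from the same key-selected pixels."""
    positions = random.Random(key).sample(range(len(image)), n_bits)
    return [image[pos] & 1 for pos in positions]


def bit_accuracy(decoded: Sequence[int], message: Sequence[int]) -> float:
    return sum(d == m for d, m in zip(decoded, message)) / len(message)


def find_counterexamples(
    victim_keys: Sequence[int],
    attack_keys: Sequence[int],
    attack_bits: int,
    n_pixels: int = 4096,
    n_bits: int = 64,
    tolerance: float = 0.05,
    seed: int = 0,
) -> List[Tuple[int, int, float, float]]:
    """Return (victim, attack, before, after) rows where accuracy barely moved."""
    rng = random.Random(seed)
    image = [rng.randrange(256) for _ in range(n_pixels)]
    unchanged = []
    for vk in victim_keys:
        message = [rng.randrange(2) for _ in range(n_bits)]
        marked = embed(image, message, vk)
        before = bit_accuracy(decode(marked, n_bits, vk), message)
        for ak in attack_keys:
            payload = [rng.randrange(2) for _ in range(attack_bits)]
            attacked = embed(marked, payload, ak)  # attacker never sees vk or message
            after = bit_accuracy(decode(attacked, n_bits, vk), message)
            if before - after < tolerance:
                unchanged.append((vk, ak, before, after))
    return unchanged


full_overwrite = find_counterexamples([11, 12], [21, 22, 23], attack_bits=4096)
no_attack = find_counterexamples([11, 12], [21, 22, 23], attack_bits=0)
print(len(full_overwrite), len(no_attack))
```

The `attack_bits=0` run is a sanity check: with no re-watermark applied, every combination is reported as unchanged. What the toy scheme does says nothing about real schemes; the point is the loop, which would surface a settling counterexample if real encoders and decoders were plugged in.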

Figures

Figures reproduced from arXiv: 2605.16796 by Benjamin I. P. Rubinstein, Maria Bulychev, Neil G. Marchant.

**Figure 2.** Detection metrics before and after apply...

**Figure 3.** Per-method prediction distribution of the classifier, aggregated over MS-COCO and...

**Figure 4.** Per-image normalized quality degradation introduced by each watermarking method,...

**Figure 5.** Performance vs. image quality degradation on DiffusionDB. Each watermarked image...

**Figure 6.** Visual examples of all attacks from Section 4 applied to a single DiffusionDB image...

Original abstract

Watermarking combines an imperceptible change to an input image that will trigger a detector, to assert provenance and protect intellectual property. The literature has shown great interest in attacks on watermarking schemes: attackers are clearly motivated to steal copyrighted material or circumvent legislated deepfake protections. In this work, we make a simple-yet-powerful observation: that such attacks on watermarking, like watermarks themselves, seek an imperceptible change to an input image (now already watermarked) that will trigger a detector. This analogy comparing watermark attacks to watermarking itself is highly suggestive: that watermarks could be used to attack watermarks. Our first contribution validates this hypothesis. In rigorous experiments spanning 96 combinations of dataset, victim, and attack watermarks, we show that simply re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys. Our second contribution is a simple classifier for detecting the presence and identity of an existing watermark in a given image. Surprisingly, experimental findings demonstrate outstanding overall accuracies of 0.878-0.953. This result is of independent interest as a security vulnerability: research shows that method-specific attacks achieve substantially stronger removal than black-box attacks. Taken together, watermark identification combined with re-watermarking successfully reduces bit accuracy by at least 25% and up to 48%. Our work constitutes a cheap, generic, and highly effective attack pipeline, calling into question the reliability of current watermarking schemes to such a simple attack, as well as the value of existing sophisticated attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Re-watermarking reduces original watermark bit accuracy across many schemes, but the results may reflect generic image perturbation rather than scheme-specific interference.

Re-watermarking an already watermarked image drops the original watermark's bit accuracy by 25-48%, and the authors also show a classifier that identifies the watermark scheme with accuracies from 0.878 to 0.953. Those two results are the main things to take away here.

The paper tests the idea over 96 dataset-victim-attack combinations, which gives decent coverage for an empirical claim. The classifier stands out as a separate finding that could matter for anyone trying to detect hidden watermarks without the key. The attack itself stays simple and black-box, needing no gradients, surrogates, or victim keys, which is a practical plus compared with more involved removal methods in the literature. The numbers on bit-accuracy drop look consistent enough to treat as a real observation rather than noise.

The clearest soft spot is the lack of controls that would separate watermark-specific interference from any small imperceptible change. The experiments do not appear to compare re-watermarking against calibrated additive noise or mild compression at matched PSNR or LPIPS levels. If those generic perturbations produce similar drops, then the central analogy loses force and the method becomes less distinct from ordinary signal degradation. The abstract presents the approach as a generic removal strategy, but without those baselines the evidence for scheme-specific conflict stays incomplete.

This work is aimed at people building or evaluating watermark detectors for generative images and provenance tracking. A reader focused on practical robustness questions would get value from the attack pipeline and the classifier numbers, even if the mechanism needs tighter pinning down.

The paper deserves peer review. The empirical scope is broad enough and the idea is straightforward enough that referees can check the details, request the missing controls, and assess whether the removal claim holds up under closer scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that re-watermarking an already watermarked image reliably suppresses the original watermark signal, serving as a generic removal strategy. This is supported by experiments across 96 dataset-victim-attack combinations showing bit-accuracy drops of 25-48% without gradients, surrogates, or keys. A secondary contribution is a classifier detecting watermark presence and identity with accuracies of 0.878-0.953, which combined with re-watermarking yields the reported suppression.

Significance. If the central claim holds after controls, the work would be significant for digital watermarking and IP protection research. The breadth of 96 empirical combinations provides substantial coverage across schemes and datasets. The watermark-detection classifier is of independent interest as a demonstrated vulnerability. The approach is cheap and black-box, directly challenging the robustness of existing watermarking methods and the necessity of sophisticated attacks.

major comments (2)

[Experimental evaluation] The results across the 96 combinations do not include control baselines using generic imperceptible perturbations (e.g., additive Gaussian noise or JPEG compression) calibrated to the same PSNR/LPIPS as the re-watermarking step. Without this isolation, it remains unclear whether the 25-48% bit-accuracy drop arises from scheme-specific embedding conflict or from generic signal degradation, which is load-bearing for the claim of a distinct 'generic removal strategy'.
[Methods] Full details on the number of trials per combination, variance or error bars on reported accuracies, and any statistical significance tests for the bit-accuracy drops are absent. This prevents full assessment of the reliability of the 0.878-0.953 classifier accuracies and the removal results.
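The control baseline requested in the first major comment needs noise whose strength is calibrated to a target PSNR. A minimal sketch of that calibration follows, assuming 8-bit intensities (peak value 255) and zero-mean Gaussian noise. For such noise the expected mean squared error equals the variance, so PSNR = 10·log10(255²/σ²) gives σ = 255·10^(−PSNR/20). The function names are illustrative, and matching LPIPS is not covered because it requires a learned perceptual model.

```python
import math
import random
from typing import List, Sequence

MAX_VALUE = 255.0  # peak intensity for 8-bit images


def psnr(reference: Sequence[float], distorted: Sequence[float]) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(MAX_VALUE ** 2 / mse)


def sigma_for_psnr(target_db: float) -> float:
    """Noise standard deviation whose expected MSE yields the target PSNR."""
    return MAX_VALUE / (10.0 ** (target_db / 20.0))


def add_matched_noise(image: Sequence[float], target_db: float, seed: int = 0) -> List[float]:
    """Add Gaussian noise calibrated to target_db (no clipping or rounding)."""
    rng = random.Random(seed)
    sigma = sigma_for_psnr(target_db)
    return [pixel + rng.gauss(0.0, sigma) for pixel in image]


# Suppose the re-watermarking step was measured at 40 dB against the
# watermarked image (a made-up figure for illustration).
image = [128.0] * 20000
noisy = add_matched_noise(image, target_db=40.0)
print(round(sigma_for_psnr(40.0), 4), round(psnr(image, noisy), 2))
```

Clipping to [0, 255] and rounding to integers shift the realized PSNR slightly, so in practice the noisy image would be re-measured and σ adjusted before comparing bit-accuracy drops against the re-watermarked image.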

minor comments (2)

[Abstract] The abstract states 'rigorous experiments' but does not name the specific datasets, which would aid immediate comprehension.
[Introduction] Notation distinguishing victim watermark W_v and attack watermark W_a is introduced but could be defined more formally on first use in the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. The comments highlight important aspects for strengthening the experimental validation and methodological transparency of our work on re-watermarking as an attack strategy. We address each major comment below and outline the revisions we will make to the manuscript.

Referee: [Experimental evaluation] The results across the 96 combinations do not include control baselines using generic imperceptible perturbations (e.g., additive Gaussian noise or JPEG compression) calibrated to the same PSNR/LPIPS as the re-watermarking step. Without this isolation, it remains unclear whether the 25-48% bit-accuracy drop arises from scheme-specific embedding conflict or from generic signal degradation, which is load-bearing for the claim of a distinct 'generic removal strategy'.

Authors: We agree that control experiments with generic perturbations matched for perceptual similarity (PSNR and LPIPS) are necessary to isolate whether the observed bit-accuracy reductions stem from specific watermark embedding conflicts or from non-specific signal degradation. While our re-watermarking method applies a structured perturbation derived from the watermarking process itself, potentially leading to targeted interference rather than random degradation, we acknowledge the value of these baselines for rigorously supporting the claim of a 'generic removal strategy'. In the revised manuscript, we will include additional experiments applying additive Gaussian noise and JPEG compression calibrated to achieve similar PSNR and LPIPS values as the re-watermarking step, and report the corresponding bit-accuracy drops for comparison across the same combinations.
Revision: yes

Referee: [Methods] Full details on the number of trials per combination, variance or error bars on reported accuracies, and any statistical significance tests for the bit-accuracy drops are absent. This prevents full assessment of the reliability of the 0.878-0.953 classifier accuracies and the removal results.

Authors: We concur that including details on the experimental setup, such as the number of trials, measures of variance, and statistical tests, would enhance the reproducibility and credibility of our results. The original experiments were conducted over multiple images per combination to ensure robustness, but these specifics were not fully detailed in the manuscript. We will revise the Methods section to specify the number of trials (e.g., 50-100 images per dataset-victim-attack combination), include error bars or standard deviations in the reported accuracies and bit-accuracy drops, and add statistical significance tests (such as Wilcoxon signed-rank tests or t-tests) to confirm the significance of the observed reductions.
Revision: yes
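The error bars promised in the second response could be produced along the following lines. This is a sketch only: it summarizes per-image paired drops with a mean, a sample standard deviation, and a percentile bootstrap confidence interval, using the standard library. The bootstrap is swapped in for the Wilcoxon or t-tests named in the rebuttal (SciPy's `scipy.stats.wilcoxon` would be one option for the former), and the accuracies below are made-up values, not results from the paper.

```python
import random
import statistics
from typing import Dict, Sequence


def paired_drop_summary(
    before: Sequence[float],
    after: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Dict[str, object]:
    """Mean, sample std, and percentile bootstrap CI of per-image accuracy drops."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired measurements")
    drops = [b - a for b, a in zip(before, after)]
    rng = random.Random(seed)
    n = len(drops)
    # Resample the paired drops with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(drops, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return {
        "mean_drop": statistics.fmean(drops),
        "std_drop": statistics.stdev(drops),
        "ci": (lower, upper),
    }


# Made-up per-image bit accuracies for one victim/attack combination.
before = [0.98, 0.99, 1.00, 0.97, 0.99, 1.00]
after = [0.70, 0.62, 0.75, 0.66, 0.58, 0.71]
summary = paired_drop_summary(before, after)
print(summary)
```

Pairing matters here because each image is measured twice, before and after re-watermarking, so the per-image difference is the natural unit of analysis.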

Circularity Check

0 steps flagged

No significant circularity; empirical validation is self-contained

The paper advances a hypothesis based on an analogy between watermark attacks and watermark embedding, then validates it through direct experimentation on 96 combinations of datasets, victim watermarks, and attack watermarks. Bit-accuracy drops are reported as observed outcomes rather than outputs of any fitted model or self-referential definition. No equations, parameter-fitting steps, or load-bearing self-citations appear in the derivation chain; the central claim rests on external, reproducible trials against multiple independent watermarking schemes. This constitutes a self-contained empirical result against external benchmarks, with no reduction of predictions to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard assumptions from digital watermarking and adversarial machine learning; no new free parameters, axioms beyond domain norms, or invented entities are introduced.

axioms (1)

Domain assumption: Watermarking embeds imperceptible signals that can be detected by a corresponding model.
This underpins the analogy between watermark attacks and re-watermarking.

pith-pipeline@v0.9.0 · 5823 in / 1179 out tokens · 56957 ms · 2026-05-19T21:17:01.144463+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "96 combinations of dataset, victim, and attack watermarks"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

[1] B. An, M. Ding, T. Rabbani, A. Agrawal, Y. Xu, C. Deng, S. Zhu, A. Mohamed, Y. Wen, T. Goldstein, and F. Huang. WAVES: Benchmarking the robustness of image watermarks. In Forty-first International Conference on Machine Learning, 2024.
[2] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston. Variational image compression with a scale hyperprior. In International Conference on Learning Representations, 2018.
[3] Black Forest Labs. Flux.2: Next generation image generation. https://bfl.ai/models/flux-2, 2025. Accessed May 2026.
[4] T. Bui, S. Agarwal, and J. Collomosse. TrustMark: Robust watermarking and watermark removal for arbitrary resolution images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18629–18639, Oct. 2025.
[5] T. Bui, S. Agarwal, N. Yu, and J. Collomosse. RoSteALS: Robust steganography using autoencoder latent space. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 933–942, 2023.
[6] X. Chang, B. Chen, W. Ding, and X. Liao. A DNN robust video watermarking method in dual-tree complex wavelet transform domain. Journal of Information Security and Applications, 85:103868, 2024.
[7] Y. Chen, J. Tian, X. Chen, and J. Zhou. Effective ambiguity attack against passport-based dnn intellectual property protection schemes through fully connected layer substitution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8123–8132, June 2023.
[8] M. Christ, S. Gunn, and O. Zamir. Undetectable watermarks for language models. In Proceedings of Thirty Seventh Conference on Learning Theory, volume 247, pages 1125–1139. PMLR, 30 Jun–03 Jul 2024.
[9] H. Ci, Y. Song, P. Yang, J. Xie, and M. Z. Shou. WMAdapter: Adding WaterMark control to latent diffusion models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 10901–10919. PMLR, 13–19 Jul 2025.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[11] P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 12606–12633. PMLR, 21–27 Jul 2024.
[12] European Parliament and Council. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689, 2024.
[13] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22466–22477, October 2023.
[14] S. Gunn, X. Zhao, and D. Song. An undetectable watermark for generative image models. In The Thirteenth International Conference on Learning Representations, 2025.
[15] S. Guo, Y. Zhong, and X. Hu. People are more susceptible to misinformation with realistic AI-synthesized images that provide strong evidence to headlines. Harvard Kennedy School Misinformation Review, 11 2025.
[16] J. Hayes. On visible adversarial perturbations & digital watermarking. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1710–1717, 2018.
[17] L. Lei, K. Gai, J. Yu, L. Zhu, and Q. Wu. Secure and efficient watermarking for latent diffusion models in model distribution scenarios. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI ’25, 2025.
[18] X. Liang, G. Liu, Y. Si, X. Hu, and Z. Qian. ScreenMark: watermarking arbitrary visual content on screen. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025.
[19] J. Lin and M. Juarez. A crack in the bark: Leveraging public knowledge to remove Tree-Ring watermarks. In 34th USENIX Security Symposium (USENIX Security 25), pages 7331–7348. USENIX Association, Aug. 2025.
[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, 2014.
[21] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations, 2017.
[22] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
[23] Y. Luo, K. Lin, C. Gu, J. Hou, L. Wen, and L. Ping. Lost in overlap: Exploring logit-based watermark collision in LLMs. In L. Chiruzzo, A. Ritter, and L. Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 620–637, Apr. 2025.
[24] Z. Meng, B. Peng, and J. Dong. Latent watermark: Inject and detect watermarks in latent diffusion space. IEEE Transactions on Multimedia, 27:3399–3410, 2025.
[25] Midjourney, Inc. Midjourney version 7. https://docs.midjourney.com/docs/version, April 2025. Accessed May 2026.
[26] OpenAI. Addendum to GPT-4o system card: 4o image generation. Technical report, OpenAI, March 2025. Accessed May 2026.
[27] M. Pan, Z. Wang, X. Dong, V. Sehwag, L. Lyu, and X. Lin. Finding needles in a haystack: A black-box approach to invisible watermark detection. In Computer Vision – ECCV 2024, pages 253–270. Springer Nature Switzerland, 2025.
[28] A. Petrov, S. Agarwal, P. Torr, A. Bibi, and J. Collomosse. On the coexistence and ensembling of watermarks. In Advances in Neural Information Processing Systems, volume 38, 2025.
[29] S.-A. Rebuffi, T. Tran, V. Lacatusu, P. Fernandez, T. Souček, N. Jovanović, T. Sander, H. Elsahar, and A. Mourachko. Learning to watermark in the latent space of generative models. arXiv:2601.16140 [cs.CV], 2026.
[30] A. Rezaei, M. Akbari, S. R. Alvar, A. Fatemi, and Y. Zhang. Lawa: Using latent space for in-generation image watermarking. In European Conference on Computer Vision, pages 118–136. Springer, 2024.
[31] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, June 2022.
[32] M. Saberi, V. S. Sadasivan, K. Rezaei, A. Kumar, A. Chegini, W. Wang, and S. Feizi. Robustness of AI-image detectors: Fundamental limits and practical attacks. In The Twelfth International Conference on Learning Representations, 2024.
[33] T. Sander, P. Fernandez, A. O. Durmus, T. Furon, and M. Douze. Watermark anything with localized messages. In The Thirteenth International Conference on Learning Representations, 2025.
[34] V. Senthuran, Y. Xiang, I. Natgunanathan, and U. Thayasivam. The self-re-watermarking trap: From exploit to resilience. In International Conference on Learning Representations, 2026.
[35] I. F. Serzhenko, L. A. Khaertdinova, M. A. Pautov, and A. V. Antsiferova. Watermark overwriting attack on StegaStamp algorithm. arXiv:2505.01474 [cs.CR], 2025.
[36] F. Shamshad, T. Bakr, Y. Shaaban, N. Hussein, K. Nandakumar, and N. Lukas. First-place solution to NeurIPS 2024 Invisible Watermark Removal Challenge. arXiv:2508.21072 [cs.CV], 2025.
[37] T. Souček, P. Fernandez, H. Elsahar, S.-A. Rebuffi, V. Lacatusu, T. Tran, T. Sander, and A. Mourachko. Pixel Seal: Adversarial-only training for invisible image and video watermarking. arXiv:2512.16874 [cs.CV], 2025.
[38] M. Tancik, B. Mildenhall, and R. Ng. StegaStamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[39] J. Wang, W. Huang, J. Zhang, X. Luo, and B. Ma. Adversarial watermark: A robust and reliable watermark against removal. Journal of Information Security and Applications, 82:103750, 2024.
[40] Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau. DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models. arXiv:2210.14895 [cs.CV], 2023.
[41] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. In Advances in Neural Information Processing Systems, volume 36, pages 58047–58063, 2023.
[42] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16133–16142, June 2023.
[43] J. Xu, X. Liu, Y. Wu, Y. Tong, Q. Li, M. Ding, J. Tang, and Y. Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems, volume 36, pages 15903–15935, 2023.
[44] A. Yakushev, Y. Markin, D. Obydenkov, A. Frolov, S. Fomin, M. Akopyan, A. Kozachok, and A. Gaynov. Docmarking: Real-time screen-cam robust document image watermarking. In 2022 Ivannikov Ispras Open Conference (ISPRAS), pages 142–150, 2022.
[45] Z. Yuan, X. Zhang, Z. Wang, and Z. Yin. Semi-fragile neural network watermarking for content authentication and tampering localization. Expert Systems with Applications, 236:121315, 2024.
[46] K. A. Zhang, L. Xu, A. Cuesta-Infante, and K. Veeramachaneni. Robust invisible video watermarking with attention. arXiv:1909.01285 [cs.MM], 2019.
[47] L. Zhang, X. Liu, A. V. i Martin, C. X. Bearfield, Y. Brun, and H. Guan. Attack-resilient image watermarking using stable diffusion. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
[48] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li. Invisible image watermarks are provably removable using generative AI. In Advances in Neural Information Processing Systems, volume 37, pages 8643–8672, 2024.
[49] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei. HiDDeN: Hiding data with deep networks. In Computer Vision – ECCV 2018, pages 682–697, 2018.

Broader Impacts. This work exposes a previously underappreciated attack surface in invisible image watermarking: that watermark encoders themselves can be repurposed as cheap, effective removal tools, and that the ident...
[50] datasets. [1] show that each of these datasets is characterized by a unique distribution of prompt words, where DiffusionDB emphasizes quality descriptors (e.g., “beautiful,” “highly detailed”), whereas MS-COCO focuses on object descriptions. Following [1], we selected a subset of images as follows:
[51] Keep only the reference images and their prompt strings, discarding all other metadata.
[52] Tokenize prompts using OpenCLIP’s tokenizer and keep only samples whose token count falls in (0, 75] (since longer prompts are truncated by Stable Diffusion [31]).
[53] Rank images by aesthetic score [43] and select the top 500 (prioritizing high-quality images for which watermarking is most practically relevant).

Baseline Attack Configurations. As a baseline for the performance of our WAW attack, we evaluate each victim watermark against the strongest attacks identified in [1], each applied across a range of strengths: •...
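The three selection steps in entries [51] to [53] above (drop extra metadata, filter by token count, keep the top 500 by aesthetic score) can be sketched as a small filter. This is an illustration under assumptions: `count_tokens` and `aesthetic_score` are hypothetical callables standing in for OpenCLIP's tokenizer and the scorer of [43], and the whitespace tokenizer and scores in the usage are made up, so real token counts and rankings would differ.

```python
from typing import Any, Callable, Dict, List, Tuple


def select_subset(
    records: List[Dict[str, Any]],
    count_tokens: Callable[[str], int],
    aesthetic_score: Callable[[Any], float],
    max_tokens: int = 75,
    top_k: int = 500,
) -> List[Tuple[Any, str]]:
    """Apply the three selection steps and return (image, prompt) pairs."""
    # Step 1: keep only the image and its prompt, dropping other metadata.
    pairs = [(r["image"], r["prompt"]) for r in records]
    # Step 2: keep prompts whose token count lies in (0, max_tokens].
    pairs = [(img, p) for img, p in pairs if 0 < count_tokens(p) <= max_tokens]
    # Step 3: rank by aesthetic score, highest first, and keep the top_k.
    pairs.sort(key=lambda pair: aesthetic_score(pair[0]), reverse=True)
    return pairs[:top_k]


# Toy data: file names, prompts, and scores are invented for this example.
records = [
    {"image": "a.png", "prompt": "a beautiful, highly detailed castle", "seed": 1},
    {"image": "b.png", "prompt": "", "seed": 2},
    {"image": "c.png", "prompt": "two dogs on a beach", "seed": 3},
    {"image": "d.png", "prompt": "word " * 80, "seed": 4},
]
scores = {"a.png": 0.4, "b.png": 0.9, "c.png": 0.7, "d.png": 0.8}


def whitespace_count(prompt: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(prompt.split())


subset = select_subset(records, whitespace_count, scores.__getitem__, top_k=2)
print(subset)
```

In the toy run the empty prompt and the 80-word prompt are filtered out before ranking, so their high scores never enter the top-k selection.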
[54] (quality levels 1–7). • Rinsing: Regen-2xDiff and Regen-4xDiff iteratively noise and denoise the image via Stable Diffusion v1.4 two and four times, respectively (20, 60, 100, and 10–50 timesteps per pass).

D Classifier Analysis and Ablation

D.1 Using a Different Diffusion Backbone (SD 3.5)

We re-evaluate the classifier on images generated with the more re...

[55] (lr = 2×10^-5, weight decay = 0.01) with a cosine schedule and 10% linear warmup [21]. We maintain an effective batch size of 16 through gradient accumulation and utilize mixed-precision (fp16) training throughout. Final regularization included label smoothing of 0.1 and early stopping with a patience of five epochs. We experiment with two further training s...
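The learning-rate schedule named in the training excerpt (cosine schedule with 10% linear warmup, peak rate 2×10^-5) can be written as a plain function of the step index. This sketch makes several assumptions the excerpt does not confirm: the warmup ramps linearly from near zero to the peak, the cosine phase decays to zero, and the total of 1000 steps is a placeholder. `lr_at` is an illustrative name, not the authors' code.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 2e-5, warmup_frac: float = 0.10) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    warmup_steps = max(1, int(round(total_steps * warmup_frac)))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr toward zero over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


total = 1000  # placeholder number of optimizer steps
for step in (0, 99, 100, 550, 999):
    print(step, lr_at(step, total))
```

With gradient accumulation, the step index here would count optimizer updates (one per effective batch of 16), not individual micro-batches.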

[1] [1]

B. An, M. Ding, T. Rabbani, A. Agrawal, Y . Xu, C. Deng, S. Zhu, A. Mohamed, Y . Wen, T. Goldstein, and F. Huang. W A VES: Benchmarking the robustness of image watermarks. InForty-first International Conference on Machine Learning, 2024

work page 2024

[2] [2]

Ballé, D

J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston. Variational image compression with a scale hyperprior. InInternational Conference on Learning Representations, 2018

work page 2018

[3] [3]

Flux.2: Next generation image generation

Black Forest Labs. Flux.2: Next generation image generation. urlhttps://bfl.ai/models/flux-2, 2025. Accessed May 2026

work page 2025

[4] [4]

T. Bui, S. Agarwal, and J. Collomosse. TrustMark: Robust watermarking and watermark removal for arbitrary resolution images. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18629–18639, Oct. 2025

work page 2025

[5] [5]

T. Bui, S. Agarwal, N. Yu, and J. Collomosse. RoSteALS: Robust steganography using autoencoder latent space. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 933–942, 2023

work page 2023

[6] [6]

Chang, B

X. Chang, B. Chen, W. Ding, and X. Liao. A DNN robust video watermarking method in dual-tree complex wavelet transform domain.Journal of Information Security and Applications, 85:103868, 2024

work page 2024

[7] [7]

Y . Chen, J. Tian, X. Chen, and J. Zhou. Effective ambiguity attack against passport-based dnn intellectual property protection schemes through fully connected layer substitution. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8123–8132, June 2023

work page 2023

[8] M. Christ, S. Gunn, and O. Zamir. Undetectable watermarks for language models. In Proceedings of Thirty Seventh Conference on Learning Theory, volume 247, pages 1125–1139. PMLR, 30 Jun–03 Jul 2024.

[9] H. Ci, Y. Song, P. Yang, J. Xie, and M. Z. Shou. WMAdapter: Adding WaterMark control to latent diffusion models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 10901–10919. PMLR, 13–19 Jul 2025.

[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[11] P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 12606–12633. PMLR, 21–27 Jul 2024.

[12] European Parliament and Council. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689, 2024.

[13] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22466–22477, October 2023.

[14] S. Gunn, X. Zhao, and D. Song. An undetectable watermark for generative image models. In The Thirteenth International Conference on Learning Representations, 2025.

[15] S. Guo, Y. Zhong, and X. Hu. People are more susceptible to misinformation with realistic AI-synthesized images that provide strong evidence to headlines. Harvard Kennedy School Misinformation Review, 11 2025.

[16] J. Hayes. On visible adversarial perturbations & digital watermarking. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1710–1717, 2018.

[17] L. Lei, K. Gai, J. Yu, L. Zhu, and Q. Wu. Secure and efficient watermarking for latent diffusion models in model distribution scenarios. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI ’25, 2025.

[18] X. Liang, G. Liu, Y. Si, X. Hu, and Z. Qian. ScreenMark: watermarking arbitrary visual content on screen. In Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence, 2025.

[19] J. Lin and M. Juarez. A crack in the bark: Leveraging public knowledge to remove Tree-Ring watermarks. In 34th USENIX Security Symposium (USENIX Security 25), pages 7331–7348. USENIX Association, Aug. 2025.

[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, 2014.

[21] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations, 2017.

[22] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.

[23] Y. Luo, K. Lin, C. Gu, J. Hou, L. Wen, and L. Ping. Lost in overlap: Exploring logit-based watermark collision in LLMs. In L. Chiruzzo, A. Ritter, and L. Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 620–637, Apr. 2025.

[24] Z. Meng, B. Peng, and J. Dong. Latent watermark: Inject and detect watermarks in latent diffusion space. IEEE Transactions on Multimedia, 27:3399–3410, 2025.

[25] Midjourney, Inc. Midjourney version 7. https://docs.midjourney.com/docs/version, April 2025. Accessed May 2026.

[26] OpenAI. Addendum to GPT-4o system card: 4o image generation. Technical report, OpenAI, March 2025. Accessed May 2026.

[27] M. Pan, Z. Wang, X. Dong, V. Sehwag, L. Lyu, and X. Lin. Finding needles in a haystack: A black-box approach to invisible watermark detection. In Computer Vision – ECCV 2024, pages 253–270. Springer Nature Switzerland, 2025.

[28] A. Petrov, S. Agarwal, P. Torr, A. Bibi, and J. Collomosse. On the coexistence and ensembling of watermarks. In Advances in Neural Information Processing Systems, volume 38, 2025.

[29] S.-A. Rebuffi, T. Tran, V. Lacatusu, P. Fernandez, T. Souček, N. Jovanović, T. Sander, H. Elsahar, and A. Mourachko. Learning to watermark in the latent space of generative models. arXiv:2601.16140 [cs.CV], 2026.

[30] A. Rezaei, M. Akbari, S. R. Alvar, A. Fatemi, and Y. Zhang. Lawa: Using latent space for in-generation image watermarking. In European Conference on Computer Vision, pages 118–136. Springer, 2024.

[31] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, June 2022.

[32] M. Saberi, V. S. Sadasivan, K. Rezaei, A. Kumar, A. Chegini, W. Wang, and S. Feizi. Robustness of AI-image detectors: Fundamental limits and practical attacks. In The Twelfth International Conference on Learning Representations, 2024.

[33] T. Sander, P. Fernandez, A. O. Durmus, T. Furon, and M. Douze. Watermark anything with localized messages. In The Thirteenth International Conference on Learning Representations, 2025.

[34] V. Senthuran, Y. Xiang, I. Natgunanathan, and U. Thayasivam. The self-re-watermarking trap: From exploit to resilience. In International Conference on Learning Representations, 2026.

[35] I. F. Serzhenko, L. A. Khaertdinova, M. A. Pautov, and A. V. Antsiferova. Watermark overwriting attack on StegaStamp algorithm. arXiv:2505.01474 [cs.CR], 2025.

[36] F. Shamshad, T. Bakr, Y. Shaaban, N. Hussein, K. Nandakumar, and N. Lukas. First-place solution to NeurIPS 2024 Invisible Watermark Removal Challenge. arXiv:2508.21072 [cs.CV], 2025.

[37] T. Souček, P. Fernandez, H. Elsahar, S.-A. Rebuffi, V. Lacatusu, T. Tran, T. Sander, and A. Mourachko. Pixel Seal: Adversarial-only training for invisible image and video watermarking. arXiv:2512.16874 [cs.CV], 2025.

[38] M. Tancik, B. Mildenhall, and R. Ng. StegaStamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

[39] J. Wang, W. Huang, J. Zhang, X. Luo, and B. Ma. Adversarial watermark: A robust and reliable watermark against removal. Journal of Information Security and Applications, 82:103750, 2024.

[40] Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau. DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models. arXiv:2210.14895 [cs.CV], 2023.

[41] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. In Advances in Neural Information Processing Systems, volume 36, pages 58047–58063, 2023.

[42] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16133–16142, June 2023.

[43] J. Xu, X. Liu, Y. Wu, Y. Tong, Q. Li, M. Ding, J. Tang, and Y. Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems, volume 36, pages 15903–15935, 2023.

[44] A. Yakushev, Y. Markin, D. Obydenkov, A. Frolov, S. Fomin, M. Akopyan, A. Kozachok, and A. Gaynov. Docmarking: Real-time screen-cam robust document image watermarking. In 2022 Ivannikov Ispras Open Conference (ISPRAS), pages 142–150, 2022.

[45] Z. Yuan, X. Zhang, Z. Wang, and Z. Yin. Semi-fragile neural network watermarking for content authentication and tampering localization. Expert Systems with Applications, 236:121315, 2024.

[46] K. A. Zhang, L. Xu, A. Cuesta-Infante, and K. Veeramachaneni. Robust invisible video watermarking with attention. arXiv:1909.01285 [cs.MM], 2019.

[47] L. Zhang, X. Liu, A. V. i Martin, C. X. Bearfield, Y. Brun, and H. Guan. Attack-resilient image watermarking using stable diffusion. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[48] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li. Invisible image watermarks are provably removable using generative AI. In Advances in Neural Information Processing Systems, volume 37, pages 8643–8672, 2024.

[49] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei. HiDDeN: Hiding data with deep networks. In Computer Vision – ECCV 2018, pages 682–697, 2018.

Broader Impacts

This work exposes a previously underappreciated attack surface in invisible image watermarking: that watermark encoders themselves can be repurposed as cheap, effective removal tools, and that the ident...

datasets. [1] show that each of these datasets is characterized by a unique distribution of prompt words, where DiffusionDB emphasizes quality descriptors (e.g., “beautiful,” “highly detailed”), whereas MS-COCO focuses on object descriptions. Following [1], we selected a subset of images as follows:

• Keep only the reference images and their prompt strings, discarding all other metadata.
• Tokenize prompts using OpenCLIP’s tokenizer and keep only samples whose token count falls in (0, 75] (since longer prompts are truncated by Stable Diffusion [31]).
• Rank images by aesthetic score [43] and select the top 500 (prioritizing high-quality images for which watermarking is most practically relevant).

Baseline Attack Configurations. As a baseline for the performance of our WAW attack, we evaluate each victim watermark against the strongest attacks identified in [1], each applied across a range of strengths: •...

(quality levels 1–7).
• Rinsing: Regen-2xDiff and Regen-4xDiff iteratively noise and denoise the image via Stable Diffusion v1.4 two and four times, respectively (20, 60, 100, and 10–50 timesteps per pass).

D Classifier Analysis and Ablation

D.1 Using a Different Diffusion Backbone (SD 3.5)

We re-evaluate the classifier on images generated with the more re...
