Watermarks Attack Watermarks: Re-Watermarking as a Generic Removal Strategy
Pith reviewed 2026-05-19 21:17 UTC · model grok-4.3
The pith
Re-watermarking an already watermarked image reliably suppresses the original signal.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Watermark attacks are analogous to watermarking itself because both seek imperceptible changes that trigger a detector. Re-watermarking an already watermarked image therefore reliably suppresses the original signal. Rigorous experiments over 96 dataset-victim-attack combinations confirm the effect without requiring gradients, surrogate models, or keys. A simple classifier detects watermark presence and identity at 0.878-0.953 accuracy. The combined pipeline reduces bit accuracy by at least 25 percent and up to 48 percent.
What carries the argument
The analogy that watermark attacks and watermarking both apply imperceptible perturbations to trigger detectors, allowing one watermark to interfere with another.
If this is right
- Re-watermarking suppresses the original signal without gradients, surrogate models, or detection keys.
- A classifier identifies existing watermarks with overall accuracies of 0.878-0.953.
- Combining identification and re-watermarking reduces bit accuracy by 25 to 48 percent.
- Current watermarking schemes for provenance and copyright protection face a simple generic attack.
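The bit-accuracy figures in the list above can be made concrete with a small sketch. It assumes the usual definition of bit accuracy (the fraction of decoded message bits that match the embedded ones). The summary does not say whether the 25 to 48 percent reduction is measured in percentage points or relative to the starting accuracy, so the hypothetical helper `accuracy_drop` returns both. The bit strings are made-up illustration data, not results from the paper.

```python
from typing import Sequence, Tuple


def bit_accuracy(decoded: Sequence[int], embedded: Sequence[int]) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(decoded) != len(embedded):
        raise ValueError("bit strings must have equal length")
    if len(embedded) == 0:
        raise ValueError("bit strings must be non-empty")
    matches = sum(int(d == e) for d, e in zip(decoded, embedded))
    return matches / len(embedded)


def accuracy_drop(before: float, after: float) -> Tuple[float, float]:
    """Return (drop in percentage points, drop relative to `before` in percent).

    Which of the two the paper reports is an assumption left open here.
    """
    points = 100.0 * (before - after)
    relative = 100.0 * (before - after) / before
    return points, relative


# Hypothetical 8-bit message, decoded before and after a re-watermarking step.
embedded = [1, 0, 1, 1, 0, 0, 1, 0]
decoded_before = [1, 0, 1, 1, 0, 0, 1, 0]  # all 8 bits recovered
decoded_after = [1, 1, 1, 0, 0, 1, 1, 0]   # 3 of 8 bits flipped

acc_before = bit_accuracy(decoded_before, embedded)
acc_after = bit_accuracy(decoded_after, embedded)
drop_points, drop_relative = accuracy_drop(acc_before, acc_after)
print(acc_before, acc_after, drop_points, drop_relative)
```

For a balanced random message, a decoder that has lost the signal entirely sits near 0.5 bit accuracy, so drops should be read against that floor rather than against zero.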
Where Pith is reading between the lines
- Watermark designers may need to test resistance specifically against other watermarks as a new class of interference.
- The ability to first identify which watermark is present turns detection into an enabler for targeted removal.
- The same re-watermarking logic could be tested on audio, video, or text detectors that rely on imperceptible triggers.
- Multiple successive re-watermarkings might produce compounding suppression effects worth measuring.
Load-bearing premise
Any imperceptible change meant to trigger a detector, including a new watermark, will interfere with an existing watermark's detection independent of the schemes used.
What would settle it
An experiment on a new combination of victim and attack watermarks in which re-watermarking leaves original detection accuracy or bit accuracy essentially unchanged.
Original abstract
Watermarking combines an imperceptible change to an input image that will trigger a detector, to assert provenance and protect intellectual property. The literature has shown great interest in attacks on watermarking schemes: attackers are clearly motivated to steal copyrighted material or circumvent legislated deepfake protections. In this work, we make a simple-yet-powerful observation: that such attacks on watermarking, like watermarks themselves, seek an imperceptible change to an input image (now already watermarked) that will trigger a detector. This analogy comparing watermark attacks to watermarking itself is highly suggestive: that watermarks could be used to attack watermarks. Our first contribution validates this hypothesis. In rigorous experiments spanning 96 combinations of dataset, victim, and attack watermarks, we show that simply re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys. Our second contribution is a simple classifier for detecting the presence and identity of an existing watermark in a given image. Surprisingly, experimental findings demonstrate outstanding overall accuracies of 0.878-0.953. This result is of independent interest as a security vulnerability: research shows that method-specific attacks achieve substantially stronger removal than black-box attacks. Taken together, watermark identification combined with re-watermarking successfully reduces bit accuracy by at least 25% and up to 48%. Our work constitutes a cheap, generic, and highly effective attack pipeline, calling into question the reliability of current watermarking schemes to such a simple attack, as well as the value of existing sophisticated attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that re-watermarking an already watermarked image reliably suppresses the original watermark signal, serving as a generic removal strategy. This is supported by experiments across 96 dataset-victim-attack combinations showing bit-accuracy drops of 25-48% without gradients, surrogates, or keys. A secondary contribution is a classifier detecting watermark presence and identity with accuracies of 0.878-0.953, which combined with re-watermarking yields the reported suppression.
Significance. If the central claim holds after controls, the work would be significant for digital watermarking and IP protection research. The breadth of 96 empirical combinations provides substantial coverage across schemes and datasets. The watermark-detection classifier is of independent interest as a demonstrated vulnerability. The approach is cheap and black-box, directly challenging the robustness of existing watermarking methods and the necessity of sophisticated attacks.
major comments (2)
- [Experimental evaluation] Experimental evaluation: the results across the 96 combinations do not include control baselines using generic imperceptible perturbations (e.g., additive Gaussian noise or JPEG compression) calibrated to the same PSNR/LPIPS as the re-watermarking step. Without this isolation, it remains unclear whether the 25-48% bit-accuracy drop arises from scheme-specific embedding conflict or from generic signal degradation, which is load-bearing for the claim of a distinct 'generic removal strategy'.
- [Methods] Methods section: full details on the number of trials per combination, variance or error bars on reported accuracies, and any statistical significance tests for the bit-accuracy drops are absent. This prevents full assessment of the reliability of the 0.878-0.953 classifier accuracies and the removal results.
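The first major comment asks for generic perturbations calibrated to the same PSNR as the re-watermarking step. A minimal sketch of that calibration for additive Gaussian noise follows, using the standard relation PSNR = 10·log10(MAX² / MSE) and the approximation MSE ≈ σ² for zero-mean noise (clipping at the range limits makes the achieved PSNR slightly higher). The function names, the 40 dB target, and the flat gray test image are illustrative assumptions. LPIPS matching is not covered because it needs a learned perceptual model.

```python
import math
import random
from typing import List, Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float],
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def sigma_for_psnr(target_psnr_db: float, max_val: float = 255.0) -> float:
    """Noise standard deviation whose expected MSE (sigma^2) gives the target PSNR."""
    return max_val / (10.0 ** (target_psnr_db / 20.0))


def add_gaussian_noise(pixels: Sequence[float], sigma: float, seed: int = 0,
                       max_val: float = 255.0) -> List[float]:
    """Add zero-mean Gaussian noise and clip to the valid range [0, max_val]."""
    rng = random.Random(seed)
    return [min(max_val, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


# Hypothetical target: match a re-watermarking step that measured 40 dB PSNR.
target_db = 40.0
image = [128.0] * 20000  # flat mid-gray stand-in for real pixel data
noisy = add_gaussian_noise(image, sigma_for_psnr(target_db))
achieved_db = psnr(image, noisy)
print(round(sigma_for_psnr(target_db), 3), round(achieved_db, 2))
```

Running the victim decoder on `noisy` and on the re-watermarked image at matched PSNR would give the comparison the referee asks for. In practice the target would be measured per image rather than fixed.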
minor comments (2)
- [Abstract] The abstract states 'rigorous experiments' but does not name the specific datasets, which would aid immediate comprehension.
- [Introduction] Notation distinguishing victim watermark W_v and attack watermark W_a is introduced but could be defined more formally on first use in the introduction.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive suggestions. The comments highlight important aspects for strengthening the experimental validation and methodological transparency of our work on re-watermarking as an attack strategy. We address each major comment below and outline the revisions we will make to the manuscript.
Point-by-point responses
- Referee: [Experimental evaluation] Experimental evaluation: the results across the 96 combinations do not include control baselines using generic imperceptible perturbations (e.g., additive Gaussian noise or JPEG compression) calibrated to the same PSNR/LPIPS as the re-watermarking step. Without this isolation, it remains unclear whether the 25-48% bit-accuracy drop arises from scheme-specific embedding conflict or from generic signal degradation, which is load-bearing for the claim of a distinct 'generic removal strategy'.
  Authors: We agree that control experiments with generic perturbations matched for perceptual similarity (PSNR and LPIPS) are necessary to isolate whether the observed bit-accuracy reductions stem from specific watermark embedding conflicts or from non-specific signal degradation. While our re-watermarking method applies a structured perturbation derived from the watermarking process itself (potentially leading to targeted interference rather than random degradation), we acknowledge the value of these baselines for rigorously supporting the claim of a 'generic removal strategy'. In the revised manuscript, we will include additional experiments applying additive Gaussian noise and JPEG compression calibrated to achieve similar PSNR and LPIPS values as the re-watermarking step, and report the corresponding bit-accuracy drops for comparison across the same combinations.
  Revision: yes
- Referee: [Methods] Methods section: full details on the number of trials per combination, variance or error bars on reported accuracies, and any statistical significance tests for the bit-accuracy drops are absent. This prevents full assessment of the reliability of the 0.878-0.953 classifier accuracies and the removal results.
  Authors: We concur that including details on the experimental setup, such as the number of trials, measures of variance, and statistical tests, would enhance the reproducibility and credibility of our results. The original experiments were conducted over multiple images per combination to ensure robustness, but these specifics were not fully detailed in the manuscript. We will revise the Methods section to specify the number of trials (e.g., 50-100 images per dataset-victim-attack combination), include error bars or standard deviations in the reported accuracies and bit-accuracy drops, and add statistical significance tests (such as Wilcoxon signed-rank tests or t-tests) to confirm the significance of the observed reductions.
  Revision: yes
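The error bars and paired tests promised in the second response can be sketched as follows. This is only an illustration under stated assumptions: the per-image bit accuracies are invented, the helper name `paired_summary` is hypothetical, and the sketch stops at the paired t statistic because the Python standard library has no t-distribution CDF. For p-values, or for the Wilcoxon signed-rank test the authors mention, a statistics package such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`) would normally be used.

```python
import math
import statistics
from typing import Dict, Sequence


def paired_summary(before: Sequence[float], after: Sequence[float]) -> Dict[str, float]:
    """Summarize per-image bit-accuracy drops for one dataset-victim-attack combination.

    Returns the mean drop, its sample standard deviation, the standard error
    (usable as an error bar), and the paired t statistic with n - 1 degrees of freedom.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for b, a in zip(before, after)]
    mean_drop = statistics.mean(diffs)
    std_dev = statistics.stdev(diffs)           # sample standard deviation
    std_err = std_dev / math.sqrt(len(diffs))   # standard error of the mean drop
    t_stat = mean_drop / std_err if std_err > 0 else math.inf
    return {"n": float(len(diffs)), "mean_drop": mean_drop,
            "std_dev": std_dev, "std_err": std_err, "t_stat": t_stat}


# Hypothetical bit accuracies for four images, before and after re-watermarking.
before = [0.99, 0.98, 1.00, 0.97]
after = [0.70, 0.65, 0.72, 0.66]
summary = paired_summary(before, after)
print(summary)
```

A paired design fits here because each image is measured twice, once before and once after the attack. That removes image-to-image variation from the comparison.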
Circularity Check
No significant circularity; empirical validation is self-contained
full rationale
The paper advances a hypothesis based on an analogy between watermark attacks and watermark embedding, then validates it through direct experimentation on 96 combinations of datasets, victim watermarks, and attack watermarks. Bit-accuracy drops are reported as observed outcomes rather than outputs of any fitted model or self-referential definition. No equations, parameter-fitting steps, or load-bearing self-citations appear in the derivation chain; the central claim rests on external, reproducible trials against multiple independent watermarking schemes. This constitutes a self-contained empirical result against external benchmarks, with no reduction of predictions to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Watermarking embeds imperceptible signals that can be detected by a corresponding model.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "96 combinations of dataset, victim, and attack watermarks"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] B. An, M. Ding, T. Rabbani, A. Agrawal, Y. Xu, C. Deng, S. Zhu, A. Mohamed, Y. Wen, T. Goldstein, and F. Huang. WAVES: Benchmarking the robustness of image watermarks. In Forty-first International Conference on Machine Learning, 2024.
- [2]
- [3] Black Forest Labs. Flux.2: Next generation image generation. https://bfl.ai/models/flux-2, 2025. Accessed May 2026.
- [4] T. Bui, S. Agarwal, and J. Collomosse. TrustMark: Robust watermarking and watermark removal for arbitrary resolution images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18629–18639, Oct. 2025.
- [5] T. Bui, S. Agarwal, N. Yu, and J. Collomosse. RoSteALS: Robust steganography using autoencoder latent space. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 933–942, 2023.
- [6]
- [7] Y. Chen, J. Tian, X. Chen, and J. Zhou. Effective ambiguity attack against passport-based DNN intellectual property protection schemes through fully connected layer substitution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8123–8132, June 2023.
- [8]
- [9] H. Ci, Y. Song, P. Yang, J. Xie, and M. Z. Shou. WMAdapter: Adding WaterMark control to latent diffusion models. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 10901–10919. PMLR, 13–19 Jul 2025.
- [10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
- [11] P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, and R. Rombach. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 12606–12633. PMLR, 21–27 Jul 2024.
- [12] European Parliament and Council. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689, 2024.
- [13] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22466–22477, October 2023.
- [14] S. Gunn, X. Zhao, and D. Song. An undetectable watermark for generative image models. In The Thirteenth International Conference on Learning Representations, 2025.
- [15] S. Guo, Y. Zhong, and X. Hu. People are more susceptible to misinformation with realistic AI-synthesized images that provide strong evidence to headlines. Harvard Kennedy School Misinformation Review, 11 2025.
- [16] J. Hayes. On visible adversarial perturbations & digital watermarking. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1710–1717, 2018.
- [17] L. Lei, K. Gai, J. Yu, L. Zhu, and Q. Wu. Secure and efficient watermarking for latent diffusion models in model distribution scenarios. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI ’25, 2025.
- [18]
- [19]
- [20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, 2014.
- [21] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations, 2017.
- [22] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
- [23] Y. Luo, K. Lin, C. Gu, J. Hou, L. Wen, and L. Ping. Lost in overlap: Exploring logit-based watermark collision in LLMs. In L. Chiruzzo, A. Ritter, and L. Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 620–637, Apr. 2025.
- [24] Z. Meng, B. Peng, and J. Dong. Latent watermark: Inject and detect watermarks in latent diffusion space. IEEE Transactions on Multimedia, 27:3399–3410, 2025.
- [25] Midjourney, Inc. Midjourney version 7. https://docs.midjourney.com/docs/version, April 2025. Accessed May 2026.
- [26] OpenAI. Addendum to GPT-4o system card: 4o image generation. Technical report, OpenAI, March 2025. Accessed May 2026.
- [27] M. Pan, Z. Wang, X. Dong, V. Sehwag, L. Lyu, and X. Lin. Finding needles in a haystack: A black-box approach to invisible watermark detection. In Computer Vision – ECCV 2024, pages 253–270. Springer Nature Switzerland, 2025.
- [28]
- [29] S.-A. Rebuffi, T. Tran, V. Lacatusu, P. Fernandez, T. Souček, N. Jovanović, T. Sander, H. Elsahar, and A. Mourachko. Learning to watermark in the latent space of generative models. arXiv:2601.16140 [cs.CV], 2026.
- [30]
- [31] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, June 2022.
- [32]
- [33]
- [34] V. Senthuran, Y. Xiang, I. Natgunanathan, and U. Thayasivam. The self-re-watermarking trap: From exploit to resilience. In International Conference on Learning Representations, 2026.
- [35]
- [36] F. Shamshad, T. Bakr, Y. Shaaban, N. Hussein, K. Nandakumar, and N. Lukas. First-place solution to NeurIPS 2024 Invisible Watermark Removal Challenge. arXiv:2508.21072 [cs.CV], 2025.
- [37] T. Souček, P. Fernandez, H. Elsahar, S.-A. Rebuffi, V. Lacatusu, T. Tran, T. Sander, and A. Mourachko. Pixel Seal: Adversarial-only training for invisible image and video watermarking. arXiv:2512.16874 [cs.CV], 2025.
- [38]
- [39] J. Wang, W. Huang, J. Zhang, X. Luo, and B. Ma. Adversarial watermark: A robust and reliable watermark against removal. Journal of Information Security and Applications, 82:103750, 2024.
- [40]
- [41] Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. In Advances in Neural Information Processing Systems, volume 36, pages 58047–58063, 2023.
- [42] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie. ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16133–16142, June 2023.
- [43] J. Xu, X. Liu, Y. Wu, Y. Tong, Q. Li, M. Ding, J. Tang, and Y. Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. In Advances in Neural Information Processing Systems, volume 36, pages 15903–15935, 2023.
- [44] A. Yakushev, Y. Markin, D. Obydenkov, A. Frolov, S. Fomin, M. Akopyan, A. Kozachok, and A. Gaynov. Docmarking: Real-time screen-cam robust document image watermarking. In 2022 Ivannikov Ispras Open Conference (ISPRAS), pages 142–150, 2022.
- [45] Z. Yuan, X. Zhang, Z. Wang, and Z. Yin. Semi-fragile neural network watermarking for content authentication and tampering localization. Expert Systems with Applications, 236:121315, 2024.
- [46]
- [47]
- [48] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li. Invisible image watermarks are provably removable using generative AI. In Advances in Neural Information Processing Systems, volume 37, pages 8643–8672, 2024.
- [49] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei. HiDDeN: Hiding data with deep networks. In Computer Vision – ECCV 2018, pages 682–697, 2018.
  Broader Impacts. This work exposes a previously underappreciated attack surface in invisible image watermarking: that watermark encoders themselves can be repurposed as cheap, effective removal tools, and that the ident...
[50]
datasets. [1] show that each of these datasets is characterized by a unique distribution of prompt words, where DiffusionDB emphasizes quality descriptors (e.g., “beautiful,” “highly detailed”), whereas MS-COCO focuses on object descriptions. Following [1], we selected a subset of images as follows:
-
[51]
Keep only the reference images and their prompt strings, discarding all other metadata
-
[52]
Tokenize prompts using OpenCLIP’s tokenizer and keep only samples whose token count falls in(0,75](since longer prompts are truncated by Stable Diffusion [31]). 14
-
[53]
Rank images by aesthetic score [43] and select the top 500 (prioritizing high-quality images for which watermarking is most practically relevant). Baseline Attack ConfigurationsAs a baseline for the performance of our W AW attack, we evaluate each victim watermark against the strongest attacks identified in [1], each applied across a range of strengths: •...
-
[54]
(quality levels 1–7). • Rinsing:Regen-2xDiff and Regen-4xDiff iteratively noise and denoise the image via Stable Diffusion v1.4 two and four times, respectively (20, 60, 100, and 10–50 timesteps per pass). D Classifier Analysis and Ablation D.1 Using a Different Diffusion Backbone (SD 3.5) We re-evaluate the classifier on images generated with the more re...
-
[55]
(lr= 2×10 −5, weight decay=0.01) with a cosine schedule and 10% linear warmup [21]. We maintain an effective batch size of 16 through gradient accumulation and utilize mixed-precision (fp16) training throughout. Final regularization included label smoothing of 0.1 and early stopping with a patience of five epochs. We experiment with two further training s...
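The learning-rate schedule quoted in the last excerpt (peak rate 2×10⁻⁵, 10% linear warmup, then cosine decay) can be sketched in a few lines. The excerpt is truncated, so several details here are assumptions rather than the authors' settings: the decay floor of zero, the per-step (not per-epoch) update, the total of 1000 steps, and the function name `lr_at`.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 2e-5,
          warmup_frac: float = 0.10, min_lr: float = 0.0) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then cosine decay.

    Assumed details: warmup covers the first `warmup_frac` of steps and the
    cosine phase decays to `min_lr` (zero by default) at `total_steps`.
    """
    warmup_steps = max(1, int(round(warmup_frac * total_steps)))
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the cosine phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run of 1000 optimizer steps.
total = 1000
schedule = [lr_at(s, total) for s in range(total + 1)]
print(schedule[0], schedule[99], schedule[550], schedule[1000])
```

With an effective batch size of 16 obtained through gradient accumulation, as the excerpt describes, `step` would count optimizer updates, not forward passes.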