The Forensic Cost of Watermark Removal
Pith reviewed 2026-05-07 17:00 UTC · model grok-4.3
The pith
Watermark removal methods leave statistical artifacts that a classifier can detect at a false positive rate of one in a thousand.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Every standard watermark removal pipeline produces distinct statistical artifacts in the resulting images. A classifier trained on those artifacts reaches state-of-the-art detection performance at a false-positive rate of 10^{-3} for every removal method examined. No existing attack incorporates countermeasures against this leakage. When leading watermarking schemes are measured under the combined criteria of attack success, perceptual quality, and forensic detectability, none satisfies all three simultaneously. The work therefore establishes forensic stealthiness as an essential property any removal attack must possess.
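The headline metric, detection rate at a false-positive rate of 10^{-3}, can be made concrete with a short sketch. It is not the paper's evaluation code: the function name `tpr_at_fpr` and the convention that higher scores mean "removal suspected" are assumptions made for illustration. The threshold is set from scores on untouched images only, then applied to images that went through removal.

```python
import numpy as np

def tpr_at_fpr(scores_clean, scores_removed, target_fpr=1e-3):
    """Detection rate on removed-watermark images at a threshold chosen
    so that at most `target_fpr` of clean images are flagged.

    Assumes higher score = more likely to have undergone removal
    (an illustrative convention, not taken from the paper).
    """
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    clean = np.sort(np.asarray(scores_clean, dtype=float))
    removed = np.asarray(scores_removed, dtype=float)
    n = clean.size
    # Number of clean images that may be flagged without exceeding the target.
    # The small epsilon guards against floating-point round-down.
    allowed = int(np.floor(target_fpr * n + 1e-9))
    # Threshold is the (allowed + 1)-th largest clean score; flag strictly above it.
    threshold = clean[n - allowed - 1]
    fpr = float(np.mean(clean > threshold))
    tpr = float(np.mean(removed > threshold))
    return tpr, fpr, threshold
```

With 10,000 clean scores, at most 10 may exceed the threshold, which is why an operating point of 10^{-3} needs a clean test set of at least a few thousand images before the reported rate means anything.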
What carries the argument
Statistical artifacts generated by the removal process itself, which serve as training features for a binary classifier that flags removal attempts.
Load-bearing premise
The observed statistical artifacts are produced by the removal process in general rather than by the particular implementations, datasets, or training procedures used in the experiments.
What would settle it
A single removal method that achieves high attack success and high perceptual quality, while classifiers trained on the reported artifacts detect it no better than chance, would show that the artifacts are not inherent to removal.
Original abstract
Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates at $10^{-3}$ FPR across every removal method tested. No existing attack accounts for this forensic leakage. We benchmark leading watermarking schemes against standard removal pipelines under the extended evaluation triple of attack success, perceptual quality, and forensic detectability, and find that no current method balances all three. Our results establish forensic stealthiness as a necessary requirement for watermark removal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that evaluating watermark removal on attack success rate and perceptual quality alone is insufficient, because removal methods also leave detectable statistical artifacts. It introduces Watermark Removal Detection (WRD) as a third evaluation axis and reports that a modern classifier trained on these artifacts achieves state-of-the-art detection at 10^{-3} FPR across all tested removal methods. The benchmark finds that no existing combination of watermarking scheme and removal pipeline balances the three axes, and the authors conclude that forensic stealthiness is a necessary requirement for removal attacks.
Significance. If the results generalize, the work adds a practically important forensic dimension to watermark security evaluation. The empirical demonstration that classifiers can exploit removal-induced artifacts at low FPR provides a concrete, falsifiable benchmark that could drive more robust designs for both embedding and removal. The absence of free parameters or circular fitting in the core claim is a strength.
major comments (2)
- [Experiments section (results on detection rates)] The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.
- [Benchmarking results (triple-axis evaluation)] Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.
minor comments (2)
- [Methods] Clarify the exact architecture and training procedure of the WRD classifier (e.g., backbone, loss, data augmentation) in the methods section to aid reproducibility.
- [Abstract and Results] The abstract states 'state-of-the-art detection rates' without naming the prior detectors being compared; add explicit baseline references in the results.
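The leave-one-method-out evaluation requested in the first major comment can be sketched as a split generator. This is a minimal illustration under stated assumptions, not the authors' protocol: the tuple layout `(features, label, method)`, the function name, and the even/odd split of clean images are all hypothetical choices.

```python
def leave_one_method_out(samples):
    """Yield (held_out_method, train, test) splits.

    `samples` is a list of (features, label, method) tuples, where label is
    1 for an image that went through removal and 0 for an untouched image,
    and `method` names the removal pipeline (None for untouched images).
    This layout is an assumption made for the sketch.
    """
    methods = sorted({m for _, y, m in samples if y == 1})
    clean = [s for s in samples if s[1] == 0]
    # Split clean images so the test set contains negatives unseen in training.
    clean_train, clean_test = clean[::2], clean[1::2]
    for held_out in methods:
        # Train on every removal method except the held-out one.
        train = clean_train + [s for s in samples
                               if s[1] == 1 and s[2] != held_out]
        # Test only on the held-out method plus unseen clean images.
        test = clean_test + [s for s in samples
                             if s[1] == 1 and s[2] == held_out]
        yield held_out, train, test
```

If detection on each held-out method stays close to the in-distribution figure, that would support a general forensic trace; a sharp drop would point toward implementation-specific signatures, which is the referee's concern.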
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The points raised regarding the generality of the artifacts and the completeness of the benchmarking details are important for strengthening the manuscript. We address each major comment below and indicate the revisions we will make.
Referee: The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.
Authors: We agree that demonstrating the generality of the artifacts is crucial. To address this, we will add a leave-one-method-out evaluation in the revised experiments section, training the classifier on subsets excluding one removal method at a time and reporting detection performance on the held-out method. This will help show that the classifier learns general traces rather than method-specific signatures. Additionally, we will include ablations that isolate the removal operator by controlling for post-processing steps and architecture choices where possible. Regarding a formal derivation, our paper focuses on empirical demonstration; we provide discussion on how watermark removal inherently disrupts statistical properties of the image (e.g., by introducing inconsistencies in higher-order moments due to the optimization or generative processes), but a rigorous mathematical proof is not included and would constitute significant additional theoretical work. We will clarify the empirical nature of our claims in the text to avoid overstatement.
revision: partial
Referee: Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.
Authors: We acknowledge the need for more rigorous reporting in the benchmarking results. In the revised manuscript, we will provide full details on the dataset sizes used for each experiment, the number of independent trials (including random seeds for classifier training and evaluation), results of statistical significance tests (such as t-tests or bootstrap confidence intervals for the reported detection rates at 10^{-3} FPR), and explicit controls for confounding factors, including ensuring disjoint content distributions between training and test sets and avoiding overlap in training procedures. These details will be added to the experimental setup description and the caption of the relevant table.
revision: yes
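The bootstrap confidence intervals mentioned in the rebuttal could look roughly like the sketch below. It is an assumption-laden illustration rather than the authors' procedure: the function name, the percentile-bootstrap choice, and the defaults for `n_boot`, `alpha`, and `seed` are all hypothetical. Both score sets are resampled so that uncertainty in the threshold itself is reflected in the interval.

```python
import numpy as np

def bootstrap_tpr_ci(scores_clean, scores_removed, target_fpr=1e-3,
                     n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the detection rate at a fixed FPR.

    Assumes higher score = more likely to have undergone removal
    (an illustrative convention, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    clean = np.asarray(scores_clean, dtype=float)
    removed = np.asarray(scores_removed, dtype=float)
    tprs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample clean and removed scores independently, with replacement.
        c = rng.choice(clean, size=clean.size, replace=True)
        r = rng.choice(removed, size=removed.size, replace=True)
        # Threshold at the (1 - target_fpr) quantile of the resampled clean scores.
        thr = np.quantile(c, 1.0 - target_fpr)
        tprs[b] = np.mean(r > thr)
    lo, hi = np.quantile(tprs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

At an FPR of 10^{-3} the threshold is determined by only a handful of extreme clean scores, so intervals from small clean sets will tend to be wide; reporting them alongside dataset sizes would address the referee's point directly.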
Circularity Check
No significant circularity; empirical study self-contained
The paper's core contribution is an empirical demonstration: a classifier trained on observed statistical artifacts from tested watermark removal pipelines achieves high detection rates at low FPR. No equations, derivations, or predictions reduce by construction to fitted parameters or self-definitions. The evaluation triple (attack success, perceptual quality, forensic detectability) is defined externally via standard metrics and cross-method testing rather than circularly. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The work remains within observable experimental results without renaming known patterns or smuggling assumptions via prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Watermark removal methods produce distinct statistical artifacts in image data