arxiv: 2604.25491 · v1 · submitted 2026-04-28 · 💻 cs.CV · cs.AI

The Forensic Cost of Watermark Removal

Gautier Evennou , Ewa Kijak This is my paper

Pith reviewed 2026-05-07 17:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords watermark removalforensic detectionstatistical artifactsimage watermarkingadversarial removalcontent authenticationmachine learning forensics

0 comments

The pith

Watermark removal methods leave statistical artifacts that a classifier can detect at a false positive rate of one in a thousand.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Evaluations of watermark removal have focused only on whether the watermark disappears and whether the image still looks natural. The paper demonstrates that removal operations also imprint consistent statistical patterns on the output images. A modern classifier trained on these patterns identifies the removal attempt across all tested methods while keeping false alarms low. Because the patterns appear regardless of the specific removal pipeline, the authors conclude that forensic detectability must become a third required dimension of evaluation alongside success rate and visual quality.

Core claim

Every standard watermark removal pipeline produces distinct statistical artifacts in the resulting images. A classifier trained on those artifacts reaches state-of-the-art detection performance at a false-positive rate of 10^{-3} for every removal method examined. No existing attack incorporates countermeasures against this leakage. When leading watermarking schemes are measured under the combined criteria of attack success, perceptual quality, and forensic detectability, none satisfies all three simultaneously. The work therefore establishes forensic stealthiness as an essential property any removal attack must possess.

What carries the argument

Statistical artifacts generated by the removal process itself, which serve as training features for a binary classifier that flags removal attempts.

Load-bearing premise

The observed statistical artifacts are produced by the removal process in general rather than by the particular implementations, datasets, or training procedures used in the experiments.

What would settle it

Construction of a single removal method that achieves high attack success, high perceptual quality, and detection rates no better than random by classifiers trained on the reported artifacts would show the artifacts are not inherent.

Figures

Figures reproduced from arXiv: 2604.25491 by Ewa Kijak, Gautier Evennou.

**Figure 1.** Figure 1: Watermark removal attacks samples, with residuals and Fourier spectrum. WMForger and Diffpure have the smallest view at source ↗

**Figure 2.** Figure 2: Detector robustness under post-processing. ROC view at source ↗

read the original abstract

Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates at $10^{-3}$ FPR across every removal method tested. No existing attack accounts for this forensic leakage. We benchmark leading watermarking schemes against standard removal pipelines under the extended evaluation triple of attack success, perceptual quality, and forensic detectability, and find that no current method balances all three. Our results establish forensic stealthiness as a necessary requirement for watermark removal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper flags a real forensic leakage in watermark removal but the evidence that it's inherent rather than method-specific is still thin.

read the letter

The main point is that removal attacks which succeed on the usual metrics still leave statistical patterns a classifier can pick up at 10^{-3} FPR. The authors name this Watermark Removal Detection and show that none of the tested pipelines manage to clear the watermark, keep perceptual quality, and avoid this new trace at once. That triple-axis framing is the concrete addition here, and running the same detector across multiple removal methods gives a practical demonstration that the leakage shows up in current practice. The benchmarking against standard watermarking schemes is straightforward and useful for anyone who evaluates these systems. The soft spot is the jump from “these methods leave detectable artifacts” to “forensic stealthiness is now a necessary requirement for any removal attack.” The stress-test concern holds: the paper does not appear to include leave-one-method-out checks, a theoretical argument that removal must alter higher-order moments, or controls that separate the removal step from shared post-processing or architecture choices. Without that, the classifier could simply be learning signatures of the particular GANs, diffusion models, or pipelines that were tested rather than a general forensic cost. The abstract and results sections do not supply enough detail on dataset construction or confounding factors to rule this out. This work is aimed at people doing media forensics or building robust watermarking; the observation is worth knowing even if the generality claim needs tightening. It should go to peer review so the experiments can be stress-tested on that point.

Referee Report

2 major / 2 minor

Summary. The paper claims that watermark removal methods leave detectable statistical artifacts beyond attack success rate and perceptual quality. It introduces Watermark Removal Detection (WRD) as a third evaluation axis and reports that a modern classifier trained on these artifacts achieves state-of-the-art detection at 10^{-3} FPR across all tested removal methods. Benchmarking shows no existing combination of watermarking scheme and removal pipeline balances the three axes, establishing forensic stealthiness as a necessary requirement for removal attacks.

Significance. If the results generalize, the work adds a practically important forensic dimension to watermark security evaluation. The empirical demonstration that classifiers can exploit removal-induced artifacts at low FPR provides a concrete, falsifiable benchmark that could drive more robust designs for both embedding and removal. The absence of free parameters or circular fitting in the core claim is a strength.

major comments (2)

[Experiments section (results on detection rates)] The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.
[Benchmarking results (triple-axis evaluation)] Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.

minor comments (2)

[Methods] Clarify the exact architecture and training procedure of the WRD classifier (e.g., backbone, loss, data augmentation) in the methods section to aid reproducibility.
[Abstract and Results] The abstract states 'state-of-the-art detection rates' without naming the prior detectors being compared; add explicit baseline references in the results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The points raised regarding the generality of the artifacts and the completeness of the benchmarking details are important for strengthening the manuscript. We address each major comment below and indicate the revisions we will make.

read point-by-point responses

Referee: The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.

Authors: We agree that demonstrating the generality of the artifacts is crucial. To address this, we will add a leave-one-method-out evaluation in the revised experiments section, training the classifier on subsets excluding one removal method at a time and reporting detection performance on the held-out method. This will help show that the classifier learns general traces rather than method-specific signatures. Additionally, we will include ablations that isolate the removal operator by controlling for post-processing steps and architecture choices where possible. Regarding a formal derivation, our paper focuses on empirical demonstration; we provide discussion on how watermark removal inherently disrupts statistical properties of the image (e.g., by introducing inconsistencies in higher-order moments due to the optimization or generative processes), but a rigorous mathematical proof is not included and would constitute significant additional theoretical work. We will clarify the empirical nature of our claims in the text to avoid overstatement. revision: partial
Referee: Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.

Authors: We acknowledge the need for more rigorous reporting in the benchmarking results. In the revised manuscript, we will provide full details on the dataset sizes used for each experiment, the number of independent trials (including random seeds for classifier training and evaluation), results of statistical significance tests (such as t-tests or bootstrap confidence intervals for the reported detection rates at 10^{-3} FPR), and explicit controls for confounding factors, including ensuring disjoint content distributions between training and test sets and avoiding overlap in training procedures. These details will be added to the experimental setup description and the caption of the relevant table. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical study self-contained

full rationale

The paper's core contribution is an empirical demonstration: a classifier trained on observed statistical artifacts from tested watermark removal pipelines achieves high detection rates at low FPR. No equations, derivations, or predictions reduce by construction to fitted parameters or self-definitions. The evaluation triple (attack success, perceptual quality, forensic detectability) is defined externally via standard metrics and cross-method testing rather than circularly. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The work remains within observable experimental results without renaming known patterns or smuggling assumptions via prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are described in the provided text.

axioms (1)

domain assumption Watermark removal methods produce distinct statistical artifacts in image data
This premise underpins the claim that a classifier can detect removal attempts.

pith-pipeline@v0.9.0 · 5418 in / 1183 out tokens · 68059 ms · 2026-05-07T17:00:26.613793+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references · 14 canonical work pages

[1]

Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, and Furong Huang. 2024. WAVES: benchmarking the robustness of image water- marks. InProceedings of the 41st International Conference on Machine Learning (ICML’24)

2024
[2]

Tu Bui, Shruti Agarwal, and John Collomosse. 2025. TrustMark: Robust Wa- termarking and Watermark Removal for Arbitrary Resolution Images. InIEEE International Conference on Computer Vision (ICCV)

2025
[3]

Nicholas Carlini and David Wagner. 2017. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. InProceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 3–14

2017
[4]

Wes Castro and Zeki Yalniz. 2025. Video Invisible Watermarking at Scale. En- gineering at Meta. https://engineering.fb.com/2025/11/04/video-engineering/ video-invisible-watermarking-at-scale/ Accessed: 2026-02-20

2025
[5]

Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2023. On The Detection of Synthetic Images Generated by Diffusion Models. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

2023
[6]

Cox, Matthew L

Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich, and Ton Kalker. 2008. Chapter 2 - Applications and Properties. InDigital Watermark- ing and Steganography (Second Edition)(second edition ed.), Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich, and Ton Kalker (Eds.). Morgan Kaufmann, Burlington, 15–59. doi:10.1016/...

work page doi:10.1016/b978-012372585-1.50005-x 2008
[7]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition. IEEE, 248–255

2009
[8]

European Parliament and Council of the European Union. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). https://eur-lex.europa.eu/eli/reg/2024/1689/oj

2024
[9]

Gautier Evennou, Vivien Chappelier, and Ewa Kijak. 2025. Fast, Secure, and High-Capacity Image Watermarking with Autoencoded Text Vectors. arXiv:2510.00799 [cs.CR] https://arxiv.org/abs/2510.00799

work page arXiv 2025
[10]

Gautier Evennou, Vivien Chappelier, Ewa Kijak, and Teddy Furon. 2024. SWIFT: Semantic Watermarking for Image Forgery Thwarting. InProceedings of the IEEE International Workshop on Information Forensics and Security (WIFS)

2024
[11]

Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. 2023. The Stable Signature: Rooting Watermarks in Latent Diffusion Models. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023. IEEE. doi:10.1109/ICCV51070.2023.02053

work page doi:10.1109/iccv51070.2023.02053 2023
[12]

Zeki Yalniz, and Alexandre Mourachko

Pierre Fernandez, Hady Elsahar, I. Zeki Yalniz, and Alexandre Mourachko
[13]

Video Seal: Open and Efficient Video Watermarking.arXiv preprint arXiv:2412.09492(2024)

work page arXiv 2024
[14]

Teddy Furon and Patrick Bas. [n. d.]. Broken Arrows.EURASIP Journal on Information Security2008 ([n. d.]), ID 597040
[15]

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. InAdvances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA

2017
[16]

Zhaoyang Jia, Han Fang, and Weiming Zhang. 2021. MBRS: Enhancing Robust- ness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression. 41–49. doi:10.1145/3474085.3475324

work page doi:10.1145/3474085.3475324 2021
[17]

Black Forest Labs. 2024. FLUX. https://github.com/black-forest-labs/flux

2024
[18]

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8693 LNCS, 740–755

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InComputer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V (Lecture Notes in Computer Science, Vol. 8693), David J. Fl...

work page doi:10.1007/978-3-319-10602-1_48 2014
[19]

Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, and Yuheng Bu. 2024. Image watermarks are removable using controllable regeneration from clean noise.arXiv preprint arXiv:2410.05470(2024)

work page arXiv 2024
[20]

Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. 2025. Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances. InThe Thirteenth International Conference on Learn- ing Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=16O8GCm8Wn

2025
[21]

Rui Ma, Mengxi Guo, Yi Hou, Fan Yang, Yuan Li, Huizhu Jia, and Xiaodong Xie
[22]

InProceedings of the 30th ACM International Conference on Multi- media

Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms. InProceedings of the 30th ACM International Conference on Multi- media. 1532–1542
[23]

Thibault Maho, Teddy Furon, and Erwan Le Merrer. 2021. SurFree: A Fast Surrogate-Free Black-Box Attack. InIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foun- dation / IEEE, 10430–10439. doi:10.1109/CVPR46437.2021.01029

work page doi:10.1109/cvpr46437.2021.01029 2021
[24]

Federico Nesti, Alessandro Biondi, and Giorgio Buttazzo. 2023. Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting.IEEE Transactions on Neural Networks and Learning Systems(2023). doi:10.1109/TNNLS.2021.3105238

work page doi:10.1109/tnnls.2021.3105238 2023
[25]

Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. 2022. Diffusion Models for Adversarial Purification. InInternational Conference on Machine Learning (ICML)

2022
[26]

Aleksandar Petrov, Pierre Fernandez, Tomáš Souček, and Hady Elsahar. 2025. We Can Hide More Bits: The Unused Watermarking Capacity in Theory and in Practice. arXiv:2510.12812 [cs.CR] https://arxiv.org/abs/2510.12812

work page arXiv 2025
[27]

Jonas Ricker, Denis Lukovnikov, and Asja Fischer. 2024. AEROBLADE: Training- Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error . In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA. doi:10.1109/CVPR52733. 2024.00872

work page doi:10.1109/cvpr52733 2024
[28]

Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, and Alexandre Mourachko. 2025. Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models. InAdvances in Neural Information Processing Systems

2025
[29]

Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. 2023. DIRE for Diffusion-Generated Image Detection. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV). 22388–22398. doi:10.1109/ICCV51070.2023.02051

work page doi:10.1109/iccv51070.2023.02051 2023
[30]

Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. 2023. Tree- Rings Watermarks: Invisible Fingerprints for Diffusion Images. InAdvances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Glober- son, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 58047–58063. https://proceedings.neurips.cc/paper_fil...

2023
[31]

Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. 2023. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24,

2023
[32]

In: 2023 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pp

IEEE, 16133–16142. doi:10.1109/CVPR52729.2023.01548

work page doi:10.1109/cvpr52729.2023.01548 2023
[33]

Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han. 2025. SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025

2025
[34]

Weilin Xu, David Evans, and Yanjun Qi. 2018. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018

2018
[35]

Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak

Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. 2024. Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models. InForty-first International Conference on Machine Learning

2024
[36]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. [n. d.]. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. The Forensic Cost of Watermark Removal Conference 2026, April 1, 2026, Placeholder

2018
[37]

Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. 2018. HiDDeN: Hiding Data With Deep Networks. InComputer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV(Munich, Germany). Springer-Verlag, Berlin, Heidelberg, 682–697. doi:10.1007/978-3-030- 01267-0_40

work page doi:10.1007/978-3-030- 2018