pith. machine review for the scientific record. sign in

arxiv: 2604.25491 · v1 · submitted 2026-04-28 · 💻 cs.CV · cs.AI

The Forensic Cost of Watermark Removal

Pith reviewed 2026-05-07 17:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords watermark removalforensic detectionstatistical artifactsimage watermarkingadversarial removalcontent authenticationmachine learning forensics
0
0 comments X

The pith

Watermark removal methods leave statistical artifacts that a classifier can detect at a false positive rate of one in a thousand.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Evaluations of watermark removal have focused only on whether the watermark disappears and whether the image still looks natural. The paper demonstrates that removal operations also imprint consistent statistical patterns on the output images. A modern classifier trained on these patterns identifies the removal attempt across all tested methods while keeping false alarms low. Because the patterns appear regardless of the specific removal pipeline, the authors conclude that forensic detectability must become a third required dimension of evaluation alongside success rate and visual quality.

Core claim

Every standard watermark removal pipeline produces distinct statistical artifacts in the resulting images. A classifier trained on those artifacts reaches state-of-the-art detection performance at a false-positive rate of 10^{-3} for every removal method examined. No existing attack incorporates countermeasures against this leakage. When leading watermarking schemes are measured under the combined criteria of attack success, perceptual quality, and forensic detectability, none satisfies all three simultaneously. The work therefore establishes forensic stealthiness as an essential property any removal attack must possess.

What carries the argument

Statistical artifacts generated by the removal process itself, which serve as training features for a binary classifier that flags removal attempts.

Load-bearing premise

The observed statistical artifacts are produced by the removal process in general rather than by the particular implementations, datasets, or training procedures used in the experiments.

What would settle it

Construction of a single removal method that achieves high attack success, high perceptual quality, and detection rates no better than random by classifiers trained on the reported artifacts would show the artifacts are not inherent.

Figures

Figures reproduced from arXiv: 2604.25491 by Ewa Kijak, Gautier Evennou.

Figure 1
Figure 1. Figure 1: Watermark removal attacks samples, with residuals and Fourier spectrum. WMForger and Diffpure have the smallest view at source ↗
Figure 2
Figure 2. Figure 2: Detector robustness under post-processing. ROC view at source ↗
read the original abstract

Current watermark removal methods are evaluated on two axes: attack success rate and perceptual quality. We show this is insufficient. While state-of-the-art attacks successfully degrade the watermark signal without visible distortion, they leave distinct statistical artifacts that betray the removal attempt. We name this overlooked axis Watermark Removal Detection (WRD) and demonstrate that a modern classifier trained on these artifacts achieves state-of-the-art detection rates at $10^{-3}$ FPR across every removal method tested. No existing attack accounts for this forensic leakage. We benchmark leading watermarking schemes against standard removal pipelines under the extended evaluation triple of attack success, perceptual quality, and forensic detectability, and find that no current method balances all three. Our results establish forensic stealthiness as a necessary requirement for watermark removal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that watermark removal methods leave detectable statistical artifacts beyond attack success rate and perceptual quality. It introduces Watermark Removal Detection (WRD) as a third evaluation axis and reports that a modern classifier trained on these artifacts achieves state-of-the-art detection at 10^{-3} FPR across all tested removal methods. Benchmarking shows no existing combination of watermarking scheme and removal pipeline balances the three axes, establishing forensic stealthiness as a necessary requirement for removal attacks.

Significance. If the results generalize, the work adds a practically important forensic dimension to watermark security evaluation. The empirical demonstration that classifiers can exploit removal-induced artifacts at low FPR provides a concrete, falsifiable benchmark that could drive more robust designs for both embedding and removal. The absence of free parameters or circular fitting in the core claim is a strength.

major comments (2)
  1. [Experiments section (results on detection rates)] The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.
  2. [Benchmarking results (triple-axis evaluation)] Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.
minor comments (2)
  1. [Methods] Clarify the exact architecture and training procedure of the WRD classifier (e.g., backbone, loss, data augmentation) in the methods section to aid reproducibility.
  2. [Abstract and Results] The abstract states 'state-of-the-art detection rates' without naming the prior detectors being compared; add explicit baseline references in the results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The points raised regarding the generality of the artifacts and the completeness of the benchmarking details are important for strengthening the manuscript. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: The central claim that the artifacts are inherent to watermark removal (rather than specific to the tested pipelines) is load-bearing for the conclusion that 'no current method balances all three' and that forensic stealthiness is necessary. The experiments section reports detection 'across every removal method tested' but does not include leave-one-method-out evaluation, ablation isolating the removal operator from shared post-processing or architecture choices, or a derivation showing why removal must alter detectable higher-order statistics. Without these, the classifier may be learning implementation signatures rather than a general forensic trace.

    Authors: We agree that demonstrating the generality of the artifacts is crucial. To address this, we will add a leave-one-method-out evaluation in the revised experiments section, training the classifier on subsets excluding one removal method at a time and reporting detection performance on the held-out method. This will help show that the classifier learns general traces rather than method-specific signatures. Additionally, we will include ablations that isolate the removal operator by controlling for post-processing steps and architecture choices where possible. Regarding a formal derivation, our paper focuses on empirical demonstration; we provide discussion on how watermark removal inherently disrupts statistical properties of the image (e.g., by introducing inconsistencies in higher-order moments due to the optimization or generative processes), but a rigorous mathematical proof is not included and would constitute significant additional theoretical work. We will clarify the empirical nature of our claims in the text to avoid overstatement. revision: partial

  2. Referee: Table reporting the triple-axis benchmark (attack success, perceptual quality, WRD) concludes that no method balances all three, yet lacks details on dataset sizes, number of trials, statistical significance tests, or controls for confounding factors such as content distribution or training procedure overlap. This undermines the strength of the 'no current method' claim.

    Authors: We acknowledge the need for more rigorous reporting in the benchmarking results. In the revised manuscript, we will provide full details on the dataset sizes used for each experiment, the number of independent trials (including random seeds for classifier training and evaluation), results of statistical significance tests (such as t-tests or bootstrap confidence intervals for the reported detection rates at 10^{-3} FPR), and explicit controls for confounding factors, including ensuring disjoint content distributions between training and test sets and avoiding overlap in training procedures. These details will be added to the experimental setup description and the caption of the relevant table. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical study self-contained

full rationale

The paper's core contribution is an empirical demonstration: a classifier trained on observed statistical artifacts from tested watermark removal pipelines achieves high detection rates at low FPR. No equations, derivations, or predictions reduce by construction to fitted parameters or self-definitions. The evaluation triple (attack success, perceptual quality, forensic detectability) is defined externally via standard metrics and cross-method testing rather than circularly. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The work remains within observable experimental results without renaming known patterns or smuggling assumptions via prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are described in the provided text.

axioms (1)
  • domain assumption Watermark removal methods produce distinct statistical artifacts in image data
    This premise underpins the claim that a classifier can detect removal attempts.

pith-pipeline@v0.9.0 · 5418 in / 1183 out tokens · 68059 ms · 2026-05-07T17:00:26.613793+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 14 canonical work pages

  1. [1]

    Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, and Furong Huang. 2024. WAVES: benchmarking the robustness of image water- marks. InProceedings of the 41st International Conference on Machine Learning (ICML’24)

  2. [2]

    Tu Bui, Shruti Agarwal, and John Collomosse. 2025. TrustMark: Robust Wa- termarking and Watermark Removal for Arbitrary Resolution Images. InIEEE International Conference on Computer Vision (ICCV)

  3. [3]

    Nicholas Carlini and David Wagner. 2017. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. InProceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 3–14

  4. [4]

    Wes Castro and Zeki Yalniz. 2025. Video Invisible Watermarking at Scale. En- gineering at Meta. https://engineering.fb.com/2025/11/04/video-engineering/ video-invisible-watermarking-at-scale/ Accessed: 2026-02-20

  5. [5]

    Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2023. On The Detection of Synthetic Images Generated by Diffusion Models. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  6. [6]

    Cox, Matthew L

    Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich, and Ton Kalker. 2008. Chapter 2 - Applications and Properties. InDigital Watermark- ing and Steganography (Second Edition)(second edition ed.), Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich, and Ton Kalker (Eds.). Morgan Kaufmann, Burlington, 15–59. doi:10.1016/...

  7. [7]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition. IEEE, 248–255

  8. [8]

    European Parliament and Council of the European Union. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). https://eur-lex.europa.eu/eli/reg/2024/1689/oj

  9. [9]

    Gautier Evennou, Vivien Chappelier, and Ewa Kijak. 2025. Fast, Secure, and High-Capacity Image Watermarking with Autoencoded Text Vectors. arXiv:2510.00799 [cs.CR] https://arxiv.org/abs/2510.00799

  10. [10]

    Gautier Evennou, Vivien Chappelier, Ewa Kijak, and Teddy Furon. 2024. SWIFT: Semantic Watermarking for Image Forgery Thwarting. InProceedings of the IEEE International Workshop on Information Forensics and Security (WIFS)

  11. [11]

    Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. 2023. The Stable Signature: Rooting Watermarks in Latent Diffusion Models. InIEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023. IEEE. doi:10.1109/ICCV51070.2023.02053

  12. [12]

    Zeki Yalniz, and Alexandre Mourachko

    Pierre Fernandez, Hady Elsahar, I. Zeki Yalniz, and Alexandre Mourachko

  13. [13]

    Video Seal: Open and Efficient Video Watermarking.arXiv preprint arXiv:2412.09492(2024)

  14. [14]

    Teddy Furon and Patrick Bas. [n. d.]. Broken Arrows.EURASIP Journal on Information Security2008 ([n. d.]), ID 597040

  15. [15]

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. InAdvances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA

  16. [16]

    Zhaoyang Jia, Han Fang, and Weiming Zhang. 2021. MBRS: Enhancing Robust- ness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression. 41–49. doi:10.1145/3474085.3475324

  17. [17]

    Black Forest Labs. 2024. FLUX. https://github.com/black-forest-labs/flux

  18. [18]

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8693 LNCS, 740–755

    Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. InComputer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V (Lecture Notes in Computer Science, Vol. 8693), David J. Fl...

  19. [19]

    Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, and Yuheng Bu. 2024. Image watermarks are removable using controllable regeneration from clean noise.arXiv preprint arXiv:2410.05470(2024)

  20. [20]

    Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. 2025. Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances. InThe Thirteenth International Conference on Learn- ing Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=16O8GCm8Wn

  21. [21]

    Rui Ma, Mengxi Guo, Yi Hou, Fan Yang, Yuan Li, Huizhu Jia, and Xiaodong Xie

  22. [22]

    InProceedings of the 30th ACM International Conference on Multi- media

    Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms. InProceedings of the 30th ACM International Conference on Multi- media. 1532–1542

  23. [23]

    Thibault Maho, Teddy Furon, and Erwan Le Merrer. 2021. SurFree: A Fast Surrogate-Free Black-Box Attack. InIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021. Computer Vision Foun- dation / IEEE, 10430–10439. doi:10.1109/CVPR46437.2021.01029

  24. [24]

    Federico Nesti, Alessandro Biondi, and Giorgio Buttazzo. 2023. Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting.IEEE Transactions on Neural Networks and Learning Systems(2023). doi:10.1109/TNNLS.2021.3105238

  25. [25]

    Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. 2022. Diffusion Models for Adversarial Purification. InInternational Conference on Machine Learning (ICML)

  26. [26]

    Aleksandar Petrov, Pierre Fernandez, Tomáš Souček, and Hady Elsahar. 2025. We Can Hide More Bits: The Unused Watermarking Capacity in Theory and in Practice. arXiv:2510.12812 [cs.CR] https://arxiv.org/abs/2510.12812

  27. [27]

    Jonas Ricker, Denis Lukovnikov, and Asja Fischer. 2024. AEROBLADE: Training- Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error . In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA. doi:10.1109/CVPR52733. 2024.00872

  28. [28]

    Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, and Alexandre Mourachko. 2025. Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models. InAdvances in Neural Information Processing Systems

  29. [29]

    Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. 2023. DIRE for Diffusion-Generated Image Detection. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV). 22388–22398. doi:10.1109/ICCV51070.2023.02051

  30. [30]

    Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. 2023. Tree- Rings Watermarks: Invisible Fingerprints for Diffusion Images. InAdvances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Glober- son, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 58047–58063. https://proceedings.neurips.cc/paper_fil...

  31. [31]

    Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. 2023. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24,

  32. [32]
  33. [33]

    Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han. 2025. SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025

  34. [34]

    Weilin Xu, David Evans, and Yanjun Qi. 2018. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018

  35. [35]

    Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak

    Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. 2024. Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models. InForty-first International Conference on Machine Learning

  36. [36]

    Efros, Eli Shechtman, and Oliver Wang

    Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. [n. d.]. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. The Forensic Cost of Watermark Removal Conference 2026, April 1, 2026, Placeholder

  37. [37]

    Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. 2018. HiDDeN: Hiding Data With Deep Networks. InComputer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV(Munich, Germany). Springer-Verlag, Berlin, Heidelberg, 682–697. doi:10.1007/978-3-030- 01267-0_40