Do Modern Post-Hoc Watermarking Methods Beat Broken-Arrows?
Pith reviewed 2026-06-29 17:27 UTC · model grok-4.3
The pith
Classic watermarking methods provide superior security compared to modern neural-network post-hoc schemes while maintaining robustness against image transformations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through direct experimental comparison, classic watermarking outperforms modern techniques in terms of security while maintaining robustness in a realistic scenario.
What carries the argument
A fair comparison protocol applying classic augmentations and recent sophisticated attacks to measure robustness and security metrics across watermarking schemes.
If this is right
- Security should be prioritized over extremely low false-alarm rates in watermarking design for AI images.
- Classic methods remain competitive or superior for deployment where attack resistance matters.
- Modern neural methods require further development to match or exceed classic security levels.
- Evaluation of watermarking must include sophisticated attacks beyond basic transformations.
Where Pith is reading between the lines
- If the attack set is representative, resources might be better spent refining classic methods rather than developing new neural ones.
- Future work could test these methods against additional real-world threats like model-specific attacks not covered here.
- Combining elements from both classic and modern approaches might yield improved overall performance.
Load-bearing premise
The specific augmentations and sophisticated attacks selected in the experiments represent the main threats that matter for real-world watermarking deployment.
What would settle it
An experiment showing modern neural watermarking methods resisting a new set of attacks that successfully defeat the classic methods, or vice versa in a different attack suite.
Original abstract
With the rapid proliferation of generative models, such as diffusion models, digital watermarking has emerged as a crucial solution for identifying AI-generated images. Modern post-hoc watermarking schemes use neural networks to achieve an extremely low false-alarm rate while remaining robust to common image transformations. However, there is a lack of comparison between these modern methods and classic ones, particularly in real-world scenarios where robustness and security take precedence over achieving an extremely low false-alarm probability. In this paper, we propose a fair comparison of robustness and security between modern and classic post-hoc watermarking across various types of classic augmentations and recent sophisticated attacks. Our experiments show that, in a realistic scenario, classic watermarking outperforms modern techniques in terms of security while maintaining robustness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares modern neural-network-based post-hoc watermarking schemes against classic methods (e.g., Broken Arrows) for detecting AI-generated images. It asserts that, under a realistic threat model consisting of standard image augmentations and recent sophisticated attacks, classic watermarking achieves superior security while preserving comparable robustness.
Significance. If the empirical ranking is shown to be stable under a well-justified threat model, the result would temper enthusiasm for neural post-hoc detectors and indicate that simpler, non-learned schemes may remain preferable when security against removal or forgery is the primary requirement.
Major comments (2)
- [§4–5] (Experimental methodology and results): The manuscript states that it performs “a fair comparison … across various types of classic augmentations and recent sophisticated attacks” but supplies neither an explicit threat model nor an argument that the chosen attack set is representative of deployment-relevant adversaries. No sensitivity analysis or adaptive-attack results are reported to demonstrate that the security ranking is stable when the attacker is allowed to target the neural detector directly.
- [Table 2 / Figure 3] (Security metrics): The abstract and results claim “outperforms … in terms of security” yet the provided text contains no numerical values for false-positive rates under attack, bit-error rates after removal attempts, or statistical significance tests. Without these quantities the central security claim cannot be verified.
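To make the quantities requested in the second comment concrete, here is a minimal Python sketch of two of them: the bit-error rate after a removal attempt and the false-alarm probability of a multi-bit decoder. It assumes a decoder that outputs a fixed-length bit string and declares detection when enough bits match; under that assumption, the decoded bits of an unwatermarked image are modeled as independent fair coin flips. The function names, the 16-bit message, and the threshold are all hypothetical and do not come from the paper. A detector that thresholds a correlation score instead would need a different null model.

```python
from math import comb


def bit_error_rate(embedded: str, decoded: str) -> float:
    """Fraction of message bits that differ between two equal-length bit strings."""
    if len(embedded) != len(decoded):
        raise ValueError("bit strings must have the same length")
    errors = sum(a != b for a, b in zip(embedded, decoded))
    return errors / len(embedded)


def false_alarm_probability(n_bits: int, min_matches: int) -> float:
    """Chance that at least min_matches of n_bits agree on an unwatermarked image.

    Assumption (not from the paper): each decoded bit of an unwatermarked
    image is an independent fair coin flip, so matches follow Binomial(n, 1/2).
    """
    tail = sum(comb(n_bits, k) for k in range(min_matches, n_bits + 1))
    return tail / 2 ** n_bits


# Hypothetical 16-bit message and its decoding after a removal attempt.
embedded = "1011001110001011"
decoded = "1011011110001001"
print(bit_error_rate(embedded, decoded))    # 2 of 16 bits flipped
print(false_alarm_probability(16, 14))      # threshold: at least 14 matching bits
```

Reporting both numbers per attack, alongside the image distortion the attack introduces, would let a reader check a security ranking without consulting the figures.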
Minor comments (2)
- [Abstract] The claim of experimental superiority is stated without any quantitative metrics, dataset sizes, or attack descriptions; adding one or two key numbers would make the abstract self-contained.
- [§2] Notation: The distinction between “robustness” (survival under benign transformations) and “security” (resistance to adversarial removal) is used throughout but never formally defined; a short paragraph in §2 would eliminate ambiguity.
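One way the requested definitions could be written down is sketched below in LaTeX. This is an illustrative formalization of the referee's informal wording, not the paper's notation: the symbols $x_w$, $D$, $\mathcal{T}$, $A$, and $\tau$, and the use of a PSNR floor as the distortion constraint, are all assumptions.

```latex
% x_w: a watermarked image; D: a detector returning 1 when it finds a watermark.
% \mathcal{T}: a set of benign transformations (assumed, e.g. compression or resizing).
\[
  \mathrm{Rob}(\mathcal{T}) \;=\; \min_{t \in \mathcal{T}} \Pr\bigl[\, D(t(x_w)) = 1 \,\bigr]
\]
% A: an adversarial removal attack; \tau: a minimum acceptable image quality in dB.
\[
  \mathrm{Sec}(A, \tau) \;=\; 1 - \Pr\bigl[\, D(A(x_w)) = 0 \;\wedge\; \mathrm{PSNR}(x_w, A(x_w)) \ge \tau \,\bigr]
\]
```

Under this sketch, robustness is the worst-case detection rate over benign transformations, while security is one minus the rate at which an attack removes the watermark without pushing image quality below $\tau$.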
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the presentation of the threat model and security metrics.
Point-by-point responses
- Referee: [§4–5] (Experimental methodology and results): The manuscript states that it performs “a fair comparison … across various types of classic augmentations and recent sophisticated attacks” but supplies neither an explicit threat model nor an argument that the chosen attack set is representative of deployment-relevant adversaries. No sensitivity analysis or adaptive-attack results are reported to demonstrate that the security ranking is stable when the attacker is allowed to target the neural detector directly.
  Authors: We agree that a dedicated threat-model subsection would improve clarity and verifiability. In the revised manuscript we will insert a new subsection at the beginning of §4 that (i) formally states the adversary’s goals, knowledge, and capabilities, (ii) justifies the selected augmentations and sophisticated attacks as representative of realistic deployment adversaries, and (iii) reports a sensitivity analysis over the principal attack parameters. We note that our existing attack suite already includes recent non-adaptive sophisticated methods; however, we will also add a short discussion of the computational cost and practical difficulty of fully adaptive attacks against each detector and, if space permits, include a limited set of adaptive-attack results.
  Revision: partial
- Referee: [Table 2 / Figure 3] (Security metrics): The abstract and results claim “outperforms … in terms of security” yet the provided text contains no numerical values for false-positive rates under attack, bit-error rates after removal attempts, or statistical significance tests. Without these quantities the central security claim cannot be verified.
  Authors: The quantitative results are presented in Table 2 and Figure 3. To make the security claims self-contained in the narrative, we will revise §§5–6 to explicitly quote the key numerical values (false-positive rates under each attack, bit-error rates after removal, and any other security metrics) and will add the results of statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values) comparing the classic and neural methods.
  Revision: yes
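The rebuttal mentions Wilcoxon tests as one option for comparing the two families of methods. As a rough illustration of what such a paired comparison involves, the sketch below implements an exact two-sided Wilcoxon signed-rank test using only the Python standard library. The function name and the data are hypothetical: the scores are made-up attack success rates in percent for two unnamed methods on the same eight attack settings, not results from the paper. Enumerating every sign pattern only scales to small samples; a statistics library would be the usual choice in practice.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank. All 2**n sign patterns
    are enumerated, so this is only suitable for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected value of W+ under the null hypothesis
    observed = abs(w_plus - center)
    # Under the null, each rank contributes to W+ with probability 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical attack success rates (%), one pair per attack setting.
method_a = [12, 8, 15, 5, 20, 9, 11, 7]
method_b = [35, 22, 41, 18, 54, 30, 25, 29]
w_plus, p_value = wilcoxon_signed_rank_exact(method_a, method_b)
print(w_plus, p_value)
```

With only eight pairs, the smallest achievable two-sided p-value is 2/256, which is one reason the number of attack settings and images behind any reported test matters as much as the p-value itself.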
Circularity Check
No circularity: purely empirical comparison with no derivations
The paper is an experimental study that performs direct comparisons of robustness and security metrics between classic and modern watermarking methods under chosen augmentations and attacks. No equations, parameter fittings, uniqueness theorems, or ansatzes are presented; the central claim follows from tabulated experimental outcomes rather than any derivation chain. The choice of attack suite is an experimental design decision (subject to external validity critique) but does not constitute a self-definitional, fitted-input, or self-citation reduction. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Reference graph
Works this paper leans on
- [1] Patrick Bas, Teddy Furon, François Cayre, Gwenaël Doërr, and Benjamin Mathon. 2016. Watermarking Security. Springer Singapore.
  [Figure residue: ASR versus PSNR (dB), panels for Videoseal and TrustMark; legend entries JPEG, WmForger, CGBA, DDNAttack, Purification, VAE, WMInTheSand.]
- [2] Tu Bui, Shruti Agarwal, and John Collomosse. 2025. TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 18629–18639.
- [3] C2PA. 2024. C2PA: The Coalition for Content Provenance and Authenticity. https://c2pa.org
- [4] François Cayre, Caroline Fontaine, and Teddy Furon. 2005. Watermarking security part I: theory, Vol. 5681. SPIE, 746. https://inria.hal.science/inria-00083329
- [5] Vivien Chappelier, Mathieu Desoubeaux, and Jonathan Delhumeau. 2018. Procede d’enregistrement d’un contenu multimedia, procede de detection d’une marque au sein d’un contenu multimedia, dispositifs et programme d’ordinateurs correspondants.
- [6] China. 2023. Chinese AI Governance Rules. http://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm
- [7] Chun-Hsien Chou and Yun-Chin Li. 1995. A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. 5, 6 (1995), 467–476. https://ieeexplore.ieee.org/document/475889
- [8] Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. 2023. On the detection of synthetic images generated by diffusion models. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1–5.
- [9]
- [10] Ingemar J. Cox. 2008. Digital watermarking and steganography (2nd ed.). Morgan Kaufmann Publishers, Amsterdam Boston.
- [11] Europe. 2023. European AI Act. https://artificialintelligenceact.eu/
  [Running header of the source paper: IH&MMSec ’26, June 17–19, 2026, Firenze, Italy. Gesny et al.]
- [12] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. 2023. The Stable Signature: Rooting Watermarks in Latent Diffusion Models. ICCV (2023).
- [13] Pierre Fernandez, Hady Elsahar, I. Zeki Yalniz, and Alexandre Mourachko. 2024. Video Seal: Open and Efficient Video Watermarking.
- [14] Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, and Alexandre Mourachko. 2025. Geometric Image Synchronization with Deep Watermarking.
- [15] Teddy Furon and Patrick Bas. 2012. A New Measure of Watermarking Security Applied on DC-DM QIM. In Information Hiding (Berkeley, United States, 2012-05). TBA. https://hal.science/hal-00702689
- [16] Teddy Furon and Patrick Bas. 2008. Broken Arrows. EURASIP Journal on Information Security 2008 (Oct. 2008), ID 597040. https://hal.science/hal-00335311
- [17]
- [18] Enoal Gesny, Eva Giboulot, Teddy Furon, and Vivien Chappelier. 2026. Guidance Watermarking for Diffusion Models. In The Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=5ifzhjMCKq
- [19] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [stat] http://arxiv.org/abs/1412.6572
- [20] Sven Gowal, Rudy Bunel, Florian Stimberg, David Stutz, Guillermo Ortiz-Jimenez, Christina Kouridi, Mel Vecerik, Jamie Hayes, Sylvestre-Alvise Rebuffi, Paul Bernard, Chris Gamble, Miklós Z. Horváth, Fabian Kaczmarczyck, Alex Kaskasoli, Aleksandar Petrov, Ilia Shumailov, Meghana Thotakuri, Olivia Wiles, Jessica Yung, Zahra Ahmed, Victor Martin, Simon Rosen,...
- [21] Mark J. Huiskes and Michael S. Lew. 2008. The MIR flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (Vancouver, British Columbia, Canada) (MIR ’08). Association for Computing Machinery, New York, NY, USA, 39–43. https://doi.org/10.1145/1460096.1460104
- [22] Chloé Imadache, Eva Giboulot, and Teddy Furon. 2025. Evaluating the security of public surrogate watermark detectors. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2025-04). https://ieeexplore.ieee.org/document/10889821/ ISSN: 2379-190X
- [23] T. Kalker. 2001. Considerations on watermarking security. In 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No.01TH8564) (2001-10). 201–206. https://ieeexplore.ieee.org/document/962734
- [24]
- [25] Neri Merhav and Erez Sabbag. 2006. Optimal Watermark Embedding and Detection Strategies Under Limited Detection Resources. In 2006 IEEE International Symposium on Information Theory (2006-07). 173–177. arXiv:0705.1919 [cs] http://arxiv.org/abs/0705.1919
- [26] M.L. Miller, I.J. Cox, and J.A. Bloom. 2000. Informed embedding: exploiting image and detector information during watermark insertion. In Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101) (Vancouver, BC, Canada, 2000). IEEE. http://ieeexplore.ieee.org/document/899260/
- [27] Matt L. Miller and Jeffrey A. Bloom. 2000. Computing the Probability of False Watermark Detection. In Information Hiding (Berlin, Heidelberg, 2000), Andreas Pfitzmann (Ed.). Springer, 146–158.
- [28] Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. 2022. Diffusion Models for Adversarial Purification. In International Conference on Machine Learning (ICML).
- [29] Stéphane Pateux and Gaëtan Le Guelvouit. 2003. Practical watermarking scheme based on wide spread spectrum and game theory. 18, 4 (2003), 283–296. https://www.sciencedirect.com/science/article/pii/S0923596502001455
- [30] Aleksandar Petrov, Pierre Fernandez, Tomas Soucek, and Hady Elsahar. 2026. We Can Hide More Bits: The Unused Watermarking Capacity in Theory and in Practice. https://openreview.net/forum?id=Ry8jLSYIUG
- [31] Md Farhamdur Reza, Ali Rahmati, Tianfu Wu, and Huaiyu Dai. 2023. CGBA: Curvature-aware Geometric Black-box Attack. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 124–133.
- [32] Geoffrey B. Rhoads. 2010. Detecting embedded signals in media content using coincidence metrics.
- [33] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-Resolution Image Synthesis With Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10684–10695.
[34]
Jérôme Rony, Luiz Gustavo Hafemann, Luiz Soares de Oliveira, Ismail Ben Ayed, and Eric Granger. 2019. Decoupling Direction and Norm for Efficient Gradient- Based L2 Adversarial Attacks and Defenses. 4317–4325. Algorithm 1Watermark In The Sand (WIS) Attack Require:Imagex, VAE(E,D), OracleO, Threshold𝛽 Ensure:Adversarial Imagex 𝐴 1:x 0 ←x 2:𝑖←0 // Phase 1: ...
2019
-
[35]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2022. Denoising Diffusion Implicit Models. arXiv:2010.02502 [cs.LG] https://arxiv.org/abs/2010.02502
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[36]
Tran, and Alexandre Mourachko
Tomas Soucek, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan A. Tran, and Alexandre Mourachko. 2025. Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=yb5JOOmfxA
2025
-
[37]
USA. 2023. Ensuring Safe, Secure, and Trustworthy AI. https: //www.whitehouse.gov/wp-content/uploads/2023/07/Ensuring-Safe-Secure- and-Trustworthy-AI.pdf
2023
-
[38]
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han. 2024. Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer. arXiv:2410.10629 [cs.CV] https://arxiv.org/abs/2410.10629
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[39]
Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, and Nenghai Yu
- [40]
- [41] Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. 2024. Watermarks in the sand: impossibility of strong watermarking for language models. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML’24). JMLR.org, Article 2429.
- [42] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang
- [43] The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. arXiv:1801.03924 [cs] http://arxiv.org/abs/1801.03924
- [44] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. 2018. HiDDeN: Hiding Data With Deep Networks. In Computer Vision – ECCV 2018, Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Vol. 11219. Springer International Publishing, Cham, 682–697. https://link.springer.com/10.1007/978-3-030-01267-0_40 Series Title: Lecture N...