arxiv: 2605.19511 · v1 · pith:MAZTPD6Knew · submitted 2026-05-19 · 💻 cs.CV

Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image Editing

Pith reviewed 2026-05-20 06:42 UTC · model grok-4.3

classification 💻 cs.CV
keywords: watermark preservation · text-guided image editing · diffusion models · digital watermarking · image manipulation · generative editing · image provenance

The pith

Watermarked images can undergo text-guided editing with the embedded watermark left intact, provided a watermark-decoding loss is added during editor training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that semantic edits on watermarked images need not erase the watermark signal. It introduces SafeMark, which fine-tunes diffusion editors by adding a thresholded watermark-decoding loss to their objective so that valid edits also leave the watermark recoverable. A sympathetic reader cares because this compatibility would let generative editing pipelines retain image provenance without extra post-processing steps. The work shows the result holds across datasets, editing methods, and common distortions while edit quality stays high.

Core claim

The paper claims that adding a thresholded watermark-decoding loss directly to the training objective of a diffusion-based text-guided editor makes the final edited image preserve high bit accuracy on the original watermark. This follows from an information-theoretic bound: high bit accuracy on the output lower-bounds the mutual information the editing channel preserves between the watermark and the edited result, which in turn controls recoverability.
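The chain of inequalities behind this bound can be sketched as follows. This is a reconstruction under stated assumptions, not the paper's own derivation: $W \in \{0,1\}^B$ is the embedded message with independent, uniformly distributed bits, $\hat W$ is the decoder output on the edited image $X_{\mathrm{edit}}$, $a_b$ is the decoding accuracy on bit $b$, $\bar a$ is the mean accuracy over bits, and $H_2$ is the binary entropy in bits.

```latex
\begin{align}
I(W; X_{\mathrm{edit}})
  &\ge I(W; \hat W)
  && \text{(data processing: } \hat W \text{ is computed from } X_{\mathrm{edit}}\text{)} \\
  &= H(W) - H(W \mid \hat W) \\
  &\ge B - \sum_{b=1}^{B} H_2(1 - a_b)
  && \text{(Fano per bit, with } H(W) = B\text{)} \\
  &\ge B \bigl( 1 - H_2(1 - \bar a) \bigr)
  && \text{(Jensen, concavity of } H_2\text{)}
\end{align}
```

Under these assumptions, a mean bit accuracy of 0.9 guarantees roughly 0.53 bits of preserved information per embedded bit, and the bound vanishes at chance-level accuracy of 0.5.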

What carries the argument

SafeMark, the training modification that inserts a thresholded watermark-decoding loss into the diffusion editor's objective so that semantic edits automatically satisfy watermark integrity.
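A minimal sketch of what such a loss could look like is given below. The review does not state the exact form, so the binary cross-entropy term, the hard gate, the weight `lam`, the default `tau=0.9`, and all function names are illustrative assumptions. Only the gating idea (apply the loss when decoding accuracy drops below τ) comes from the Figure 1 caption. Plain Python is used for readability; a real fine-tuning loop would need a differentiable tensor implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bit_accuracy(logits, bits):
    """Fraction of bits recovered when decoder logits are thresholded at 0."""
    hits = sum((z > 0) == bool(b) for z, b in zip(logits, bits))
    return hits / len(bits)


def thresholded_watermark_loss(logits, bits, tau=0.9, eps=1e-12):
    """Mean binary cross-entropy on the decoded bits, active only while
    bit accuracy is below tau.

    ASSUMPTION: BCE and the hard gate are illustrative choices, not the
    paper's confirmed loss.
    """
    if bit_accuracy(logits, bits) >= tau:
        return 0.0  # watermark already recoverable: no penalty
    total = 0.0
    for z, b in zip(logits, bits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


def safemark_objective(edit_loss, logits, bits, lam=1.0, tau=0.9):
    """Editing loss plus the gated watermark term (lam is a hypothetical weight)."""
    return edit_loss + lam * thresholded_watermark_loss(logits, bits, tau)
```

The gate means that once the decoder recovers enough bits, the objective reduces to the ordinary editing loss, which is one way the watermark term could avoid competing with the edit after the watermark is already recoverable.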

If this is right

  • High watermark bit accuracy is achieved across multiple datasets and text-guided editing methods.
  • Semantic edit quality and robustness to post-edit distortions remain comparable to the unmodified editor.
  • The approach requires no changes to the underlying diffusion architecture and is compatible with differentiable diffusion-based editors.
  • Trustworthy provenance becomes feasible inside generative editing pipelines without separate watermarking steps after editing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same loss-augmentation idea could be tested on non-diffusion editors such as GAN-based or flow-based image manipulators.
  • If watermark preservation generalizes to video or 3-D editing, provenance tracking could extend to those media as well.
  • Real-world user studies measuring whether people notice any quality drop would help assess practical adoption.

Load-bearing premise

Adding the thresholded watermark-decoding loss during fine-tuning will not degrade the semantic validity or quality of the text-guided edits while still guaranteeing high bit accuracy on the final output.

What would settle it

An experiment that applies SafeMark-trained editors to a held-out set of watermarked images and finds that either watermark bit accuracy falls below 90 percent on the edited outputs or the edits no longer match the guiding text prompts in semantic content.
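The test described above could be scored along these lines. This is a sketch only: the record layout (`decoded`, `true`, `prompt_match`), the function names, and the `min_match_rate` cutoff are hypothetical, and the prompt-match judgment is assumed to come from an external evaluator such as a human rater. Only the 90 percent bit-accuracy line is taken from the text.

```python
def decode_accuracy(decoded_bits, true_bits):
    """Fraction of watermark bits that survive the edit."""
    if len(decoded_bits) != len(true_bits):
        raise ValueError("decoded and true messages must have equal length")
    matches = sum(d == t for d, t in zip(decoded_bits, true_bits))
    return matches / len(true_bits)


def check_falsification(samples, min_bit_acc=0.90, min_match_rate=0.5):
    """Summarize a held-out evaluation.

    samples: list of dicts with keys
      'decoded'      - bits decoded from the edited image
      'true'         - bits originally embedded
      'prompt_match' - bool, edit judged to follow the prompt (external judgment)
    min_match_rate is a HYPOTHETICAL cutoff; the text gives no number for it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    accs = [decode_accuracy(s["decoded"], s["true"]) for s in samples]
    mean_acc = sum(accs) / len(accs)
    match_rate = sum(bool(s["prompt_match"]) for s in samples) / len(samples)
    return {
        "mean_bit_accuracy": mean_acc,
        "prompt_match_rate": match_rate,
        "falsified": mean_acc < min_bit_acc or match_rate < min_match_rate,
    }
```

Either failure mode alone is enough to set `falsified`, mirroring the "either ... or" wording of the criterion.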

Figures

Figures reproduced from arXiv: 2605.19511 by Jianbing Ni, Lingshuang Liu, Qi Li, Xiangman Li, Xiaodong Wu, Zelin Zhang.

Figure 1: Overview of SafeMark. Given a watermarked input image $x_{\mathrm{orig}}$ and an editing prompt $p$, the diffusion editor produces the final edited image $x_{\mathrm{edit}}$. SafeMark enforces (i) semantic alignment to a reference edit $x_{\mathrm{ref}} = E_{\theta,0}(x_{\mathrm{orig}}, p)$ and (ii) watermark preservation by decoding $\hat{w} = \sigma(D_{\phi}(x_{\mathrm{edit}}))$ and applying a thresholded loss when decoding accuracy drops below $\tau$.

Figure 2: Comparison of image manipulation performance with and without …

Figure 3: Comparison of image manipulation performance with and without …

Figure 4: Failure heatmap of watermark preservation for …

Figure 5: Ablation study of the watermark threshold …

Figure 6: Additional qualitative examples of SafeMark. We report representative generated/editing results across CelebA-HiDDeN, AFHQ-Dog-HiDDeN, LSUN-Church-VINE, and LSUN-Bedroom-VINE using DiffusionCLIP, Asyrp, and EffDiff. Columns correspond to Orig (original watermarked image), Mani (direct editing without protection), and SafeMark (editing with our protection). The value below each image is the decoded watermar…

Figure 7: Additional qualitative examples of SafeMark. We report representative generated/editing results across Human-Signature, Dog-Signature, Church-SleeperMark, and Bedroom-SleeperMark using DiffusionCLIP, Asyrp, and EffDiff. Columns correspond to Orig (original watermarked image), Mani (direct editing without protection), and SafeMark (editing with our protection). The value below each image is the decoded wate…
Abstract

This paper investigates a fundamental yet underexplored question: can watermarked images remain editable without compromising watermark integrity? We propose SafeMark, a framework for watermark-preserving text-guided image manipulation that explicitly integrates watermark integrity into the editing process. Specifically, SafeMark adds a thresholded watermark-decoding loss directly to the diffusion editor's training objective, fine-tuning the editor so that semantically valid edits also preserve the embedded watermark at the final output. This design admits a clean information-theoretic justification: maintaining high bit-accuracy on the edited image lower-bounds the mutual information that the editor channel preserves between watermark and edited output, the quantity that fundamentally controls watermark recoverability. SafeMark is compatible with differentiable diffusion-based editors, and requires no architectural modification. Extensive evaluations across multiple datasets, text-guided editing methods, and post-edit distortion settings demonstrate that SafeMark achieves high watermark bit accuracy across diverse editing settings while maintaining high-quality semantic edits, without sacrificing robustness to common post-edit distortions. These results demonstrate that semantic editability and watermark integrity are fundamentally compatible, enabling trustworthy image provenance in generative editing pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes SafeMark, a framework for watermark-preserving text-guided image editing. It adds a thresholded watermark-decoding loss to the training objective of differentiable diffusion-based editors, fine-tuning them so that semantically valid edits also preserve the embedded watermark. The approach is justified information-theoretically by arguing that high bit-accuracy lower-bounds the mutual information preserved by the editor channel between the watermark and the edited output. Evaluations across multiple datasets, editing methods, and post-edit distortions are reported to show high watermark bit accuracy while maintaining edit quality and robustness.

Significance. If the empirical claims hold, the work provides a practical demonstration that semantic editability and watermark integrity can be made compatible without architectural changes to existing editors. The information-theoretic framing is a conceptual strength if the derivation is made explicit and the decoder reliability post-edit is verified. This could support trustworthy provenance tracking in generative pipelines, but significance is limited by the absence of detailed quantitative results, ablations, or trade-off measurements in the provided description.

major comments (3)
  1. [§5 (Experiments)] The central claim of fundamental compatibility rests on the assumption that the thresholded watermark-decoding loss can be added without degrading semantic fidelity or prompt adherence. The manuscript should provide quantitative evidence (e.g., CLIP scores, human preference studies, or edit success rates) comparing SafeMark to the baseline editor in §5 or the corresponding experiment section; without such data the compatibility conclusion remains under-supported.
  2. [§3.2 (Method/Theory)] The information-theoretic justification states that high bit-accuracy lower-bounds mutual information. The derivation steps, including how bit-accuracy is treated as an independent measure versus a quantity fitted to the same data used for loss thresholding, should be expanded in §3.2 or the theoretical analysis section to confirm it is not circular.
  3. [§3.2 (Loss Definition)] The threshold for the watermark-decoding loss is identified as a free parameter. The paper should report sensitivity analysis or a principled selection method for this threshold and demonstrate that it does not introduce bias toward low-level patterns that conflict with prompt-driven changes.
minor comments (2)
  1. [§3] Clarify notation for the watermark decoder and the exact form of the thresholded loss to improve reproducibility.
  2. [§5] Add explicit comparison tables showing bit accuracy and edit quality metrics side-by-side for baseline and SafeMark across all tested methods and datasets.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of results and theory.

  1. Referee: [§5 (Experiments)] The central claim of fundamental compatibility rests on the assumption that the thresholded watermark-decoding loss can be added without degrading semantic fidelity or prompt adherence. The manuscript should provide quantitative evidence (e.g., CLIP scores, human preference studies, or edit success rates) comparing SafeMark to the baseline editor in §5 or the corresponding experiment section; without such data the compatibility conclusion remains under-supported.

    Authors: We agree that explicit quantitative comparisons are needed to fully support the compatibility claim. The manuscript states that SafeMark maintains high-quality semantic edits, but direct metrics versus the unmodified baseline editor were not tabulated in §5. In the revised manuscript we will add CLIP-score comparisons, edit success rates, and (where space permits) a small human preference study in §5 to quantify that prompt adherence and semantic fidelity remain comparable to the baseline. revision: yes

  2. Referee: [§3.2 (Method/Theory)] The information-theoretic justification states that high bit-accuracy lower-bounds mutual information. The derivation steps, including how bit-accuracy is treated as an independent measure versus a quantity fitted to the same data used for loss thresholding, should be expanded in §3.2 or the theoretical analysis section to confirm it is not circular.

    Authors: We thank the referee for highlighting the need for a clearer derivation. Bit accuracy is evaluated post-editing on held-out images and yields an empirical lower bound on preserved mutual information via standard information-theoretic inequalities. To eliminate any concern of circularity, we will expand §3.2 with the explicit steps: (i) definition of the editor as a channel, (ii) relation of bit accuracy to mutual information via Fano’s inequality, and (iii) explicit statement that the accuracy metric is computed on test data independent of the training-time threshold choice. This will make the argument non-circular and self-contained. revision: yes

  3. Referee: [§3.2 (Loss Definition)] The threshold for the watermark-decoding loss is identified as a free parameter. The paper should report sensitivity analysis or a principled selection method for this threshold and demonstrate that it does not introduce bias toward low-level patterns that conflict with prompt-driven changes.

    Authors: We acknowledge that the threshold is a hyperparameter whose sensitivity should be documented. The current manuscript selects it via preliminary validation to balance the two objectives. In the revision we will add a sensitivity plot (threshold vs. watermark bit accuracy and CLIP score) in §3.2 or the appendix, describe the validation-based selection procedure, and show that prompt adherence remains high across the operating range, thereby confirming that the threshold does not systematically favor low-level artifacts over semantic edits. revision: yes
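The sensitivity analysis promised in the third response could be organized with a small harness like the one below. It is a sketch under assumptions: `evaluate` is a hypothetical user-supplied callable that fine-tunes or loads an editor for a given threshold and returns `(bit_accuracy, clip_score)`, and the target values passed to `operating_range` are placeholders rather than numbers from the paper.

```python
def sweep_thresholds(evaluate, taus):
    """Run evaluate(tau) -> (bit_accuracy, clip_score) for each threshold
    and collect one row per setting, ready for plotting or tabulation."""
    rows = []
    for tau in taus:
        bit_acc, clip_score = evaluate(tau)
        rows.append({"tau": tau, "bit_accuracy": bit_acc, "clip_score": clip_score})
    return rows


def operating_range(rows, min_bit_acc, min_clip):
    """Thresholds at which both watermark recovery and prompt adherence
    meet their targets."""
    return [
        r["tau"]
        for r in rows
        if r["bit_accuracy"] >= min_bit_acc and r["clip_score"] >= min_clip
    ]
```

Reporting the full table from `sweep_thresholds`, rather than only the selected threshold, is what would let a reader judge whether the chosen value sits in a wide stable range or on a narrow peak.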

Circularity Check

1 step flagged

Information-theoretic justification reduces to the optimized bit-accuracy metric by construction

specific steps
  1. fitted input called prediction [Abstract]
    "This design admits a clean information-theoretic justification: maintaining high bit-accuracy on the edited image lower-bounds the mutual information that the editor channel preserves between watermark and edited output, the quantity that fundamentally controls watermark recoverability."

    The framework explicitly optimizes the diffusion editor with a thresholded watermark-decoding loss to enforce high bit accuracy on the output. The subsequent claim that this bit accuracy lower-bounds mutual information (and thereby controls recoverability) is therefore a direct restatement of the training objective rather than an independent derivation; bit accuracy is the fitted quantity being measured on the same edited images.

full rationale

The paper's central claim of fundamental compatibility between editability and watermark integrity rests on adding a thresholded decoding loss during fine-tuning and then reporting high bit accuracy plus preserved edit quality. The information-theoretic step is presented as independent justification but essentially restates that the quantity being directly optimized (bit accuracy) implies recoverability. No derivation of the lower bound is supplied in the abstract, and bit accuracy is not an external benchmark. However, the paper reports extensive cross-dataset and cross-method experiments, so the compatibility result retains some independent empirical content rather than being purely definitional.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract only; full paper would likely reveal additional parameters and assumptions. The threshold value in the loss and the assumption that the loss does not harm edit quality are the main unexamined elements.

free parameters (1)
  • threshold for watermark-decoding loss
    The threshold determines when the loss activates and is chosen to balance edit quality against watermark preservation.
axioms (1)
  • domain assumption High bit accuracy on the edited image lower-bounds the mutual information preserved by the editor channel between watermark and output.
    This information-theoretic step is invoked to justify that the loss ensures recoverability.
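To make the axiom concrete, the bound can be evaluated numerically. The sketch below assumes independent, uniformly distributed message bits (an assumption not stated in this ledger) and uses the per-bit Fano form, in which the mutual information between the message and the decoded bits is at least the sum over bits of 1 − H2(1 − a_b), with a_b the accuracy on bit b. Function names are illustrative.

```python
import math


def binary_entropy(p):
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mi_lower_bound(per_bit_accuracy):
    """Lower bound, in bits, on the information the decoded message carries
    about the embedded one: sum over bits of 1 - H2(1 - a_b).
    ASSUMES independent uniform message bits."""
    return sum(1.0 - binary_entropy(1.0 - a) for a in per_bit_accuracy)


def mi_lower_bound_from_mean(mean_accuracy, num_bits):
    """Looser bound that needs only the mean accuracy (Jensen step):
    B * (1 - H2(1 - mean accuracy))."""
    return num_bits * (1.0 - binary_entropy(1.0 - mean_accuracy))
```

At chance-level accuracy the bound is zero, so it says nothing useful there. It becomes informative only as accuracy moves well above 0.5, which is why the value of the threshold matters for the argument.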

pith-pipeline@v0.9.0 · 5739 in / 1273 out tokens · 46037 ms · 2026-05-20T06:42:38.297917+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 5 internal anchors

  1. [1]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

  2. [2]

    Photorealistic text-to-image diffusion models with deep language understanding

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems, 35:36479–36494, 2022

  3. [3]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022

  4. [4]

    Lawa: Using latent space for in-generation image watermarking

    Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar, Arezou Fatemi, and Yong Zhang. Lawa: Using latent space for in-generation image watermarking. In European Conference on Computer Vision, pages 118–136. Springer, 2024

  5. [5]

    Wouaf: Weight modulation for user attribution and fingerprinting in text-to-image diffusion models

    Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, and Yezhou Yang. Wouaf: Weight modulation for user attribution and fingerprinting in text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8974–8983, 2024

  6. [6]

    Hidden: Hiding data with deep networks

    Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. Hidden: Hiding data with deep networks. In Proceedings of the European conference on computer vision (ECCV), pages 657–672, 2018

  7. [7]

    Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression

    Zhaoyang Jia, Han Fang, and Weiming Zhang. Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In Proceedings of the 29th ACM International Conference on Multimedia, pages 41–49, 2021

  8. [8]

    Stegastamp: Invisible hyperlinks in physical photographs

    Matthew Tancik, Ben Mildenhall, and Ren Ng. Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2117–2126, 2020

  9. [9]

    The stable signature: Rooting watermarks in latent diffusion models

    Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22466–22477, 2023

  10. [10]

    Tree-rings watermarks: Invisible fingerprints for diffusion images

    Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-rings watermarks: Invisible fingerprints for diffusion images. Advances in Neural Information Processing Systems, 36:58047–58063, 2023

  11. [11]

    Robust-wide: Robust watermarking against instruction-driven image editing

    Runyi Hu, Jie Zhang, Ting Xu, Jiwei Li, and Tianwei Zhang. Robust-wide: Robust watermarking against instruction-driven image editing. In European Conference on Computer Vision, pages 20–37. Springer, 2024

  12. [12]

    Jigmark: A black-box approach for enhancing image watermarks against diffusion model edits

    Minzhou Pan, Yi Zeng, Zongyuan Ge, and Ruoxi Jia. Jigmark: A black-box approach for enhancing image watermarks against diffusion model edits. arXiv preprint arXiv:2406.03720, 2024

  13. [13]

    Robust watermarking using generative priors against image editing: From benchmarking to advances

    Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. Robust watermarking using generative priors against image editing: From benchmarking to advances. arXiv preprint arXiv:2410.18775, 2024

  14. [14]

    Editguard: Versatile image watermarking for tamper localization and copyright protection

    Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, and Jian Zhang. Editguard: Versatile image watermarking for tamper localization and copyright protection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11964–11974, 2024

  15. [15]

    Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models

    Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, and Zhengzhong Tu. Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 8213–8224, 2025

  16. [16]

    SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

    Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021

  17. [17]

    Instructpix2pix: Learning to follow image editing instructions

    Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18392–18402, 2023

  18. [18]

    Diffusionclip: Text-guided diffusion models for robust image manipulation

    Gwanghyun Kim, Taesung Kwon, and Jong Chul Ye. Diffusionclip: Text-guided diffusion models for robust image manipulation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2426–2435, 2022

  19. [19]

    Diffusion models already have a semantic latent space

    Mingi Kwon, Jaeseok Jeong, and Youngjung Uh. Diffusion models already have a semantic latent space. arXiv preprint arXiv:2210.10960, 2022

  20. [20]

    Prompt-to-Prompt Image Editing with Cross Attention Control

    Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022

  21. [21]

    Rosteals: Robust steganography using autoencoder latent space

    Tu Bui, Shruti Agarwal, Ning Yu, and John Collomosse. Rosteals: Robust steganography using autoencoder latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 933–942, 2023

  22. [22]

    Trustmark: Universal watermarking for arbitrary resolution images

    Tu Bui, Shruti Agarwal, and John Collomosse. Trustmark: Universal watermarking for arbitrary resolution images. arXiv preprint arXiv:2311.18297, 2023

  23. [23]

    Waves: Benchmarking the robustness of image watermarks

    Bo An, Ming Ding, Mohammad Rabbani, Shipra Agrawal, Hu Xu, Ke Deng, Nanning Zhu, Amr Mohamed, Galen McMahan, Shyam Raman, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Waves: Benchmarking the robustness of image watermarks. arXiv preprint arXiv:2401.08573, 2024

  24. [24]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

  25. [25]

    Deep learning face attributes in the wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015

  26. [26]

    Stargan v2: Diverse image synthesis for multiple domains

    Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020

  27. [27]

    Towards real-time text-driven image manipulation with unconditional diffusion models

    Nikita Starodubcev, Dmitry Baranchuk, Valentin Khrulkov, and Artem Babenko. Towards real-time text-driven image manipulation with unconditional diffusion models. arXiv preprint arXiv:2304.04344, 2023

  28. [28]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017

  29. [29]

    Improved techniques for training gans

    Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. Advances in neural information processing systems, 29, 2016

  30. [30]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  31. [31]

    Drag your gan: Interactive point-based manipulation on the generative image manifold

    Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, and Christian Theobalt. Drag your gan: Interactive point-based manipulation on the generative image manifold. In ACM SIGGRAPH 2023 conference proceedings, pages 1–11, 2023

  32. [32]

    Editgan: High-precision semantic image editing

    Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, and Sanja Fidler. Editgan: High-precision semantic image editing. Advances in Neural Information Processing Systems, 34:16331–16345, 2021

  33. [33]

    Omniguard: Hybrid manipulation localization via augmented versatile deep image watermarking

    Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, and Jian Zhang. Omniguard: Hybrid manipulation localization via augmented versatile deep image watermarking. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 3008–3018, 2025

  34. [34]

    Attack-resilient image watermarking using stable diffusion

    Lijun Zhang, Xiao Liu, Antoni V Martin, Cindy X Bearfield, Yuriy Brun, and Hui Guan. Attack-resilient image watermarking using stable diffusion. Advances in Neural Information Processing Systems, 37:38480–38507, 2024

  35. [35]

    C2pa technical specification, version 1.3

    Coalition for Content Provenance and Authenticity (C2PA). C2pa technical specification, version 1.3. https://c2pa.org/specifications/specifications/1.3/index.html, 2023. Accessed: 2026-02-02

  36. [36]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  37. [37]

    … $= 0$) gives $H(W_b \mid \hat W_b) \le H_2(1 - a_b)$. Expanding $H(W \mid \hat W)$ by the chain rule and using that conditioning on a richer $\sigma$-algebra reduces entropy,

    $$H(W \mid \hat W) = \sum_{b=1}^{B} H(W_b \mid W_{<b}, \hat W) \le \sum_{b=1}^{B} H(W_b \mid \hat W_b) \le \sum_{b=1}^{B} H_2(1 - a_b),$$

    where the first inequality holds because $(W_{<b}, \hat W)$ contains $\hat W_b$ as a sub-component. By concavity of $H_2$ on $[0,1]$ (Jensen), $\sum_b H_2(1 - a_b) \le B\, H_2(1 - \dots$