pith. machine review for the scientific record

arxiv: 2605.10600 · v1 · submitted 2026-05-11 · 💻 cs.CR

Recognition: no theorem link

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 04:39 UTC · model grok-4.3

classification 💻 cs.CR
keywords image editing · diffusion models · security attacks · hidden payload · branding injection · generative AI · adversarial hints

The pith

Nearly invisible hints embedded in images get automatically re-rendered as branding by generative editing models without any prompt mention.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that in the standard two-step process of first generating an image from text and then refining it with image-to-image editing, a nearly invisible visual hint such as a logo can be hidden in the input and still get picked up and added to the final output by the model. This happens even when the editing instructions say nothing about the hint, making the injection hard for users to notice. The authors demonstrate this in two attack settings, one where the service injects the hint into returned images and one where a poisoned model is distributed, and they measure success rates while keeping the changes visually imperceptible.
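
As a concrete shape for that two-step workflow, here is a minimal sketch using the open-source `diffusers` library. The checkpoint names, prompts, and the `inject_hint` helper are illustrative assumptions, not the paper's implementation; a fuller embedding sketch appears under "What carries the argument" below.

```python
# A hedged sketch of the two-phase workflow, using the open-source
# `diffusers` library. Checkpoint names, prompts, and the embedding step
# are illustrative assumptions, not the paper's implementation.
import torch
from PIL import Image
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

device = "cuda" if torch.cuda.is_available() else "cpu"

def inject_hint(img: Image.Image, logo_path: str, alpha: float) -> Image.Image:
    # Attacker's embedding step (placeholder); a fuller sketch appears
    # under "What carries the argument" below.
    logo = Image.open(logo_path).convert("RGB").resize(img.size)
    return Image.blend(img.convert("RGB"), logo, alpha)

# Phase 1: text-to-image generation. In the phishing scenario this phase
# runs on an attacker-controlled service.
t2i = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-2-1").to(device)
draft = t2i(prompt="a leather handbag on a wooden table").images[0]

# The service embeds a nearly invisible logo before returning the image.
poisoned = inject_hint(draft, logo_path="logo.png", alpha=0.03)

# Phase 2: the user refines the returned image with a prompt that never
# mentions the logo; the editor may still re-render it onto the handbag.
i2i = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-2-1").to(device)
final = i2i(prompt="make the lighting warmer, studio quality",
            image=poisoned, strength=0.6).images[0]
```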

Core claim

A nearly invisible hint, like branding information embedded in an input image, can be recognized by downstream generative models and subsequently re-rendered onto semantically related objects, even when the user prompt does not explicitly mention it.

What carries the argument

Hint embedding, in which subtle visual payloads are placed in images so that text-guided diffusion models detect and propagate them across editing steps onto related content.
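
The review text does not specify the authors' embedding algorithm. One plausible reading, sketched below under that assumption, is a low-opacity alpha blend whose per-pixel changes at alpha ≈ 0.03 sit near the just-noticeable-difference regime studied in refs [34], [40], [41]; treat it as an illustration of the idea, not the authors' method.

```python
# A fuller sketch of hint embedding plus an imperceptibility check.
# The low-alpha blend is one plausible reading, not the authors' method.
import numpy as np
from PIL import Image

def inject_hint(image: Image.Image, logo_path: str,
                alpha: float = 0.03) -> Image.Image:
    """Blend a resized logo into the image at barely perceptible opacity."""
    base = np.asarray(image.convert("RGB"), dtype=np.float32)
    logo = Image.open(logo_path).convert("RGB").resize(image.size)
    blended = (1.0 - alpha) * base + alpha * np.asarray(logo, dtype=np.float32)
    return Image.fromarray(np.clip(blended, 0, 255).astype(np.uint8))

def max_pixel_shift(original: Image.Image, hinted: Image.Image) -> int:
    """Worst-case per-channel change; alpha = 0.03 bounds it near 8 of 255
    intensity levels, below typical just-noticeable-difference thresholds."""
    a = np.asarray(original.convert("RGB"), dtype=np.int16)
    b = np.asarray(hinted.convert("RGB"), dtype=np.int16)
    return int(np.abs(a - b).max())
```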

Load-bearing premise

Current text-to-image and image-to-image diffusion models will reliably detect and propagate subtle embedded visual hints across editing steps without explicit prompting.

What would settle it

Multiple trials of the described image-editing workflow in which outputs never display the embedded logo or branding when the prompt makes no reference to it.
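
Concretely, the settling experiment pairs hinted inputs with clean no-hint controls under identical edit prompts and scores each output for logo presence. A minimal sketch, assuming CLIP image-text similarity as the detector with an arbitrary threshold (the paper reports CLIP similarity, but this detection protocol is an assumption, not the authors' stated one):

```python
# Hedged sketch of the settling experiment: score each edited output for
# logo presence via CLIP image-text similarity, then compare success rates
# on hinted inputs vs. clean no-hint controls. The threshold is a guess.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def logo_score(image, logo_name: str) -> float:
    """Cosine similarity between the image and a textual logo description."""
    inputs = processor(text=[f"a product bearing the {logo_name} logo"],
                       images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())

def success_rate(outputs, logo_name: str, threshold: float = 0.30) -> float:
    """Fraction of outputs whose logo score crosses the threshold; run once
    over hinted-input trials and once over clean controls, then compare."""
    return sum(logo_score(im, logo_name) > threshold
               for im in outputs) / len(outputs)
```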

Figures

Figures reproduced from arXiv: 2605.10600 by Desen Sun, Howe Wang, Jason Hon, Meng Xu, Saarth Rajan, Sihang Liu.

Figure 4. Success rate under different edit prompts: higher ...
Figure 2. Logo render success rate under various image entropy levels (categories: Mountain, Flower, Burger, Bag, Laptop, Car; axes: Success Rate (%), CLIP Similarity).
Figure 5. Phishing-based attack scenario. An attacker builds a phishing image generation service that injects hidden logos into ...
Figure 6. Logos used for evaluation. Logo-related attacks [9], [22] ...
Figure 8. Examples from HQ-Edit dataset [35]. Phase 1 output image contains the attack’s logo and Phase 2 renders the logo into ...
Figure 10. CLIP score comparison between no-attack, the ...
Figure 11. Fraction of invisible injection of logos under different ...
Figure 15. Poison-based attack scenario. An attacker distributes a poisoned model that is later used by a victim. The victim uses ...
Figure 16. Attack success rates of the poison-based attack.
Figure 17. (a) Logo injection rate and (b) logo invisibility rate ...
Figure 18. Workflow of attack mitigation mechanism. With the mitigation model, the hidden logo is removed and the editing ...
Figure 19. Mitigation success rates for phishing-based attack.
Figure 20. Example images from the HQ-Edit dataset under the phishing-based attack, including images before and after mitigation.
Figure 21. CLIP scores before and after applying mitigation to ...
read the original abstract

With the rapid advancement of generative AI, users increasingly rely on image-generation models for image design and creation. To achieve faithful outputs, users typically engage in multi-turn interactions during image refinement: a text-to-image generation phase followed by a text-guided image-to-image editing phase. In this paper, we investigate a novel security vulnerability associated with such a workflow. Our key insight is that a nearly invisible hint, like branding information (e.g., a logo), embedded in an input image can be recognized by downstream generative models and subsequently re-rendered onto semantically related objects, even when the user prompt does not explicitly mention it. This form of hidden payload injection makes the attack stealthy. We study two realistic attack scenarios. The first is a phishing-based setting, in which an attacker controls an online image generation service and injects hidden content into generated images before they are returned to users. The second is a poison-based setting, where an attacker distributes a compromised text-to-image diffusion model whose output contains hidden content. We evaluate both attacks using six injected payloads, including well-known logos and customized designs, and demonstrate that the two attacks can achieve success rates of 44.4% and 32.2% on average, respectively, while ensuring the injected logos are visually imperceptible. We also develop a mitigation solution that achieves an average success rate of 87.4% and 92.3% against the phishing-based and poison-based attacks, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that nearly invisible branding hints (e.g., logos) embedded in input images can be recognized by downstream text-to-image diffusion models and re-rendered onto semantically related objects during text-guided editing, even without explicit prompt mention. It evaluates this vulnerability in two scenarios: a phishing attack where an online service injects the hint into generated images, and a poisoning attack via a compromised diffusion model. Across six payloads, the attacks achieve average success rates of 44.4% (phishing) and 32.2% (poisoning) while remaining imperceptible; a mitigation is proposed that reduces success to 12.6% and 7.7%, respectively.

Significance. If validated, the work identifies a stealthy attack vector in iterative generative AI workflows that could erode trust in image-generation services and models. It extends adversarial ML research on diffusion models by showing propagation of subtle visual cues without explicit prompting. The empirical demonstration of imperceptible injection and the proposed mitigation provide concrete, actionable insights for securing multi-turn editing pipelines.

major comments (2)
  1. [Evaluation (abstract and results sections)] The reported success rates (44.4% phishing, 32.2% poison) are presented without a no-hint control baseline in which identical editing prompts and models are run on clean images. Absent this comparison, the results cannot distinguish hint-driven re-rendering from spontaneous logo hallucination by the base model, which is load-bearing for the central claim that the embedded hint causes the observed effect.
  2. [Evaluation methodology] No details are provided on sample size, exact success measurement protocol (human judgment criteria, automated detection, or inter-rater agreement), statistical tests, or controls for post-hoc selection across the six payloads. This absence prevents assessment of whether the quantitative claims are robust.
minor comments (1)
  1. [Abstract] The abstract states average success rates but does not report per-payload breakdowns, standard deviations, or variance, which would clarify consistency of the effect.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important aspects of our evaluation that require clarification and strengthening. We address each major comment below.

read point-by-point responses
  1. Referee: [Evaluation (abstract and results sections)] The reported success rates (44.4% phishing, 32.2% poison) are presented without a no-hint control baseline in which identical editing prompts and models are run on clean images. Absent this comparison, the results cannot distinguish hint-driven re-rendering from spontaneous logo hallucination by the base model, which is load-bearing for the central claim that the embedded hint causes the observed effect.

    Authors: We agree that a no-hint control baseline is essential to isolate the causal contribution of the embedded hints from any potential spontaneous logo generation by the base diffusion model. In the revised manuscript, we will add a dedicated control experiment in which the identical editing prompts and models are applied to clean images without any injected hints. This will allow direct comparison and provide stronger evidence that the observed re-rendering rates are driven by the hints. revision: yes

  2. Referee: [Evaluation methodology] No details are provided on sample size, exact success measurement protocol (human judgment criteria, automated detection, or inter-rater agreement), statistical tests, or controls for post-hoc selection across the six payloads. This absence prevents assessment of whether the quantitative claims are robust.

    Authors: We acknowledge that additional methodological details are needed to enable full assessment of robustness. In the revised manuscript, we will expand the evaluation section to specify the sample sizes employed for each payload and scenario, the exact success measurement protocol (including the human judgment criteria used to determine successful logo re-rendering), any inter-rater agreement metrics, the statistical tests applied, and the a priori rationale for selecting the six payloads (to address potential concerns about post-hoc selection). revision: yes
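
If the promised control data are added, the hint-versus-control comparison the referee asks for reduces to a standard two-sample proportion test. A minimal sketch, assuming hypothetical trial counts (the review text reports only average rates):

```python
# Minimal sketch of a two-proportion z-test comparing the attack condition
# against the no-hint control. All counts are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

attack_hits, attack_trials = 40, 90    # hypothetical ~44% phishing-style rate
control_hits, control_trials = 3, 90   # hypothetical spontaneous-logo rate

stat, p_value = proportions_ztest(
    count=[attack_hits, control_hits],
    nobs=[attack_trials, control_trials],
    alternative="larger",  # H1: the embedded hint raises the success rate
)
print(f"z = {stat:.2f}, p = {p_value:.4g}")
```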

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements

full rationale

The paper's central claims consist of observed attack success rates (44.4% phishing, 32.2% poison) and mitigation rates obtained by running concrete embedding and editing experiments on diffusion models. These quantities are not derived from any equations, fitted parameters, or self-referential definitions; they are reported as measured outcomes under the described attack scenarios. No load-bearing step reduces to its own inputs by construction, and the provided text contains no self-citations or uniqueness theorems that would create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an empirical security demonstration rather than a theoretical derivation; the load-bearing premise is the observed behavior of diffusion models rather than any fitted constants or new postulated entities.

axioms (1)
  • domain assumption Generative image models recognize and re-render subtle embedded visual hints from input images during editing steps even without explicit textual mention
    This is the core insight stated in the abstract that enables the re-rendering attack.

pith-pipeline@v0.9.0 · 5576 in / 1387 out tokens · 56145 ms · 2026-05-12T04:39:32.044464+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 6 internal anchors

  1. [1]

    Create with Adobe Firefly generative AI

    Adobe, “Create with Adobe Firefly generative AI,” https://www.adobe.com/products/firefly.html, 2023

  2. [2]

    DALL·E 3 system card

    OpenAI, “DALL·E 3 system card,” https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf, 2023

  3. [3]

    B. F. Labs, “Flux,” https://github.com/black-forest-labs/flux, 2024

  4. [4]

    Qwen-Image Technical Report

    C. Wu, J. Li, J. Zhou, J. Lin, K. Gao, K. Yan, S. ming Yin, S. Bai, X. Xu, Y. Chen, Y. Chen, Z. Tang, Z. Zhang, Z. Wang, A. Yang, B. Yu, C. Cheng, D. Liu, D. Li, H. Zhang, H. Meng, H. Wei, J. Ni, K. Chen, K. Cao, L. Peng, L. Qu, M. Wu, P. Wang, S. Yu, T. Wen, W. Feng, X. Xu, Y. Wang, Y. Zhang, Y. Zhu, Y. Wu, Y. Cai, and Z. Liu, “Qwen-Image technical report...

  5. [5]

    GPT-4o system card

    OpenAI, “GPT-4o system card,” https://openai.com/index/gpt-4o-system-card/, 2024

  6. [6]

    FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

    B. F. Labs, S. Batifol, A. Blattmann, F. Boesel, S. Consul, C. Diagne, T. Dockhorn, J. English, Z. English, P. Esser, S. Kulal, K. Lacey, Y. Levi, C. Li, D. Lorenz, J. Müller, D. Podell, R. Rombach, H. Saini, A. Sauer, and L. Smith, “FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space,” 2025. [Online]. Available: h...

  7. [7]

    Step1X-Edit: A Practical Framework for General Image Editing

    S. Liu, Y. Han, P. Xing, F. Yin, R. Wang, W. Cheng, J. Liao, Y. Wang, H. Fu, C. Han, G. Li, Y. Peng, Q. Sun, J. Wu, Y. Cai, Z. Ge, R. Ming, L. Xia, X. Zeng, Y. Zhu, B. Jiao, X. Zhang, G. Yu, and D. Jiang, “Step1X-Edit: A practical framework for general image editing,” arXiv preprint arXiv:2504.17761, 2025

  8. [8]

    arXiv preprint arXiv:2508.10711 (2025)

    N. Team, C. Han, G. Li, J. Wu, Q. Sun, Y. Cai, Y. Peng, Z. Ge, D. Zhou, H. Tang, H. Zhou, K. Liu, A. Huang, B. Wang, C. Miao, D. Sun, E. Yu, F. Yin, G. Yu, H. Nie, H. Lv, H. Hu, J. Wang, J. Zhou, J. Sun, K. Tan, K. An, K. Lin, L. Zhao, M. Chen, P. Xing, R. Wang, S. Liu, S. Xia, T. You, W. Ji, X. Zeng, X. Han, X. Zhang, Y. Wei, Y. Xu, Y. Jiang, Y. Wa...

  9. [9]

    Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models

    S. Jang, J. S. Choi, J. Jo, K. Lee, and S. J. Hwang, “Silent branding attack: Trigger-free data poisoning attack on text-to-image diffusion models,” arXiv preprint arXiv:2503.09669, 2025

  10. [10]

    Implicit bias injection attacks against text-to-image diffusion models

    H. Huang, X. Jin, J. Miao, and Y. Wu, “Implicit bias injection attacks against text-to-image diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 28779–28789

  11. [11]

    EvilEdit: Backdooring text-to-image diffusion models in one second

    H. Wang, S. Guo, J. He, K. Chen, S. Zhang, T. Zhang, and T. Xiang, “EvilEdit: Backdooring text-to-image diffusion models in one second,” in ACM Multimedia 2024, 2024. [Online]. Available: https://openreview.net/forum?id=ibEaSS6bQn

  12. [12]

    TrojanEdit: Multimodal backdoor attack against image editing model

    J. Guo, R. Zhang, W. Jiang, Y. Zhu, F. Chen, J. Li, J. He, and H. Li, “TrojanEdit: Multimodal backdoor attack against image editing model,” Neurocomputing, vol. 681, p. 133346, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231226007435

  13. [13]

    Exploring typographic visual prompts injection threats in cross-modality generation models

    H. Cheng, E. Xiao, Y. Wang, L. Zhang, Q. Zhang, J. Cao, K. Xu, M. Sun, X. Hao, J. Gu, and R. Xu, “Exploring typographic visual prompts injection threats in cross-modality generation models,” 2025. [Online]. Available: https://arxiv.org/abs/2503.11519

  14. [14]

    When the prompt becomes visual: Vision-centric jailbreak attacks for large image editing models

    J. Hou, Y. Sun, R. Jin, H. Han, F. Liu, W. K. V. Chan, and A. J. Wang, “When the prompt becomes visual: Vision-centric jailbreak attacks for large image editing models,” 2026. [Online]. Available: https://arxiv.org/abs/2602.10179

  15. [15]

    Denoising diffusion probabilistic models

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proceedings of the 34th International Conference on Neural Information Processing Systems, ser. NeurIPS ’20. Red Hook, NY, USA: Curran Associates Inc., 2020

  16. [16]

    Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

    P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, D. Podell, T. Dockhorn, Z. English, K. Lacey, A. Goodwin, Y. Marek, and R. Rombach, “Scaling rectified flow transformers for high-resolution image synthesis,” 2024. [Online]. Available: https://arxiv.org/abs/2403.03206

  17. [17]

    Nano banana can be prompt engineered for extremely nuanced AI image generation

    M. Woolf, “Nano banana can be prompt engineered for extremely nuanced AI image generation,” https://minimaxir.com/2025/11/nano-banana-prompts/, 2025

  18. [18]

    Understanding implosion in text-to-image generative models

    W. Ding, C. Y. Li, S. Shan, B. Y. Zhao, and H. Zheng, “Understanding implosion in text-to-image generative models,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 1211–1225. [Online]. Available: https://doi.org/10.1145/3658644.3690205

  19. [19]

    The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline

    H. Wang, Q. Shen, Y. Tong, Y. Zhang, and K. Kawaguchi, “The stronger the diffusion model, the easier the backdoor: Data poisoning to induce copyright breaches without adjusting finetuning pipeline,” in Forty-first International Conference on Machine Learning (ICML), 2024. [Online]. Available: https://openreview.net/forum?id=ZvFLbEPv6x

  20. [20]

    REDEditing: Relationship-Driven precise backdoor poisoning on text-to-image diffusion models

    C. Guo, J. Fu, J. Fang, K. Wang, and G. Feng, “REDEditing: Relationship-Driven precise backdoor poisoning on text-to-image diffusion models,” 2025. [Online]. Available: https://arxiv.org/abs/2504.14554

  21. [21]

    From trojan horses to castle walls: Unveiling bilateral data poisoning effects in diffusion models

    Z. Pan, Y. Yao, G. Liu, B. Shen, H. V. Zhao, R. R. Kompella, and S. Liu, “From trojan horses to castle walls: Unveiling bilateral data poisoning effects in diffusion models,” in Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, Eds., vol. 37. Curran Associates, Inc., 2024...

  22. [22]

    Attacks on approximate caches in text-to-image diffusion models

    D. Sun, S. Jie, and S. Liu, “Attacks on approximate caches in text-to-image diffusion models,” 2026. [Online]. Available: https://arxiv.org/abs/2508.20424

  23. [23]

    On the feasibility of poisoning text-to-image AI models via adversarial mislabeling

    S. Wu, R. Bhaskar, A. Y. J. Ha, S. Shan, H. Zheng, and B. Y. Zhao, “On the feasibility of poisoning text-to-image AI models via adversarial mislabeling,” in Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’25. New York, NY, USA: Association for Computing Machinery, 2025, pp. 2848–2862. [Online]. Available: ...

  24. [24]

    AdvI2I: Adversarial image attack on image-to-image diffusion models

    Y. Zeng, Y. Cao, B. Cao, Y. Chang, J. Chen, and L. Lin, “AdvI2I: Adversarial image attack on image-to-image diffusion models,” in Forty-second International Conference on Machine Learning (ICML)

  25. [25]

    [Online]. Available: https://openreview.net/forum?id=It1AkQ6xEJ

  26. [26]

    Pixel is not a barrier: An effective evasion attack for pixel-domain diffusion models

    C.-Y. Shih, L.-X. Peng, J.-W. Liao, E. Chu, C.-F. Chou, and J.-C. Chen, “Pixel is not a barrier: An effective evasion attack for pixel-domain diffusion models,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, pp. 6905–6913, Apr. 2025. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/32741

  27. [27]

    SneakyPrompt: Jailbreaking text-to-image generative models

    Y. Yang, B. Hui, H. Yuan, N. Gong, and Y. Cao, “SneakyPrompt: Jailbreaking text-to-image generative models,” in IEEE Symposium on Security and Privacy (SP), 2024, pp. 897–912

  28. [28]

    Nightshade: Prompt-specific poisoning attacks on text-to-image generative models

    S. Shan, W. Ding, J. Passananti, S. Wu, H. Zheng, and B. Y. Zhao, “Nightshade: Prompt-specific poisoning attacks on text-to-image generative models,” in IEEE Symposium on Security and Privacy (SP), 2024, pp. 807–825

  29. [29]

    SurrogatePrompt: Bypassing the safety filter of text-to-image models via substitution

    Z. Ba, J. Zhong, J. Lei, P. Cheng, Q. Wang, Z. Qin, Z. Wang, and K. Ren, “SurrogatePrompt: Bypassing the safety filter of text-to-image models via substitution,” in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 1166–1180. [Online]. Availab...

  30. [30]

    Jailbreaking prompt attack: A controllable adversarial attack against diffusion models

    J. Ma, Y. Li, Z. Xiao, A. Cao, J. Zhang, C. Ye, and J. Zhao, “Jailbreaking prompt attack: A controllable adversarial attack against diffusion models,” in Findings of the Association for Computational Linguistics: NAACL 2025, L. Chiruzzo, A. Ritter, and L. Wang, Eds. Albuquerque, New Mexico: Association for Computational Linguistics, Apr. 2025, pp. 3141–31...

  31. [31]

    The dual power of interpretable token embeddings: Jailbreaking attacks and defenses for diffusion model unlearning

    S. Chen, Y. Zhang, S. Liu, and Q. Qu, “The dual power of interpretable token embeddings: Jailbreaking attacks and defenses for diffusion model unlearning,” 2025. [Online]. Available: https://arxiv.org/abs/2504.21307

  32. [32]

    Great, now write an article about that: The Crescendo multi-turn LLM jailbreak attack

    M. Russinovich, A. Salem, and R. Eldan, “Great, now write an article about that: The Crescendo multi-turn LLM jailbreak attack,” in 34th USENIX Security Symposium (USENIX Security), 2025, pp. 2421–2440

  33. [33]

    When memory becomes a vulnerability: Towards multi-turn jailbreak attacks against text-to-image generation systems

    S. Zhao, J. Liu, Y. Li, R. Hu, X. Jia, W. Fan, X. Wu, X. Li, J. Zhang, W. Dong et al., “When memory becomes a vulnerability: Towards multi-turn jailbreak attacks against text-to-image generation systems,” arXiv preprint arXiv:2504.20376, 2025

  34. [34]

    A robust image watermarking technique based on quantization noise visibility thresholds

    F. Autrusseau and P. Le Callet, “A robust image watermarking technique based on quantization noise visibility thresholds,” Signal Processing, vol. 87, no. 6, pp. 1363–1383, 2007. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168406004154

  35. [35]

    Human visual system models in digital image watermarking

    D. Levický and P. Foriš, “Human visual system models in digital image watermarking,” Radioengineering, vol. 13, no. 4, pp. 38–43, 2004

  36. [36]

    HQ-Edit: A high-quality dataset for instruction-based image editing

    M. Hui, S. Yang, B. Zhao, Y. Shi, H. Wang, P. Wang, C. Xie, and Y. Zhou, “HQ-Edit: A high-quality dataset for instruction-based image editing,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=mZptYYttFj

  37. [37]

    A mathematical theory of communication

    C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948

  38. [38]

    CLIPScore: A reference-free evaluation metric for image captioning

    J. Hessel, A. Holtzman, M. Forbes, R. Le Bras, and Y. Choi, “CLIPScore: A reference-free evaluation metric for image captioning,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2021, pp. 7514–7528. [Online]. Available: https://aclanthology.org/2021.emnlp-main.595

  39. [39]

    Qwen3 Technical Report

    A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, C. Zheng, D. Liu, F. Zhou, F. Huang, F. Hu, H. Ge, H. Wei, H. Lin, J. Tang, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Bao, K. Yang, L. Yu, L. Deng, M. Li, M. Xue, M. Li, P. Zhang, P. Wang, Q. Zhu, R. Men, R. Gao, S. Liu, S. Luo, T. Li, T. Ta...

  40. [40]

    A JND model using a texture-edge selector based on Faber-Schauder wavelet lifting scheme

    M. Amar, R. Harba, H. Douzi, F. Ros, M. El Hajji, R. Riad, and K. Gourrame, “A JND model using a texture-edge selector based on Faber-Schauder wavelet lifting scheme,” in Image and Signal Processing, A. Mansouri, F. Nouboud, A. Chalifour, D. Mammass, J. Meunier, and A. Elmoataz, Eds. Cham: Springer International Publishing, 2016, pp. 328–336

  41. [41]

    Rethinking and conceptualizing just noticeable difference estimation by residual learning

    Q. Jiang, F. Liu, Z. Wang, S. Wang, and W. Lin, “Rethinking and conceptualizing just noticeable difference estimation by residual learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 10, pp. 9515–9527, 2024

  42. [42]

    MaskAttn-SDXL: Controllable region-level text-to-image generation

    Y. Chang, J. Chen, A. Cheng, and P. Bogdan, “MaskAttn-SDXL: Controllable region-level text-to-image generation,” 2025. [Online]. Available: https://arxiv.org/abs/2509.15357

  43. [43]

    UIBDiffusion: Universal imperceptible backdoor attack for diffusion models

    Y. Han, B. Zhao, R. Chu, F. Luo, B. Sikdar, and Y. Lao, “UIBDiffusion: Universal imperceptible backdoor attack for diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 19186–19196

  44. [44]

    VII: Visual instruction injection for jailbreaking image-to-video generation models

    B. Zheng, Y. Xiang, Z. Hong, Z. Lin, C. Yu, T. Liu, and X. You, “VII: Visual instruction injection for jailbreaking image-to-video generation models,” 2026. [Online]. Available: https://arxiv.org/abs/2602.20999

  45. [45]

    C2PA: Verifying media content sources

    Coalition for Content Provenance and Authenticity, “C2PA: Verifying media content sources,” https://c2pa.org/, 2026, accessed: 2026-05-04

  46. [46]

    SynthID-Image: Image watermarking at internet scale, arXiv preprint arXiv:2510.09263, 2025

    S. Gowal, R. Bunel, F. Stimberg, D. Stutz, G. Ortiz-Jimenez, C. Kouridi, M. Vecerik, J. Hayes, S.-A. Rebuffi, P. Bernard, C. Gamble, M. Z. Horváth, F. Kaczmarczyck, A. Kaskasoli, A. Petrov, I. Shumailov, M. Thotakuri, O. Wiles, J. Yung, Z. Ahmed, V. Martin, S. Rosen, C. Savčak, A. Senoner, N. Vyas, and P. Kohli, “SynthID-Image: Image watermarking at i...

  47. [47]

    LoRA: Low-rank adaptation of large language models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in ICLR 2022, April

  48. [48]

    [Online]. Available: https://www.microsoft.com/en-us/research/publication/lora-low-rank-adaptation-of-large-language-models/

  49. [49]

    DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models

    Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau, “DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association f...

  50. [50]

    Tree-Rings watermarks: Invisible fingerprints for diffusion images

    Y. Wen, J. Kirchenbauer, J. Geiping, and T. Goldstein, “Tree-Rings watermarks: Invisible fingerprints for diffusion images,” in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023. [Online]. Available: https://openreview.net/forum?id=Z57JrmubNl

  51. [51]

    The stable signature: Rooting watermarks in latent diffusion models

    P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon, “The stable signature: Rooting watermarks in latent diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 22466–22477

  52. [52]

    Pricing

    OpenAI, “Pricing,” https://developers.openai.com/api/docs/pricing, 2026

  53. [53]

    SAM 2: Segment Anything in Images and Videos

    N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson, E. Mintun, J. Pan, K. V. Alwala, N. Carion, C.-Y. Wu, R. Girshick, P. Dollár, and C. Feichtenhofer, “SAM 2: Segment anything in images and videos,” arXiv preprint arXiv:2408.00714, 2024. [Online]. Available: https://arxiv.org/abs/2408.00714

  54. [54]

    What are model parameters? A complete guide to neural network parameters

    Articsledge, “What are model parameters? A complete guide to neural network parameters,” https://www.articsledge.com/post/model-parameters, 2026

  55. [55]

    SHIELD: Fast, practical defense and vaccination for deep learning using JPEG compression

    N. Das, M. Shanbhogue, S.-T. Chen, F. Hohman, S. Li, L. Chen, M. E. Kounavis, and D. H. Chau, “SHIELD: Fast, practical defense and vaccination for deep learning using JPEG compression,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’18. New York, NY, USA: Association for Computing Machinery, ...

  56. [56]

    Feature distillation: DNN-oriented JPEG compression against adversarial examples

    Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen, “Feature distillation: DNN-oriented JPEG compression against adversarial examples,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 860–868

  57. [57]

    Invisible image watermarks are provably removable using generative AI

    X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X. Wang, and L. Li, “Invisible image watermarks are provably removable using generative AI,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=7hy5fy2OC6