arxiv: 2602.19019 · v2 · submitted 2026-02-22 · 💻 cs.CV

TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

Pith reviewed 2026-05-15 20:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords watermarking · diffusion models · multi-concept attribution · generative AI · latent perturbation · query-based retrieval · intellectual property

The pith

TokenTrace attributes multiple concepts in one AI-generated image by recovering watermarked tokens from jointly perturbed text and noise embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TokenTrace as a proactive watermarking approach for diffusion models that must handle composite images containing both objects and styles. Prior methods embed signals that cannot be separated when multiple concepts share the same output, leaving attribution ambiguous. The new framework perturbs both the text prompt embedding and the initial latent noise at generation time to create independent signatures. A query-driven retrieval module then accepts the finished image plus a text description of the desired concept and extracts the matching signature without cross-talk. This yields state-of-the-art accuracy on single- and multi-concept tasks while preserving visual fidelity and surviving common edits.

Core claim

TokenTrace embeds secret signatures by simultaneously perturbing the text prompt embedding and the initial latent noise that steer a diffusion model. Retrieval uses a query-based TokenTrace module that receives the generated image together with a textual query naming the concept to check; the module then disentangles and verifies each signature independently from the shared output.
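
The embedding step can be illustrated with a minimal sketch. It assumes an additive perturbation, which the abstract does not confirm, and it uses a hypothetical `secret_to_perturbation` encoder built from pseudo-random carriers in place of the paper's learned concept encoder. The vectors, the `strength` value, and the seeds are illustrative stand-ins, not values from the paper.

```python
import random


def secret_to_perturbation(bits, dim, seed, strength):
    """Hypothetical encoder: map a bit secret to a small additive offset."""
    rng = random.Random(seed)
    delta = [0.0] * dim
    for bit in bits:
        sign = 1.0 if bit else -1.0
        # Each bit modulates its own pseudo-random +/-1 carrier.
        for i in range(dim):
            delta[i] += sign * rng.choice((-1.0, 1.0))
    # Normalise so that no entry exceeds `strength` in magnitude.
    scale = strength / len(bits)
    return [scale * value for value in delta]


def perturb(vector, delta):
    """Add the offset element-wise (the additive form is an assumption)."""
    return [v + d for v, d in zip(vector, delta)]


host_rng = random.Random(0)
dim = 64
# Toy stand-ins for one prompt-token embedding and the initial latent noise.
token_embedding = [host_rng.gauss(0.0, 1.0) for _ in range(dim)]
initial_noise = [host_rng.gauss(0.0, 1.0) for _ in range(dim)]

secret = [1, 0, 1, 1, 0, 0, 1, 0]
delta_text = secret_to_perturbation(secret, dim, seed=1, strength=0.05)
delta_noise = secret_to_perturbation(secret, dim, seed=2, strength=0.05)

watermarked_embedding = perturb(token_embedding, delta_text)
watermarked_noise = perturb(initial_noise, delta_noise)
```

In a real pipeline the two watermarked tensors would be passed to the diffusion sampler in place of the clean ones. The sketch stops before that step because the sampler interface is not described in the text above.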

What carries the argument

The query-based TokenTrace module, which receives an image and a textual concept query to selectively recover and verify the corresponding watermarked signature.
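
To make the idea of query-conditioned recovery concrete, here is a toy sketch that swaps in a classical spread-spectrum scheme for the paper's learned module. It is not the authors' method. A flat vector stands in for image features, carriers are derived from a hash of the query text, and the function names (`carriers_for`, `embed`, `retrieve`), the query strings, and all numeric settings are hypothetical.

```python
import hashlib
import random


def carriers_for(query, n_bits, dim):
    """Derive n_bits pseudo-random +/-1 carriers from the query text."""
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_bits)]


def embed(signal, query, bits, strength):
    """Add one signed carrier per bit to the shared signal."""
    out = list(signal)
    for bit, carrier in zip(bits, carriers_for(query, len(bits), len(signal))):
        sign = 1.0 if bit else -1.0
        for i, c in enumerate(carrier):
            out[i] += strength * sign * c
    return out


def retrieve(signal, query, n_bits):
    """Recover bits by correlating the signal with the query's carriers."""
    bits = []
    for carrier in carriers_for(query, n_bits, len(signal)):
        correlation = sum(s * c for s, c in zip(signal, carrier))
        bits.append(1 if correlation > 0 else 0)
    return bits


dim = 4096
host_rng = random.Random(0)
host = [host_rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for image features

object_bits = [1, 0, 1, 1, 0, 0, 1, 0]
style_bits = [0, 1, 1, 0, 1, 0, 0, 1]

# Two concepts share one output signal.
mixed = embed(host, "object: toy robot", object_bits, strength=0.25)
mixed = embed(mixed, "style: watercolor", style_bits, strength=0.25)

recovered_object = retrieve(mixed, "object: toy robot", len(object_bits))
recovered_style = retrieve(mixed, "style: watercolor", len(style_bits))
```

The same mixed signal is queried twice and each query returns its own secret, which mirrors the behaviour the module is claimed to have. Whether a learned module keeps this separation after a full diffusion trajectory is the open question raised in the referee report.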

If this is right

  • Individual concepts such as a specific object or artistic style can be attributed even when they appear together in one image.
  • The same generated image can be queried multiple times to produce separate attribution reports for different concepts.
  • The signatures survive standard image transformations while image quality remains high.
  • The approach outperforms prior single-concept watermarking baselines on both single- and multi-concept test sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The perturbation strategy could be tested on video or 3D diffusion models to see whether frame-to-frame consistency preserves the recoverable signatures.
  • If the query module runs efficiently, online generators could offer on-demand attribution reports to users or rights holders.
  • The method suggests a general route for embedding multiple independent controls in latent spaces that might also improve prompt-based editing precision.

Load-bearing premise

Jointly perturbing the text prompt embedding and initial latent noise produces signatures that remain disentangleable by a query module for each concept without visible quality loss.

What would settle it

A side-by-side comparison in which the same image is generated with and without the dual perturbations, followed by checking whether the query module can still recover the correct concept signatures at high accuracy while the visual difference between the two outputs stays imperceptible to human viewers.
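
The two quantities in this test can be computed as in the sketch below. PSNR is used here only as a rough numerical proxy for visual difference; it is an assumption of this sketch and does not replace a human-viewer study. The pixel lists and bit lists are illustrative placeholders, not measurements from the paper.

```python
import math


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def bit_accuracy(recovered, expected):
    """Fraction of signature bits recovered correctly."""
    matches = sum(1 for r, e in zip(recovered, expected) if r == e)
    return matches / len(expected)


# Placeholder pixels for the same prompt and seed, without and with perturbation.
clean_pixels = [100.0, 120.0, 140.0, 160.0]
watermarked_pixels = [101.0, 119.0, 141.0, 159.0]

fidelity_db = psnr(clean_pixels, watermarked_pixels)
accuracy = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

With these placeholder values the mean squared error is 1, so `fidelity_db` is about 48.13 dB, and `accuracy` is 0.75.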

Figures

Figures reproduced from arXiv: 2602.19019 by John Collomosse, Li Zhang, Pengtao Xie, Shruti Agarwal, Vishal Asnani.

Figure 1: t-SNE visualization of predicted concept embed…
Figure 2: Overview of TokenTrace. (a) Concept encoding: A concept secret is fed into a concept encoder to perturb the targeted concept…
Figure 3: Qualitative analysis of visual fidelity for watermarked images. (a) Results on abstract artistic style concepts from the WikiArt…
Figure 4: Qualitative example of multi-customized concept pre…
Figure 5: Qualitative example of multi-general concept prediction.
Figure 6: Performance on incremental concept learning. The plot…
Figure 7: Ablation study for the length of the bit secret on the…
Original abstract

Generative AI models pose a significant challenge to intellectual property (IP), as they can replicate unique artistic styles and concepts without attribution. While watermarking offers a potential solution, existing methods often fail in complex scenarios where multiple concepts (e.g., an object and an artistic style) are composed within a single image. These methods struggle to disentangle and attribute each concept individually. In this work, we introduce TokenTrace, a novel proactive watermarking framework for robust, multi-concept attribution. Our method embeds secret signatures into the semantic domain by simultaneously perturbing the text prompt embedding and the initial latent noise that guide the diffusion model's generation process. For retrieval, we propose a query-based TokenTrace module that takes the generated image and a textual query specifying which concepts need to be retrieved (e.g., a specific object or style) as inputs. This query-based mechanism allows the module to disentangle and independently verify the presence of multiple concepts from a single generated image. Extensive experiments show that our method achieves state-of-the-art performance on both single-concept (object and style) and multi-concept attribution tasks, significantly outperforming existing baselines while maintaining high visual quality and robustness to common transformations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TokenTrace, a proactive watermarking framework for multi-concept attribution in diffusion models. Signatures are embedded by jointly perturbing the text prompt embedding and initial latent noise; a query-based TokenTrace module then takes a generated image plus a textual query to disentangle and verify the presence of individual concepts (objects or styles). The central claim is that this yields state-of-the-art performance on both single-concept and multi-concept tasks while preserving visual quality and robustness to common transformations.

Significance. If the separability of jointly embedded signatures can be rigorously demonstrated, the work would meaningfully advance IP attribution for composite generations, a setting where prior watermarking methods are known to fail. The query-driven recovery mechanism is a novel construction that could generalize beyond the reported setting, but its practical significance depends on quantitative evidence that the diffusion mixing does not produce unrecoverable crosstalk.

major comments (2)
  1. [§3] §3 (perturbation design): the claim that simultaneous perturbation of prompt embedding and initial noise produces independently recoverable signatures is load-bearing for the multi-concept results, yet the diffusion trajectory mixes these inputs over many steps; no orthogonality argument, crosstalk bound, or ablation (e.g., false-positive rate when querying one concept while another is present) is supplied to show that cross-term interference remains negligible.
  2. [§4] §4 (experiments): the abstract asserts SOTA performance and robustness on multi-concept tasks, but the provided text supplies no quantitative metrics, tables of attribution accuracy, ablation studies, or experimental protocol; without these the central empirical claim cannot be evaluated and the outperformance statement remains unsupported.
minor comments (2)
  1. [§3.1] Notation for the perturbation operators applied to the text embedding and latent noise should be made explicit (e.g., additive delta or learned offset) rather than described at a high level.
  2. [Figure 2] Figure captions and the query-module diagram should include a multi-concept example showing independent retrieval of object and style from the same image to illustrate the disentanglement claim.
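
The two measurements requested in major comment 1 could be computed along these lines. This is a minimal sketch: the cosine check is one simple way to probe orthogonality between perturbations, the detection scores are made-up placeholders rather than results from the paper, and the 0.9 threshold is an arbitrary assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two perturbation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def false_positive_rate(scores_when_absent, threshold):
    """Share of queries flagged although the queried concept is absent."""
    flagged = sum(1 for s in scores_when_absent if s >= threshold)
    return flagged / len(scores_when_absent)


# Placeholder perturbations for an object and a style concept.
delta_object = [1.0, 0.0, -1.0, 0.0]
delta_style = [0.0, 1.0, 0.0, -1.0]
overlap = cosine_similarity(delta_object, delta_style)

# Placeholder match scores from querying concept A on images that
# contain only concept B; values near 0.5 mean chance-level bit agreement.
scores_when_absent = [0.52, 0.48, 0.91, 0.50]
fpr = false_positive_rate(scores_when_absent, threshold=0.9)
```

An `overlap` near zero at the input does not by itself guarantee separability in the generated image, since the diffusion process mixes both inputs; the false-positive rate measured on generated images is the more direct check.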

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comments point by point below, and we will incorporate the suggested improvements in the revised version.

  1. Referee: [§3] §3 (perturbation design): the claim that simultaneous perturbation of prompt embedding and initial noise produces independently recoverable signatures is load-bearing for the multi-concept results, yet the diffusion trajectory mixes these inputs over many steps; no orthogonality argument, crosstalk bound, or ablation (e.g., false-positive rate when querying one concept while another is present) is supplied to show that cross-term interference remains negligible.

    Authors: We recognize that the mixing in the diffusion process could potentially introduce crosstalk between the embedded signatures. Our design relies on the query-based TokenTrace module to disentangle concepts by conditioning on specific textual queries, which we believe minimizes interference. However, to strengthen this claim, we will include an orthogonality analysis of the perturbations and additional ablation experiments measuring false-positive rates when querying for one concept in the presence of others. These will be added to Section 3 in the revised manuscript. revision: yes

  2. Referee: [§4] §4 (experiments): the abstract asserts SOTA performance and robustness on multi-concept tasks, but the provided text supplies no quantitative metrics, tables of attribution accuracy, ablation studies, or experimental protocol; without these the central empirical claim cannot be evaluated and the outperformance statement remains unsupported.

    Authors: We apologize for any lack of clarity in the experimental presentation. The full manuscript includes detailed experimental results in Section 4, with tables reporting attribution accuracies, robustness metrics, and comparisons to baselines. To address this concern, we will expand the experimental section with more explicit tables, ablation studies, and a clearer description of the protocol to make the SOTA claims fully supported and evaluable. revision: yes

Circularity Check

0 steps flagged

No circularity; new construction with experimental validation

full rationale

The paper presents TokenTrace as a novel proactive watermarking method that embeds signatures by jointly perturbing text prompt embeddings and initial latent noise, then recovers them via a query-based module. No equations, derivations, or claims reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains. Performance assertions rest on external experiments rather than tautological re-derivations of inputs. The method is evaluated against external benchmarks rather than against its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method description implies but does not quantify any perturbation strength or query threshold.
