TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

John Collomosse; Li Zhang; Pengtao Xie; Shruti Agarwal; Vishal Asnani

arxiv: 2602.19019 · v2 · submitted 2026-02-22 · 💻 cs.CV

Li Zhang, Shruti Agarwal, John Collomosse, Pengtao Xie, Vishal Asnani

Pith reviewed 2026-05-15 20:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords watermarking · diffusion models · multi-concept attribution · generative AI · latent perturbation · query-based retrieval · intellectual property

The pith

TokenTrace attributes multiple concepts in one AI-generated image by recovering watermarked tokens from jointly perturbed text and noise embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TokenTrace as a proactive watermarking approach for diffusion models that must handle composite images containing both objects and styles. Prior methods embed signals that cannot be separated when multiple concepts share the same output, leaving attribution ambiguous. The new framework perturbs both the text prompt embedding and the initial latent noise at generation time to create independent signatures. A query-driven retrieval module then accepts the finished image plus a text description of the desired concept and extracts the matching signature without cross-talk. This yields state-of-the-art accuracy on single- and multi-concept tasks while preserving visual fidelity and surviving common edits.

Core claim

TokenTrace embeds secret signatures by simultaneously perturbing the text prompt embedding and the initial latent noise that steer a diffusion model. Retrieval uses a query-based TokenTrace module that receives the generated image together with a textual query naming the concept to check; the module then disentangles and verifies each signature independently from the shared output.
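
The dual-perturbation idea can be sketched as an additive offset applied to both inputs. This is a minimal illustration under stated assumptions, not the authors' implementation: the Figure 2 caption indicates a learned concept encoder, whereas the fixed pseudo-random carriers below are a simple stand-in, and `secret_to_direction`, `perturb`, the strength `0.05`, and the 64-dimensional toy vectors are all hypothetical.

```python
import random


def secret_to_direction(secret_bits, dim, seed):
    """Map a bit secret to a unit-norm direction of length `dim`.

    Stand-in for a learned concept encoder (assumption): each bit sets the
    sign of its own fixed pseudo-random +/-1 carrier, and the signed
    carriers are summed and normalised.
    """
    rng = random.Random(seed)
    direction = [0.0] * dim
    for bit in secret_bits:
        sign = 1.0 if bit else -1.0
        for i in range(dim):
            direction[i] += sign * rng.choice((-1.0, 1.0))
    norm = sum(x * x for x in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("degenerate direction; choose another seed")
    return [x / norm for x in direction]


def perturb(vector, direction, strength):
    """Additive perturbation: vector + strength * direction."""
    return [v + strength * d for v, d in zip(vector, direction)]


# Toy stand-ins for the prompt embedding and the initial latent noise.
rng = random.Random(0)
text_embedding = [rng.gauss(0.0, 1.0) for _ in range(64)]
latent_noise = [rng.gauss(0.0, 1.0) for _ in range(64)]

secret = [1, 0, 1, 1, 0, 0, 1, 0]
# Separate seeds give the two inputs different carriers for the same secret.
marked_embedding = perturb(text_embedding, secret_to_direction(secret, 64, seed=1), 0.05)
marked_noise = perturb(latent_noise, secret_to_direction(secret, 64, seed=2), 0.05)
```

The strength trades recoverability against fidelity; the material reviewed here does not quantify it.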

What carries the argument

The query-based TokenTrace module, which receives an image and a textual concept query to selectively recover and verify the corresponding watermarked signature.

If this is right

Individual concepts such as a specific object or artistic style can be attributed even when they appear together in one image.
The same generated image can be queried multiple times to produce separate attribution reports for different concepts.
The signatures survive standard image transformations while image quality remains high.
The approach outperforms prior single-concept watermarking baselines on both single- and multi-concept test sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The perturbation strategy could be tested on video or 3D diffusion models to see whether frame-to-frame consistency preserves the recoverable signatures.
If the query module runs efficiently, online generators could offer on-demand attribution reports to users or rights holders.
The method suggests a general route for embedding multiple independent controls in latent spaces that might also improve prompt-based editing precision.

Load-bearing premise

Jointly perturbing the text prompt embedding and initial latent noise produces signatures that remain disentangleable by a query module for each concept without visible quality loss.

What would settle it

A side-by-side comparison in which the same image is generated with and without the dual perturbations, followed by checking whether the query module can still recover the correct concept signatures at high accuracy while the visual difference between the two outputs stays imperceptible to human viewers.
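
The two quantities this test turns on can be sketched as follows. The sketch assumes signatures are bit lists and images are flat lists of pixel intensities; `bit_accuracy` and `psnr` are illustrative helper names. PSNR is only a crude numerical proxy here, since the test as stated calls for judgment by human viewers.

```python
import math


def bit_accuracy(recovered, expected):
    """Fraction of signature bits recovered correctly."""
    if len(recovered) != len(expected) or not expected:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(r == e for r, e in zip(recovered, expected)) / len(expected)


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy values: one wrong bit out of four, and two nearly identical images.
accuracy = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
fidelity_db = psnr([10, 20, 30, 40], [10, 21, 30, 39])
```

High `accuracy` together with high `fidelity_db` on the watermarked output, compared against the unwatermarked generation, is the pattern the test would look for.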

Figures

Figures reproduced from arXiv: 2602.19019 by John Collomosse, Li Zhang, Pengtao Xie, Shruti Agarwal, Vishal Asnani.

**Figure 2.** Overview of TokenTrace. (a) Concept encoding: A concept secret is fed into a concept encoder to perturb the targeted concept…

**Figure 3.** Qualitative analysis of visual fidelity for watermarked images. (a) Results on abstract artistic style concepts from the WikiArt…

**Figure 4.** Qualitative example of multi-customized concept pre…

**Figure 5.** Qualitative example of multi-general concept prediction.

**Figure 6.** Performance on incremental concept learning. The plot…

**Figure 7.** Ablation study for the length of the bit secret on the…

Original abstract

Generative AI models pose a significant challenge to intellectual property (IP), as they can replicate unique artistic styles and concepts without attribution. While watermarking offers a potential solution, existing methods often fail in complex scenarios where multiple concepts (e.g., an object and an artistic style) are composed within a single image. These methods struggle to disentangle and attribute each concept individually. In this work, we introduce TokenTrace, a novel proactive watermarking framework for robust, multi-concept attribution. Our method embeds secret signatures into the semantic domain by simultaneously perturbing the text prompt embedding and the initial latent noise that guide the diffusion model's generation process. For retrieval, we propose a query-based TokenTrace module that takes the generated image and a textual query specifying which concepts need to be retrieved (e.g., a specific object or style) as inputs. This query-based mechanism allows the module to disentangle and independently verify the presence of multiple concepts from a single generated image. Extensive experiments show that our method achieves state-of-the-art performance on both single-concept (object and style) and multi-concept attribution tasks, significantly outperforming existing baselines while maintaining high visual quality and robustness to common transformations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TokenTrace's dual perturbation of prompt embeddings and latent noise plus query module extends single-concept watermarking to multi-concept cases, but separability of the signatures is not shown and the abstract gives no metrics to back the SOTA claim.

The key thing to know about TokenTrace is that it proposes embedding watermarks for multiple concepts by jointly perturbing the text prompt embedding and the initial latent noise in a diffusion model, then using a query-based module to recover which concepts are in the output image. This is a non-trivial step past single-concept methods. The paper does a good job identifying the gap in handling composite images with both objects and styles, and the query-driven retrieval is a practical addition that could allow targeted checks without scanning for everything at once. Framing it around IP protection for generative AI is timely, and the claim of maintaining high visual quality under transformations is the kind of thing that would make it usable if demonstrated.

The soft spots are around the core mechanism. Jointly changing the embedding and noise risks creating mixed signals that the diffusion steps then entangle further, so independent recovery for each concept may not be reliable. The stress-test concern about crosstalk is fair; without shown orthogonality or tests for false positives when querying one concept in the presence of another, the multi-concept performance is not convincingly supported. The abstract states SOTA results and robustness but includes no quantitative metrics, ablation studies, or protocol details, which leaves the evidence thin.

This paper is aimed at people working on attribution and watermarking techniques for AI-generated content. A reader looking for new ideas in proactive IP tools could pick up the dual perturbation and query approach, though they would want to see the full experimental validation before relying on it.

I recommend putting it through peer review. The problem it tackles is real and the construction has enough originality that referees can push on the disentanglement evidence and experimental rigor. It is worth the time rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TokenTrace, a proactive watermarking framework for multi-concept attribution in diffusion models. Signatures are embedded by jointly perturbing the text prompt embedding and initial latent noise; a query-based TokenTrace module then takes a generated image plus a textual query to disentangle and verify the presence of individual concepts (objects or styles). The central claim is that this yields state-of-the-art performance on both single-concept and multi-concept tasks while preserving visual quality and robustness to common transformations.

Significance. If the separability of jointly embedded signatures can be rigorously demonstrated, the work would meaningfully advance IP attribution for composite generations, a setting where prior watermarking methods are known to fail. The query-driven recovery mechanism is a novel construction that could generalize beyond the reported setting, but its practical significance depends on quantitative evidence that the diffusion mixing does not produce unrecoverable crosstalk.

major comments (2)

[§3] §3 (perturbation design): the claim that simultaneous perturbation of prompt embedding and initial noise produces independently recoverable signatures is load-bearing for the multi-concept results, yet the diffusion trajectory mixes these inputs over many steps; no orthogonality argument, crosstalk bound, or ablation (e.g., false-positive rate when querying one concept while another is present) is supplied to show that cross-term interference remains negligible.
[§4] §4 (experiments): the abstract asserts SOTA performance and robustness on multi-concept tasks, but the provided text supplies no quantitative metrics, tables of attribution accuracy, ablation studies, or experimental protocol; without these the central empirical claim cannot be evaluated and the outperformance statement remains unsupported.
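
The false-positive ablation requested in the first major comment could be measured along these lines. This is a sketch under assumptions: the detector interface `detect(image, query) -> score`, the threshold, and the toy data are all hypothetical and do not come from the manuscript.

```python
def crosstalk_false_positive_rate(cases, detect, threshold):
    """Fraction of absent-concept queries that the detector wrongly accepts.

    cases: list of (image, query) pairs in which `query` names a concept
    that is NOT watermarked in `image`, although other concepts are.
    detect: callable (image, query) -> confidence score (hypothetical).
    """
    if not cases:
        raise ValueError("need at least one case")
    wrong = sum(1 for image, query in cases if detect(image, query) >= threshold)
    return wrong / len(cases)


# Toy detector: images are dicts of per-concept scores, for illustration only.
def toy_detect(image, query):
    return image.get(query, 0.0)


cases = [
    ({"style:A": 0.9, "object:B": 0.2}, "object:B"),
    ({"style:A": 0.8, "object:B": 0.7}, "object:B"),  # simulated leakage
    ({"object:C": 0.95}, "style:A"),
    ({"object:C": 0.9, "style:A": 0.1}, "style:A"),
]
rate = crosstalk_false_positive_rate(cases, toy_detect, threshold=0.5)
```

Reporting this rate alongside the true-positive rate for present concepts would show directly whether one concept's signature triggers a query for another.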

minor comments (2)

[§3.1] Notation for the perturbation operators applied to the text embedding and latent noise should be made explicit (e.g., additive delta or learned offset) rather than described at a high level.
[Figure 2] Figure captions and the query-module diagram should include a multi-concept example showing independent retrieval of object and style from the same image to illustrate the disentanglement claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address the major comments point by point below, and we will incorporate the suggested improvements in the revised version.

Referee: [§3] §3 (perturbation design): the claim that simultaneous perturbation of prompt embedding and initial noise produces independently recoverable signatures is load-bearing for the multi-concept results, yet the diffusion trajectory mixes these inputs over many steps; no orthogonality argument, crosstalk bound, or ablation (e.g., false-positive rate when querying one concept while another is present) is supplied to show that cross-term interference remains negligible.

Authors: We recognize that the mixing in the diffusion process could potentially introduce crosstalk between the embedded signatures. Our design relies on the query-based TokenTrace module to disentangle concepts by conditioning on specific textual queries, which we believe minimizes interference. However, to strengthen this claim, we will include an orthogonality analysis of the perturbations and additional ablation experiments measuring false-positive rates when querying for one concept in the presence of others. These will be added to Section 3 in the revised manuscript.

revision: yes

Referee: [§4] §4 (experiments): the abstract asserts SOTA performance and robustness on multi-concept tasks, but the provided text supplies no quantitative metrics, tables of attribution accuracy, ablation studies, or experimental protocol; without these the central empirical claim cannot be evaluated and the outperformance statement remains unsupported.

Authors: We apologize for any lack of clarity in the experimental presentation. The full manuscript includes detailed experimental results in Section 4, with tables reporting attribution accuracies, robustness metrics, and comparisons to baselines. To address this concern, we will expand the experimental section with more explicit tables, ablation studies, and a clearer description of the protocol to make the SOTA claims fully supported and evaluable.

revision: yes
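
One simple form the promised orthogonality analysis might take is a pairwise cosine-similarity check on the per-concept perturbation directions. The sketch below assumes each perturbation can be represented as a plain vector; `cosine_similarity` and `max_pairwise_overlap` are illustrative names, not functions from the manuscript.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_u * norm_v)


def max_pairwise_overlap(directions):
    """Largest absolute cosine similarity over all pairs of directions."""
    return max(abs(cosine_similarity(u, v)) for u, v in combinations(directions, 2))


# Toy perturbation directions for three concepts.
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
overlap = max_pairwise_overlap(directions)
```

Low overlap at the input is necessary rather than sufficient: the diffusion process is nonlinear, so input-space orthogonality does not by itself guarantee separable signatures in the generated image, which is why the false-positive ablation is still needed.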

Circularity Check

0 steps flagged

No circularity; new construction with experimental validation

The paper presents TokenTrace as a novel proactive watermarking method that embeds signatures by jointly perturbing text prompt embeddings and initial latent noise, then recovers via a query-based module. No equations, derivations, or claims reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains. Performance assertions rest on external experiments rather than tautological re-derivations of inputs. The method is self-contained against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method description implies but does not quantify any perturbation strength or query threshold.
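
To illustrate why a query threshold counts as a free parameter, here is a common way such thresholds are set for bit-string watermarks. It assumes the signature is a fixed-length bit secret (as the Figure 7 caption suggests) and that an unwatermarked image yields independent fair-coin bits; neither assumption is confirmed by the text reviewed here, and both function names are hypothetical.

```python
from math import comb


def false_positive_probability(num_bits, min_matches):
    """P(at least `min_matches` of `num_bits` agree by chance, fair bits)."""
    favourable = sum(comb(num_bits, k) for k in range(min_matches, num_bits + 1))
    return favourable / 2 ** num_bits


def smallest_threshold(num_bits, target_fpr):
    """Smallest match count whose chance-level rate is <= target_fpr.

    Returns None when even a perfect match is too likely by chance,
    i.e. the secret is too short for the requested rate.
    """
    for matches in range(num_bits + 1):
        if false_positive_probability(num_bits, matches) <= target_fpr:
            return matches
    return None


# Example: matches required for a 32-bit secret at a one-in-a-million rate.
threshold = smallest_threshold(32, 1e-6)
```

Under these assumptions the threshold follows from the secret length and the tolerated false-positive rate, so both would belong in the ledger once the paper states them.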


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 1 internal anchor

[1]

Proactive image manipulation detection

Vishal Asnani, Xi Yin, Tal Hassner, Sijia Liu, and Xiaoming Liu. Proactive image manipulation detection. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15386–15395, 2022. 2

work page 2022
[2]

Malp: Manipulation localization using a proactive scheme

Vishal Asnani, Xi Yin, Tal Hassner, and Xiaoming Liu. Malp: Manipulation localization using a proactive scheme. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12343–12352, 2023. 2

work page 2023
[3]

ProMark: Proactive diffusion watermarking for causal attribution

Vishal Asnani, John Collomosse, Tu Bui, Xiaoming Liu, and Shruti Agarwal. ProMark: Proactive diffusion watermarking for causal attribution. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 10802–10811, 2024. 1, 2, 5

work page 2024
[4]

Custommark: Customization of diffusion models for proactive attribution

Vishal Asnani, John Collomosse, Xiaoming Liu, and Shruti Agarwal. Custommark: Customization of diffusion models for proactive attribution. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1512– 1522, 2025. 1, 2, 5

work page 2025
[5]

Foundation models defining a new era in vision: a survey and outlook

Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 47(4):2245–2264, 2025. 1

work page 2025
[6]

EKILA: Synthetic me- dia provenance and attribution for generative art

Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, An- drew Gilbert, and John Collomosse. EKILA: Synthetic me- dia provenance and attribution for generative art. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 913–922, 2023. 5

work page 2023
[7]

Lan- guage models are few-shot learners.Advances in neural in- formation processing systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakan- tan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Lan- guage models are few-shot learners.Advances in neural in- formation processing systems, 33:1877–1901, 2020. 5, 6

work page 1901
[8]

Anydoor: Zero-shot object-level im- age customization

Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, and Hengshuang Zhao. Anydoor: Zero-shot object-level im- age customization. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 6593–6602, 2024. 2

work page 2024
[9]

Good models borrow, great models steal: intellectual property rights and generative ai.Policy and Society, 44(1):23–37, 2025

Simon Chesterman. Good models borrow, great models steal: intellectual property rights and generative ai.Policy and Society, 44(1):23–37, 2025. 2

work page 2025
[10]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. 5, 6

work page 2009
[11]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M ¨uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InIn- ternational Conference on Machine Learning, pages 12606– 12633. PMLR, 2024. 1

work page 2024
[12]

Catch you everything everywhere: Guarding textual inversion via concept watermarking,

Weitao Feng, Jiyan He, Jie Zhang, Tianwei Zhang, Wenbo Zhou, Weiming Zhang, and Nenghai Yu. Catch you every- thing everywhere: Guarding textual inversion via concept watermarking.arXiv preprint arXiv:2309.05940, 2023. 2

work page arXiv 2023
[13]

The stable signature: Rooting watermarks in latent diffusion models

Pierre Fernandez, Guillaume Couairon, Herv ´e J ´egou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 22466–22477, 2023. 1, 2

work page 2023
[14]

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patash- nik, Amit H Bermano, Gal Chechik, and Daniel Cohen- Or. An image is worth one word: Personalizing text-to- image generation using textual inversion.arXiv preprint arXiv:2208.01618, 2022. 2

work page internal anchor Pith review Pith/arXiv arXiv 2022
[15]

Robust DWT-SVD domain image watermarking: embedding data in all frequen- cies

Emir Ganic and Ahmet M Eskicioglu. Robust DWT-SVD domain image watermarking: embedding data in all frequen- cies. InProceedings of the 2004 Workshop on Multimedia and Security, pages 166–174, 2004. 2

work page 2004
[16]

Parameter-efficient transfer learning for nlp

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. InInternational conference on machine learning, pages 2790–2799. PMLR, 2019. 4

work page 2019
[17]

Robin: Robust and invisible watermarks for diffusion models with adversarial optimization.Advances in Neural Information Processing Systems, 37:3937–3963, 2024

Huayang Huang, Yu Wu, and Qian Wang. Robin: Robust and invisible watermarks for diffusion models with adversarial optimization.Advances in Neural Information Processing Systems, 37:3937–3963, 2024. 2

work page 2024
[18]

Diffusion model-based image editing: A survey.IEEE transactions on pattern analysis and machine intelligence, 47(6):4409–4437, 2025

Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey.IEEE transactions on pattern analysis and machine intelligence, 47(6):4409–4437, 2025. 1

work page 2025
[19]

Vi- sual prompt tuning

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Vi- sual prompt tuning. InEuropean conference on computer vision, pages 709–727. Springer, 2022. 1

work page 2022
[20]

Transparent medical image ai via an image–text foundation model grounded in medical literature.Nature medicine, 30(4):1154–1165, 2024

Chanwoo Kim, Soham U Gadgil, Alex J DeGrave, Jesut- ofunmi A Omiye, Zhuo Ran Cai, Roxana Daneshjou, and Su-In Lee. Transparent medical image ai via an image–text foundation model grounded in medical literature.Nature medicine, 30(4):1154–1165, 2024. 1

work page 2024
[21]

Multi-concept customization of text-to-image diffusion

Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. Multi-concept customization of text-to-image diffusion. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1931–1941, 2023. 2

work page 1931
[22]

Photomaker: Customizing re- alistic human photos via stacked id embedding

Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming- Ming Cheng, and Ying Shan. Photomaker: Customizing re- alistic human photos via stacked id embedding. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8640–8650, 2024. 2

work page 2024
[23]

Black-box forgery attacks on se- mantic watermarks for diffusion models

Andreas M ¨uller, Denis Lukovnikov, Jonas Thietke, Asja Fis- cher, and Erwin Quiring. Black-box forgery attacks on se- mantic watermarks for diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 20937–20946, 2025. 2

work page 2025
[24]

A survey on proactive deepfake defense: Disruption and watermarking.ACM Computing Surveys, 58 (5):1–37, 2025

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Thuc Nguyen, and Nhien-An Le-Khac. A survey on proactive deepfake defense: Disruption and watermarking.ACM Computing Surveys, 58 (5):1–37, 2025. 2

work page 2025
[25]

A self-supervised descriptor for image copy detection

Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, and Matthijs Douze. A self-supervised descriptor for image copy detection. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 14532–14542, 2022. 5

work page 2022
[26]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 4, 5

work page 2021
[27]

Lawa: Using latent space for in-generation image watermarking

Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar, Arezou Fatemi, and Yong Zhang. Lawa: Using latent space for in-generation image watermarking. InEuropean Confer- ence on Computer Vision, pages 118–136. Springer, 2024. 1

work page 2024
[28]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 1, 2, 5

work page 2022
[29]

Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22500– 22510, 2023. 1, 2

work page 2023
[30]

ALADIN: All layer adaptive instance normalization for fine- grained style similarity

Dan Ruta, Saeid Motiian, Baldo Faieta, Zhe Lin, Hailin Jin, Alex Filipkowski, Andrew Gilbert, and John Collomosse. ALADIN: All layer adaptive instance normalization for fine- grained style similarity. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 11926– 11935, 2021. 5

work page 2021
[31]

Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022. 2

work page 2022
[32]

Watermark anything with localized messages

Tom Sander, Pierre Fernandez, Alain Durmus, Teddy Furon, and Matthijs Douze. Watermark anything with localized messages. InInternational Conference on Learning Repre- sentations (ICLR), 2025. 1, 2

work page 2025
[33]

A novel technique for digital image water- marking in spatial domain

Amit Kumar Singh, Nomit Sharma, Mayank Dave, and Anand Mohan. A novel technique for digital image water- marking in spatial domain. In2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing, pages 497–501. IEEE, 2012. 2

work page 2012
[34]

Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292, 2024

Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shra- may Palta, Micah Goldblum, Jonas Geiping, Abhinav Shri- vastava, and Tom Goldstein. Measuring style similarity in diffusion models.arXiv preprint arXiv:2404.01292, 2024. 4

work page arXiv 2024
[35]

Improved artgan for conditional synthesis of natural image and artwork.IEEE Transactions on Image Processing, 28(1):394–409, 2018

Wei Ren Tan, Chee Seng Chan, Hernan E Aguirre, and Kiyoshi Tanaka. Improved artgan for conditional synthesis of natural image and artwork.IEEE Transactions on Image Processing, 28(1):394–409, 2018. 5

work page 2018
[36]

Saliltorn Thongmeensuk. Rethinking copyright exceptions in the era of generative ai: Balancing innovation and intel- lectual property protection.The Journal of World Intellectual Property, 27(2):278–295, 2024. 2

work page 2024
[37]

Copyright, text & data mining and the in- novation dimension of generative ai.Journal of Intellectual Property Law & Practice, 19(7):557–570, 2024

Kalpana Tyagi. Copyright, text & data mining and the in- novation dimension of generative ai.Journal of Intellectual Property Law & Practice, 19(7):557–570, 2024. 2

work page 2024
[38]

Must: Robust image watermarking for multi-source tracing

Guanjie Wang, Zehua Ma, Chang Liu, Xi Yang, Han Fang, Weiming Zhang, and Nenghai Yu. Must: Robust image watermarking for multi-source tracing. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5364– 5371, 2024. 1

work page 2024
[39]

ROAR: Reducing inversion error in generative im- age watermarking

Hanyi Wang, Han Fang, Shi-Lin Wang, and Ee-Chien Chang. ROAR: Reducing inversion error in generative im- age watermarking. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 19742–19751,

work page
[40]

Faketagger: Robust safeguards against deepfake dis- semination via provenance tracking

Run Wang, Felix Juefei-Xu, Meng Luo, Yang Liu, and Lina Wang. Faketagger: Robust safeguards against deepfake dis- semination via provenance tracking. InProceedings of the 29th ACM international conference on multimedia, pages 3546–3555, 2021. 2

work page 2021
[41]

Evaluating data attribution for text-to-image models

Sheng-Yu Wang, Alexei A Efros, Jun-Yan Zhu, and Richard Zhang. Evaluating data attribution for text-to-image models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 7192–7203, 2023. 5

work page 2023
[42]

Designdiffusion: High- quality text-to-design image generation with diffusion mod- els

Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, and Houqiang Li. Designdiffusion: High- quality text-to-design image generation with diffusion mod- els. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 20906–20915, 2025. 2

work page 2025
[43]

Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models

Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, and Zhengzhong Tu. Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 8213–8224, 2025. 1, 2

work page 2025
[44]

Elite: Encoding visual con- cepts into textual embeddings for customized text-to-image generation

Yuxiang Wei, Yabo Zhang, Zhilong Ji, Jinfeng Bai, Lei Zhang, and Wangmeng Zuo. Elite: Encoding visual con- cepts into textual embeddings for customized text-to-image generation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 15943–15953, 2023. 2

work page 2023
[45]

Gaussian shading: Prov- able performance-lossless image watermarking for diffusion models

Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weim- ing Zhang, and Nenghai Yu. Gaussian shading: Prov- able performance-lossless image watermarking for diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12162– 12171, 2024. 2

work page 2024
[46]

Forget-me-not: Learning to for- get in text-to-image diffusion models

Gong Zhang, Kai Wang, Xingqian Xu, Zhangyang Wang, and Humphrey Shi. Forget-me-not: Learning to for- get in text-to-image diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1755–1764, 2024. 2

work page 2024
[47]

Omniguard: Hy- brid manipulation localization via augmented versatile deep image watermarking

Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, and Jian Zhang. Omniguard: Hy- brid manipulation localization via augmented versatile deep image watermarking. InProceedings of the Computer Vi- sion and Pattern Recognition Conference, pages 3008–3018,

work page
[48]

Easycontrol: Adding efficient and flexible control for diffusion transformer

Yuxuan Zhang, Yirui Yuan, Yiren Song, Haofan Wang, and Jiaming Liu. Easycontrol: Adding efficient and flexible control for diffusion transformer. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 19513–19524, 2025. 2

work page 2025
[49]

Invisible image watermarks are provably removable using generative ai.Advances in neural information processing systems, 37:8643–8672, 2024

Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu- Xiang Wang, and Lei Li. Invisible image watermarks are provably removable using generative ai.Advances in neural information processing systems, 37:8643–8672, 2024. 8

work page 2024
[50] Yuan Zhao, Bo Liu, Ming Ding, Baoping Liu, Tianqing Zhu, and Xin Yu. Proactive deepfake defence via identity watermarking. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 4602–4611, 2023. 2

[51] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16816–16825,

[52] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. Hidden: Hiding data with deep networks. In Proceedings of the European conference on computer vision (ECCV), pages 657–672, 2018. 2

[53] Peifei Zhu, Tsubasa Takahashi, and Hirokatsu Kataoka. Watermark-embedded adversarial examples for copyright protection against diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24420–24430, 2024. 1
