FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs

Jiaxu Miao; Wenlong Cheng; Yuan Gan; Yunqiu Xu

arxiv: 2606.22875 · v1 · pith:RTNNJ4Q6new · submitted 2026-06-22 · 💻 cs.CV

FedOT: Ownership Verification and Leakage Tracing via Watermarks for Federated LDMs

Wenlong Cheng , Yuan Gan , Yunqiu Xu , Jiaxu Miao This is my paper

Pith reviewed 2026-06-26 09:15 UTC · model grok-4.3

classification 💻 cs.CV

keywords federated learninglatent diffusion modelswatermarkingownership verificationleakage tracingVAE replacementmodel protection

0 comments

The pith

FedOT embeds chunked watermarks tied to a latent transformation so that ownership can be verified and leaks traced to specific clients in federated latent diffusion models while VAE swaps destroy usability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that existing VAE watermark methods fail for federated LDMs because they permit only ownership checks and can be stripped by decoder replacement. FedOT addresses both gaps by splitting the watermark into an ownership segment and a client identifier, then applying a Latent Vector Transformation that alters the VAE latent distribution to bind it tightly to the U-Net. If the claim holds, any attempt to remove the mark by swapping the VAE produces images too degraded for practical use. A sympathetic reader would care because federated training necessarily distributes the model to multiple parties, creating resale and misuse risks that current techniques cannot contain.

Core claim

The central claim is that a chunked watermark combined with Latent Vector Transformation creates the first practical system for both ownership verification and client-specific leakage tracing in federated LDMs; the transformation strengthens the link between VAE and U-Net latent spaces so that decoder replacement necessarily degrades generated image quality to the point the model becomes unusable.

What carries the argument

The chunked watermark (first segment for ownership verification, second for client identification) together with Latent Vector Transformation (LVT), which modifies the original VAE latent distribution to create a non-removable dependence on the U-Net.

If this is right

The first watermark chunk enables standard ownership verification across all clients.
The second watermark chunk allows identification of the exact client responsible for any leaked copy.
LVT prevents simple VAE replacement attacks by tying latent spaces so tightly that removal destroys model utility.
The overall scheme works inside the federated training loop without requiring changes to the core diffusion training process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If LVT-style binding proves robust, similar latent-space tying could be applied to protect other components shared in federated generative pipelines.
The chunked design suggests a general pattern for multi-purpose watermarks where verification and attribution are separated into independent segments.
Adoption would require federated protocols to include watermark extraction steps during model distribution audits.

Load-bearing premise

Replacing the VAE after LVT has been applied will always produce unusable image quality rather than allowing a clean swap that keeps the model functional while removing the watermark.

What would settle it

Train a FedOT-protected LDM, replace its VAE with an unmodified clean decoder, generate a set of images, and measure both perceptual quality metrics and watermark detection accuracy; high quality with absent watermark falsifies the protection claim.

Figures

Figures reproduced from arXiv: 2606.22875 by Jiaxu Miao, Wenlong Cheng, Yuan Gan, Yunqiu Xu.

**Figure 1.** Figure 1: Motivation of FedOT. (a) Malicious clients leak the federated LDMs, verifying ownership or tracing the source of generated images becomes infeasible. (b) FedOT supports ownership verification and malicious client tracing even if the leaked model is misused, by using chunked watermark and Latent Vector Transformation (LVT). 61]. Traditionally, these models rely on centralized training over massive opensou… view at source ↗

**Figure 2.** Figure 2: Overview of FedOT. (a) FedOT Workflow. The server applies LVT to the VAE, then embeds watermarks, and distributes model replicas to clients. Clients train the SD model locally and upload updates. The server verifies watermarks and aggregates only U-Net parameters. (b) Details of LVT within FedOT. Stage I: Transform latent vector z to z ′ and train the encoder. Stage II: Use the trained encoder to generate … view at source ↗

**Figure 3.** Figure 3: VAE reconstruction results after training the encoder with different LVT transformations. Although standard Gaussian noise is added in each training iteration, the VAE cannot learn a consistent deterministic mapping to counteract this perturbation. To seek a deterministic approach, we draw inspiration from recent findings [37], which preliminarily observed that applying constant shifts to the latent spac… view at source ↗

**Figure 4.** Figure 4: Comparison between different FedOT methods and Stable Signature*. "Clean" represents the results without VAE replacement attacks, while "Attack" represents the results after VAE replacement attacks. This figure corresponds to [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: FID variation during the purification process for 3 different FedOT methods. A purification attack targets watermarks or hidden information embedded in model parameters. In this scenario, the attacker fine-tunes or optimizes the model using a clean dataset without any watermark, aiming to "purify" the model by removing the watermark while preserving the model’s original performance as much as possible.… view at source ↗

**Figure 6.** Figure 6: FID trends across training rounds under FedOT framework [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 8.** Figure 8: Training a watermark in decoder with extractor E. Specifically, we first generate a unique binary watermark m for each client. A set of public images is then sampled and passed through a frozen VAE encoder to obtain latent representations z. These latent vectors use a trainable VAE decoder to reconstruct images xˆ, which are then fed into the extractor E to predict the embedded watermark m′ . Importantly, … view at source ↗

**Figure 9.** Figure 9: Pixel-level residual maps between original and watermarked images [PITH_FULL_IMAGE:figures/full_fig_p032_9.png] view at source ↗

**Figure 10.** Figure 10: ROC curves under different detection thresholds τ across FedOT variants. E.5 End-to-End Attribution Accuracy. To further evaluate the practical tracing capability of FedOT, we sample 1,000 generated images per client and perform end-to-end attribution: for each image, we extract the embedded watermark and identify its source client. As shown in [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗

**Figure 11.** Figure 11: Additional generated image results under FedOT include various animals, landscapes, and paintings, using prompts from the Laion10K dataset. ensures that high-quality image generation becomes infeasible. Malicious attackers may exploit the compromised model to generate various images for profit. In the following visualizations, the translation coefficient is set to 5. To illustrate this, [PITH_FULL_IMAGE… view at source ↗

**Figure 12.** Figure 12: Additional generated image results under FedOT include various humanrelated images, using prompts from the Laion10K dataset. the latent space has undergone some transformation, it is still difficult to precisely infer T. This is because, during the training of the LVT, the learned VAE parameters approximate the designed mapping but inevitably include minor deviations. These deviations further increase … view at source ↗

read the original abstract

Training Latent Diffusion Models (LDMs) within Federated Learning (FL) has attracted increasing attention due to its ability to combine the powerful generative capacity of LDMs with the privacy-preserving properties of FL. However, FL requires sharing the global model with multiple participants, which risks unauthorized model distribution or resale by malicious clients. While an intuitive approach is to adopt existing VAE-based watermarking techniques for LDMs in FL, this strategy falls short in addressing such threats due to two fundamental challenges: (1) Existing methods support ownership verification but lack the ability to trace model leakage to a specific malicious client; (2) VAE-based watermarks are vulnerable, as they can be removed simply by replacing the decoder with a clean counterpart. In this paper, we propose FedOT, the first framework for ownership verification and leakage tracing in federated LDMs. Specifically, to address the first challenge, we design a chunked watermark, where the first part is for ownership verification, and the second part is used for client identification. Furthermore, to overcome the second challenge and secure the model against VAE replacement attack, we introduce Latent Vector Transformation (LVT), which strengthens the connection between the VAE and U-Net latent spaces by modifying the original latent distribution of the VAE. Consequently, any attempt to replace the VAE for watermark removal leads to significant image quality degradation, making the LDM model unusable. Extensive experiments demonstrate that FedOT achieves superior performance in both ownership verification and traceability. Project page: https://spyzixuan.github.io/FedOT/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FedOT adds chunked watermarks plus LVT to handle both ownership verification and client tracing in federated LDMs while claiming to block VAE swaps, but the load-bearing security claim has no supporting analysis or numbers.

read the letter

The paper's core move is to split the watermark into a verification chunk and a client-ID chunk, then use Latent Vector Transformation to alter the VAE latent distribution so that swapping in a clean decoder tanks generation quality. This directly targets the two gaps the authors flag in prior VAE watermarking for federated LDMs.

The construction itself is straightforward and responds to a real deployment issue: shared models in FL can be redistributed or resold, and simple decoder replacement has been an easy removal path. Credit for naming those limits clearly and offering a single mechanism that tries to solve both.

The soft spot is the security guarantee around LVT. The abstract states that any VAE replacement produces significant degradation and makes the model unusable, yet it supplies no argument that the transformed distribution is hard to invert or that an attacker could not jointly adapt the U-Net with a new decoder to restore both utility and watermark removal. No equations, no invertibility analysis, and the "extensive experiments" are mentioned without metrics, baselines, or attack implementations. That leaves the central claim as an assertion rather than a demonstrated result.

This is for groups already working on IP protection or watermarking inside federated generative pipelines. A reader who needs a concrete starting point for tracing leaks in LDM FL settings can pull the chunked design and LVT idea. The work is coherent on its own terms and engages the right prior limitations, so it clears the bar for a serious referee even though the experiments will need close checking.

Referee Report

2 major / 1 minor

Summary. The paper proposes FedOT, the first framework for ownership verification and leakage tracing in federated LDMs. It uses a chunked watermark (first part for ownership verification, second for client identification) and introduces Latent Vector Transformation (LVT) to modify the VAE latent distribution and strengthen its link to the U-Net, such that replacing the VAE to remove the watermark necessarily causes significant image quality degradation and renders the model unusable. The authors claim that extensive experiments show superior performance in verification and traceability.

Significance. If the LVT security property holds, the work would provide a practical defense against model leakage in federated generative model training, a setting where sharing the global model creates clear IP risks. The chunked watermark design usefully combines verification with traceability in multi-client FL without requiring separate mechanisms.

major comments (2)

[Abstract] Abstract: The central security claim that 'any attempt to replace the VAE for watermark removal leads to significant image quality degradation, making the LDM model unusable' is presented as a direct consequence of LVT but without any analysis, equations, or experiments showing that the transformation is non-invertible or that joint fine-tuning of the U-Net with a clean VAE cannot recover both watermark removal and generation quality. This is the load-bearing assumption for the security guarantee.
[Abstract] Abstract: The assertion of 'superior performance in both ownership verification and traceability' via 'extensive experiments' is unsupported by any reported metrics, baselines, attack implementations, or controls, preventing assessment of the empirical claims.

minor comments (1)

The abstract would be strengthened by including at least summary quantitative results (e.g., verification accuracy, traceability success rates, FID degradation under attack) to allow readers to gauge the strength of the performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract requires expansion to substantiate its claims and will revise the manuscript to include the requested analysis, equations, and detailed experimental reporting.

read point-by-point responses

Referee: [Abstract] Abstract: The central security claim that 'any attempt to replace the VAE for watermark removal leads to significant image quality degradation, making the LDM model unusable' is presented as a direct consequence of LVT but without any analysis, equations, or experiments showing that the transformation is non-invertible or that joint fine-tuning of the U-Net with a clean VAE cannot recover both watermark removal and generation quality. This is the load-bearing assumption for the security guarantee.

Authors: We acknowledge the abstract presents the LVT security property concisely. The full manuscript describes LVT in Section 3.2 and reports quality degradation under VAE replacement in Section 4.3. To strengthen the claim, we will add a formal definition of the latent distribution shift induced by LVT, a brief invertibility argument, and new experiments evaluating joint fine-tuning of U-Net with a clean VAE in the revised version. revision: yes
Referee: [Abstract] Abstract: The assertion of 'superior performance in both ownership verification and traceability' via 'extensive experiments' is unsupported by any reported metrics, baselines, attack implementations, or controls, preventing assessment of the empirical claims.

Authors: We will revise the abstract to reference specific quantitative results and ensure the experiments section (currently Section 4) explicitly tabulates all metrics, comparison baselines, attack implementations, and control settings so that the superiority claims can be directly evaluated. revision: yes

Circularity Check

0 steps flagged

No circularity: new construction without reductions to inputs or self-citations

full rationale

The paper presents FedOT as an original framework introducing chunked watermarks and Latent Vector Transformation (LVT) to link VAE and U-Net spaces. The abstract asserts that LVT modification causes degradation on VAE replacement, but this is framed as a design consequence rather than derived from equations, fitted parameters, or prior self-citations. No load-bearing steps reduce by construction to inputs; the method is a novel proposal with experimental validation claimed, not a renaming or self-referential definition. The derivation chain is self-contained as an engineering construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the LVT and chunked watermark are presented as new design choices without further decomposition.

pith-pipeline@v0.9.1-grok · 5824 in / 1136 out tokens · 32071 ms · 2026-06-26T09:15:25.929815+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

68 extracted references · 3 linked inside Pith

[1]

arXiv preprint arXiv:2211.01324 (2022)

Balaji, Y., Nah, S., Huang, X., Vahdat, A., Song, J., Zhang, Q., Kreis, K., Ait- tala, M., Aila, T., Laine, S., et al.: ediff-i: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324 (2022)

Pith/arXiv arXiv 2022
[2]

In: CVPR (2023)

Brooks, T., Holynski, A., Efros, A.A.: Instructpix2pix: Learning to follow image editing instructions. In: CVPR (2023)

2023
[3]

arXiv preprint arXiv:1802.07228 (2018)

Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B., et al.: The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228 (2018)

arXiv 2018
[4]

ACM Computing Surveys (2025)

Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P., Sun, L.: A survey of ai-generated content (aigc). ACM Computing Surveys (2025)

2025
[5]

In: ICML (2025)

Ci, H., Song, Y., Yang, P., Xie, J., Shou, M.Z.: Wmadapter: Adding watermark control to latent diffusion models. In: ICML (2025)

2025
[6]

Morgan Kaufmann (2007)

Cox, I., Miller, M., Bloom, J., Fridrich, J., Kalker, T.: Digital watermarking and steganography. Morgan Kaufmann (2007)

2007
[7]

In: ICML (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: ICML (2024)

2024
[8]

In: CVPR (2021)

Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: CVPR (2021)

2021
[9]

In: USENIX Security (2020)

Fang, M., Cao, X., Jia, J., Gong, N.: Local model poisoning attacks to{Byzantine- Robust}federated learning. In: USENIX Security (2020)

2020
[10]

In: IEEE Security and Privacy Workshops (2021)

Fereidooni, H., Marchal, S., Miettinen, M., Mirhoseini, A., Möllering, H., Nguyen, T.D., Rieger, P., Sadeghi, A.R., Schneider, T., Yalame, H., et al.: Safelearn: Se- cure aggregation for private federated learning. In: IEEE Security and Privacy Workshops (2021)

2021
[11]

In: ICCV (2023)

Fernandez, P., Couairon, G., Jégou, H., Douze, M., Furon, T.: The stable signature: Rooting watermarks in latent diffusion models. In: ICCV (2023)

2023
[12]

arXiv preprint arXiv:1808.04866 (2018)

Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poi- soning. arXiv preprint arXiv:1808.04866 (2018)

arXiv 2018
[13]

In: CVPR (2025)

Gan, Y., Miao, J., Wang, Y., Yang, Y.: Silence is golden: Leveraging adversarial examples to nullify audio control in ldm-based talking-head generation. In: CVPR (2025)

2025
[14]

In: NeurIPS (2024)

Gan, Y., Miao, J., Yang, Y.: Datastealing: Steal data from diffusion models in federated learning with multiple trojans. In: NeurIPS (2024)

2024
[15]

In: CVPR (2022)

Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., Guo, B.: Vector quantized diffusion model for text-to-image synthesis. In: CVPR (2022)

2022
[16]

In: ICCV (2023)

Han, L., Li, Y., Zhang, H., Milanfar, P., Metaxas, D., Yang, F.: Svdiff: Compact parameter space for diffusion fine-tuning. In: ICCV (2023)

2023
[17]

In: NeurIPS (2017) 16 W

Heusel,M.,Ramsauer,H.,Unterthiner,T.,Nessler,B.,Hochreiter,S.:Ganstrained by a two time-scale update rule converge to a local nash equilibrium. In: NeurIPS (2017) 16 W. Cheng et al

2017
[18]

In: NeurIPS (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: NeurIPS (2020)

2020
[19]

In: NeurIPS (2022)

Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models. In: NeurIPS (2022)

2022
[20]

In: ICLR (2025)

Hu, R., Zhang, J., Li, Y., Li, J., Guo, Q., Qiu, H., Zhang, T.: Videoshield: Regu- lating diffusion-based video generation models via watermarking. In: ICLR (2025)

2025
[21]

In: MMM (2026)

Jia, H., Zhao, N., Xu, Y., Zhu, L., Yang, Y.: Gas: Geometry-appearance synergy for consistent video customization. In: MMM (2026)

2026
[22]

In: ECCV (2016)

Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV (2016)

2016
[23]

In: ICLR (2014)

Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: ICLR (2014)

2014
[24]

The Annals of Math- ematical Statistics (1951)

Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Math- ematical Statistics (1951)

1951
[25]

In: International Conference on Machine Learning, Big Data, Cloud and Parallel Com- puting (2019)

Lambora, A., Gupta, K., Chopra, K.: Genetic algorithm-a literature review. In: International Conference on Machine Learning, Big Data, Cloud and Parallel Com- puting (2019)

2019
[26]

arXiv preprint arXiv:2405.02696 (2024)

Lei, L., Gai, K., Yu, J., Zhu, L.: Diffusetrace: A transparent and flexible water- marking scheme for latent diffusion model. arXiv preprint arXiv:2405.02696 (2024)

arXiv 2024
[27]

IEEE TPAMI (2022)

Li, B., Fan, L., Gu, H., Li, J., Yang, Q.: Fedipr: Ownership verification for federated deep neural network models. IEEE TPAMI (2022)

2022
[28]

IEEE TCSVT (2024)

Li, D., Xie, W., Wang, Z., Lu, Y., Li, Y., Fang, L.: Feddiff: Diffusion model driven federated learning for multi-modal and multi-clients. IEEE TCSVT (2024)

2024
[29]

In: ECCV (2014)

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014)

2014
[30]

In: CVPR (2025)

Liu, G., Cao, S., Qian, Z., Zhang, X., Li, S., Peng, W.: Watermarking one for all: A robust watermarking scheme against partial image theft. In: CVPR (2025)

2025
[31]

In: ECCV (2024)

Liu, X., Guan, X., Wu, Y., Miao, J.: Iterative ensemble training with anti-gradient control for mitigating memorization in diffusion models. In: ECCV (2024)

2024
[32]

Liu, X., Shao, S., Yang, Y., Wu, K., Yang, W., Fang, H.: Secure federated learning modelverification:Aclient-sidebackdoortriggeredwatermarkingscheme.In:IEEE International Conference on Systems, Man, and Cybernetics (2021)

2021
[33]

In: ACM MM (2024)

Ma, Z., Jia, G., Qi, B., Zhou, B.: Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking. In: ACM MM (2024)

2024
[34]

In: AISTATS (2017)

McMahan, B., Moore, E., Ramage, D., Hampson, S.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)

2017
[35]

In: ECCV (2024)

Min, R., Li, S., Chen, H., Cheng, M.: A watermark-conditioned diffusion model for ip protection. In: ECCV (2024)

2024
[36]

arXiv preprint arXiv:2405.07925 (2024)

Morafah, M., Reisser, M., Lin, B., Louizos, C.: Stable diffusion-based data augmen- tation for federated learning with non-iid data. arXiv preprint arXiv:2405.07925 (2024)

arXiv 2024
[37]

In: CVPR (2025)

Morita, R., Frolov, S., Moser, B.B., Shirakawa, T., Watanabe, K., Dengel, A., Zhou, J.: Tkg-dm: Training-free chroma key content generation diffusion model. In: CVPR (2025)

2025
[38]

In: ICML (2022)

Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., Chen, M.: Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In: ICML (2022)

2022
[39]

PNAS (2022)

Nightingale, S.J., Farid, H.: Ai-synthesized faces are indistinguishable from real faces and more trustworthy. PNAS (2022)

2022
[40]

In: ICLR (2024) FedOT 17

Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. In: ICLR (2024) FedOT 17

2024
[41]

In: ICML (2021)

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)

2021
[42]

arXiv preprint arXiv:2204.06125 (2022)

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text- conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022)

Pith/arXiv arXiv 2022
[43]

In: CVPR (2022)

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

2022
[44]

In: MICCAI (2015)

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed- ical image segmentation. In: MICCAI (2015)

2015
[45]

In: CVPR (2023)

Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation. In: CVPR (2023)

2023
[46]

arXiv preprint arXiv:2111.02114 (2021)

Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes,T.,Jitsev,J.,Komatsuzaki,A.:Laion-400m:Opendatasetofclip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114 (2021)

Pith/arXiv arXiv 2021
[47]

IEEE TDSC (2024)

Shao, S., Yang, W., Gu, H., Qin, Z., Fan, L., Yang, Q.: Fedtracker: Furnishing ownership verification and traceability for federated learning model. IEEE TDSC (2024)

2024
[48]

In: NeurIPS (2023)

Somepalli, G., Singla, V., Goldblum, M., Geiping, J., Goldstein, T.: Understanding and mitigating copying in diffusion models. In: NeurIPS (2023)

2023
[49]

In: ICLR (2021)

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: ICLR (2021)

2021
[50]

In: WWW (2024)

Stanley Jothiraj, F.V., Mashhadi, A.: Phoenix: A federated generative diffusion model. In: WWW (2024)

2024
[51]

In: International Symposium on Reliable Distributed Systems (2021)

Tekgul, B.G., Xia, Y., Marchal, S., Asokan, N.: Waffle: Watermarking in federated learning. In: International Symposium on Reliable Distributed Systems (2021)

2021
[52]

In: AAAI (2025)

Tian, Z., Quan, R., Ma, F., Zhan, K., Yang, Y.: Brainguard: Privacy-preserving multisubject image reconstructions from brain activities. In: AAAI (2025)

2025
[53]

IEEE TIP (2004)

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP (2004)

2004
[54]

In: CVPR (2025)

Wang, Z., Guo, J., Zhu, J., Li, Y., Huang, H., Chen, M., Tu, Z.: Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models. In: CVPR (2025)

2025
[55]

In: NeurIPS (2023)

Wen, Y., Kirchenbauer, J., Geiping, J., Goldstein, T.: Tree-rings watermarks: In- visible fingerprints for diffusion images. In: NeurIPS (2023)

2023
[56]

In: CVPR (2026)

Xia, C., Ma, F., Quan, R., Xu, Y., Zhan, K., Yang, Y.: Echoes of ownership: Adversarial-guided dual injection for copyright protection in mllms. In: CVPR (2026)

2026
[57]

IEEE TII (2022)

Xiao, X., Tang, Z., Li, C., Xiao, B., Li, K.: Sca: Sybil-based collusion attacks of iiot data poisoning in federated learning. IEEE TII (2022)

2022
[58]

In: ACM MM (2024)

Xu, Y., Zhu, L., Yang, Y.: Gg-editor: Locally editing 3d avatars with multimodal large language model guidance. In: ACM MM (2024)

2024
[59]

ACM TIST (2019)

Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: Concept and applications. ACM TIST (2019)

2019
[60]

ACM TIST (2023)

Yang, W., Shao, S., Yang, Y., Liu, X., Liu, X., Xia, Z., Schaefer, G., Fang, H.: Watermarking in secure federated learning: A verification framework based on client-side backdooring. ACM TIST (2023)

2023
[61]

Frontiers of Infor- mation Technology & Electronic Engineering (2021) 18 W

Yang, Y., Zhuang, Y., Pan, Y.: Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies. Frontiers of Infor- mation Technology & Electronic Engineering (2021) 18 W. Cheng et al

2021
[62]

In: CVPR (2024)

Yang, Z., Zeng, K., Chen, K., Fang, H., Zhang, W., Yu, N.: Gaussian shading: Provable performance-lossless image watermarking for diffusion models. In: CVPR (2024)

2024
[63]

TACL (2014)

Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL (2014)

2014
[64]

In: CVPR (2021)

Yu, N., Skripniuk, V., Abdelnabi, S., Fritz, M.: Artificial fingerprinting for gener- ative models: Rooting deepfake attribution in training data. In: CVPR (2021)

2021
[65]

In: ICCV (2023)

Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: ICCV (2023)

2023
[66]

In: ICLR (2026)

Zhou, D., Li, M., Yang, Z., Lu, Y., Xu, Y., Wang, Z., Huang, Z., Yang, Y.: Bid- eDPO: Conditional image generation with simultaneous text and condition align- ment. In: ICLR (2026)

2026
[67]

In: ECCV (2018) FedOT 19 Appendix Table of Contents A Preliminaries

Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: Hidden: Hiding data with deep net- works. In: ECCV (2018) FedOT 19 Appendix Table of Contents A Preliminaries........................................................ 19 A.1 Federated LDMs and Threat Model ............................... 19 A.2 Local Training under FedOT ...................................... ...

2018
[68]

double negative recovery

(21) The latter term is dominated by noise, which disrupts the local neighborhood structure of the latent space, making it difficult for the decoderDto learn a stable inverse mapping. Consequently, the decoder fails to accurately reconstruct the images, leading to a noticeable degradation in overall reconstruction quality. Translation and Mirror Transform...

[1] [1]

arXiv preprint arXiv:2211.01324 (2022)

Balaji, Y., Nah, S., Huang, X., Vahdat, A., Song, J., Zhang, Q., Kreis, K., Ait- tala, M., Aila, T., Laine, S., et al.: ediff-i: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324 (2022)

Pith/arXiv arXiv 2022

[2] [2]

In: CVPR (2023)

Brooks, T., Holynski, A., Efros, A.A.: Instructpix2pix: Learning to follow image editing instructions. In: CVPR (2023)

2023

[3] [3]

arXiv preprint arXiv:1802.07228 (2018)

Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B., et al.: The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228 (2018)

arXiv 2018

[4] [4]

ACM Computing Surveys (2025)

Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P., Sun, L.: A survey of ai-generated content (aigc). ACM Computing Surveys (2025)

2025

[5] [5]

In: ICML (2025)

Ci, H., Song, Y., Yang, P., Xie, J., Shou, M.Z.: Wmadapter: Adding watermark control to latent diffusion models. In: ICML (2025)

2025

[6] [6]

Morgan Kaufmann (2007)

Cox, I., Miller, M., Bloom, J., Fridrich, J., Kalker, T.: Digital watermarking and steganography. Morgan Kaufmann (2007)

2007

[7] [7]

In: ICML (2024)

Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: ICML (2024)

2024

[8] [8]

In: CVPR (2021)

Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: CVPR (2021)

2021

[9] [9]

In: USENIX Security (2020)

Fang, M., Cao, X., Jia, J., Gong, N.: Local model poisoning attacks to{Byzantine- Robust}federated learning. In: USENIX Security (2020)

2020

[10] [10]

In: IEEE Security and Privacy Workshops (2021)

Fereidooni, H., Marchal, S., Miettinen, M., Mirhoseini, A., Möllering, H., Nguyen, T.D., Rieger, P., Sadeghi, A.R., Schneider, T., Yalame, H., et al.: Safelearn: Se- cure aggregation for private federated learning. In: IEEE Security and Privacy Workshops (2021)

2021

[11] [11]

In: ICCV (2023)

Fernandez, P., Couairon, G., Jégou, H., Douze, M., Furon, T.: The stable signature: Rooting watermarks in latent diffusion models. In: ICCV (2023)

2023

[12] [12]

arXiv preprint arXiv:1808.04866 (2018)

Fung, C., Yoon, C.J., Beschastnikh, I.: Mitigating sybils in federated learning poi- soning. arXiv preprint arXiv:1808.04866 (2018)

arXiv 2018

[13] [13]

In: CVPR (2025)

Gan, Y., Miao, J., Wang, Y., Yang, Y.: Silence is golden: Leveraging adversarial examples to nullify audio control in ldm-based talking-head generation. In: CVPR (2025)

2025

[14] [14]

In: NeurIPS (2024)

Gan, Y., Miao, J., Yang, Y.: Datastealing: Steal data from diffusion models in federated learning with multiple trojans. In: NeurIPS (2024)

2024

[15] [15]

In: CVPR (2022)

Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., Guo, B.: Vector quantized diffusion model for text-to-image synthesis. In: CVPR (2022)

2022

[16] [16]

In: ICCV (2023)

Han, L., Li, Y., Zhang, H., Milanfar, P., Metaxas, D., Yang, F.: Svdiff: Compact parameter space for diffusion fine-tuning. In: ICCV (2023)

2023

[17] [17]

In: NeurIPS (2017) 16 W

Heusel,M.,Ramsauer,H.,Unterthiner,T.,Nessler,B.,Hochreiter,S.:Ganstrained by a two time-scale update rule converge to a local nash equilibrium. In: NeurIPS (2017) 16 W. Cheng et al

2017

[18] [18]

In: NeurIPS (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: NeurIPS (2020)

2020

[19] [19]

In: NeurIPS (2022)

Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models. In: NeurIPS (2022)

2022

[20] [20]

In: ICLR (2025)

Hu, R., Zhang, J., Li, Y., Li, J., Guo, Q., Qiu, H., Zhang, T.: Videoshield: Regu- lating diffusion-based video generation models via watermarking. In: ICLR (2025)

2025

[21] [21]

In: MMM (2026)

Jia, H., Zhao, N., Xu, Y., Zhu, L., Yang, Y.: Gas: Geometry-appearance synergy for consistent video customization. In: MMM (2026)

2026

[22] [22]

In: ECCV (2016)

Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV (2016)

2016

[23] [23]

In: ICLR (2014)

Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: ICLR (2014)

2014

[24] [24]

The Annals of Math- ematical Statistics (1951)

Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Math- ematical Statistics (1951)

1951

[25] [25]

In: International Conference on Machine Learning, Big Data, Cloud and Parallel Com- puting (2019)

Lambora, A., Gupta, K., Chopra, K.: Genetic algorithm-a literature review. In: International Conference on Machine Learning, Big Data, Cloud and Parallel Com- puting (2019)

2019

[26] [26]

arXiv preprint arXiv:2405.02696 (2024)

Lei, L., Gai, K., Yu, J., Zhu, L.: Diffusetrace: A transparent and flexible water- marking scheme for latent diffusion model. arXiv preprint arXiv:2405.02696 (2024)

arXiv 2024

[27] [27]

IEEE TPAMI (2022)

Li, B., Fan, L., Gu, H., Li, J., Yang, Q.: Fedipr: Ownership verification for federated deep neural network models. IEEE TPAMI (2022)

2022

[28] [28]

IEEE TCSVT (2024)

Li, D., Xie, W., Wang, Z., Lu, Y., Li, Y., Fang, L.: Feddiff: Diffusion model driven federated learning for multi-modal and multi-clients. IEEE TCSVT (2024)

2024

[29] [29]

In: ECCV (2014)

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014)

2014

[30] [30]

In: CVPR (2025)

Liu, G., Cao, S., Qian, Z., Zhang, X., Li, S., Peng, W.: Watermarking one for all: A robust watermarking scheme against partial image theft. In: CVPR (2025)

2025

[31] [31]

In: ECCV (2024)

Liu, X., Guan, X., Wu, Y., Miao, J.: Iterative ensemble training with anti-gradient control for mitigating memorization in diffusion models. In: ECCV (2024)

2024

[32] [32]

Liu, X., Shao, S., Yang, Y., Wu, K., Yang, W., Fang, H.: Secure federated learning modelverification:Aclient-sidebackdoortriggeredwatermarkingscheme.In:IEEE International Conference on Systems, Man, and Cybernetics (2021)

2021

[33] [33]

In: ACM MM (2024)

Ma, Z., Jia, G., Qi, B., Zhou, B.: Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking. In: ACM MM (2024)

2024

[34] [34]

In: AISTATS (2017)

McMahan, B., Moore, E., Ramage, D., Hampson, S.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)

2017

[35] [35]

In: ECCV (2024)

Min, R., Li, S., Chen, H., Cheng, M.: A watermark-conditioned diffusion model for ip protection. In: ECCV (2024)

2024

[36] [36]

arXiv preprint arXiv:2405.07925 (2024)

Morafah, M., Reisser, M., Lin, B., Louizos, C.: Stable diffusion-based data augmen- tation for federated learning with non-iid data. arXiv preprint arXiv:2405.07925 (2024)

arXiv 2024

[37] [37]

In: CVPR (2025)

Morita, R., Frolov, S., Moser, B.B., Shirakawa, T., Watanabe, K., Dengel, A., Zhou, J.: Tkg-dm: Training-free chroma key content generation diffusion model. In: CVPR (2025)

2025

[38] [38]

In: ICML (2022)

Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., Chen, M.: Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In: ICML (2022)

2022

[39] [39]

PNAS (2022)

Nightingale, S.J., Farid, H.: Ai-synthesized faces are indistinguishable from real faces and more trustworthy. PNAS (2022)

2022

[40] [40]

In: ICLR (2024) FedOT 17

Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. In: ICLR (2024) FedOT 17

2024

[41] [41]

In: ICML (2021)

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)

2021

[42] [42]

arXiv preprint arXiv:2204.06125 (2022)

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., Chen, M.: Hierarchical text- conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022)

Pith/arXiv arXiv 2022

[43] [43]

In: CVPR (2022)

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

2022

[44] [44]

In: MICCAI (2015)

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed- ical image segmentation. In: MICCAI (2015)

2015

[45] [45]

In: CVPR (2023)

Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., Aberman, K.: Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation. In: CVPR (2023)

2023

[46] [46]

arXiv preprint arXiv:2111.02114 (2021)

Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes,T.,Jitsev,J.,Komatsuzaki,A.:Laion-400m:Opendatasetofclip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114 (2021)

Pith/arXiv arXiv 2021

[47] [47]

IEEE TDSC (2024)

Shao, S., Yang, W., Gu, H., Qin, Z., Fan, L., Yang, Q.: Fedtracker: Furnishing ownership verification and traceability for federated learning model. IEEE TDSC (2024)

2024

[48] [48]

In: NeurIPS (2023)

Somepalli, G., Singla, V., Goldblum, M., Geiping, J., Goldstein, T.: Understanding and mitigating copying in diffusion models. In: NeurIPS (2023)

2023

[49] [49]

In: ICLR (2021)

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: ICLR (2021)

2021

[50] [50]

In: WWW (2024)

Stanley Jothiraj, F.V., Mashhadi, A.: Phoenix: A federated generative diffusion model. In: WWW (2024)

2024

[51] [51]

In: International Symposium on Reliable Distributed Systems (2021)

Tekgul, B.G., Xia, Y., Marchal, S., Asokan, N.: Waffle: Watermarking in federated learning. In: International Symposium on Reliable Distributed Systems (2021)

2021

[52] [52]

In: AAAI (2025)

Tian, Z., Quan, R., Ma, F., Zhan, K., Yang, Y.: Brainguard: Privacy-preserving multisubject image reconstructions from brain activities. In: AAAI (2025)

2025

[53] [53]

IEEE TIP (2004)

Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP (2004)

2004

[54] [54]

In: CVPR (2025)

Wang, Z., Guo, J., Zhu, J., Li, Y., Huang, H., Chen, M., Tu, Z.: Sleepermark: Towards robust watermark against fine-tuning text-to-image diffusion models. In: CVPR (2025)

2025

[55] [55]

In: NeurIPS (2023)

Wen, Y., Kirchenbauer, J., Geiping, J., Goldstein, T.: Tree-rings watermarks: In- visible fingerprints for diffusion images. In: NeurIPS (2023)

2023

[56] [56]

In: CVPR (2026)

Xia, C., Ma, F., Quan, R., Xu, Y., Zhan, K., Yang, Y.: Echoes of ownership: Adversarial-guided dual injection for copyright protection in mllms. In: CVPR (2026)

2026

[57] [57]

IEEE TII (2022)

Xiao, X., Tang, Z., Li, C., Xiao, B., Li, K.: Sca: Sybil-based collusion attacks of iiot data poisoning in federated learning. IEEE TII (2022)

2022

[58] [58]

In: ACM MM (2024)

Xu, Y., Zhu, L., Yang, Y.: Gg-editor: Locally editing 3d avatars with multimodal large language model guidance. In: ACM MM (2024)

2024

[59] [59]

ACM TIST (2019)

Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: Concept and applications. ACM TIST (2019)

2019

[60] [60]

ACM TIST (2023)

Yang, W., Shao, S., Yang, Y., Liu, X., Liu, X., Xia, Z., Schaefer, G., Fang, H.: Watermarking in secure federated learning: A verification framework based on client-side backdooring. ACM TIST (2023)

2023

[61] [61]

Frontiers of Infor- mation Technology & Electronic Engineering (2021) 18 W

Yang, Y., Zhuang, Y., Pan, Y.: Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies. Frontiers of Infor- mation Technology & Electronic Engineering (2021) 18 W. Cheng et al

2021

[62] [62]

In: CVPR (2024)

Yang, Z., Zeng, K., Chen, K., Fang, H., Zhang, W., Yu, N.: Gaussian shading: Provable performance-lossless image watermarking for diffusion models. In: CVPR (2024)

2024

[63] [63]

TACL (2014)

Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL (2014)

2014

[64] [64]

In: CVPR (2021)

Yu, N., Skripniuk, V., Abdelnabi, S., Fritz, M.: Artificial fingerprinting for gener- ative models: Rooting deepfake attribution in training data. In: CVPR (2021)

2021

[65] [65]

In: ICCV (2023)

Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: ICCV (2023)

2023

[66] [66]

In: ICLR (2026)

Zhou, D., Li, M., Yang, Z., Lu, Y., Xu, Y., Wang, Z., Huang, Z., Yang, Y.: Bid- eDPO: Conditional image generation with simultaneous text and condition align- ment. In: ICLR (2026)

2026

[67] [67]

In: ECCV (2018) FedOT 19 Appendix Table of Contents A Preliminaries

Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: Hidden: Hiding data with deep net- works. In: ECCV (2018) FedOT 19 Appendix Table of Contents A Preliminaries........................................................ 19 A.1 Federated LDMs and Threat Model ............................... 19 A.2 Local Training under FedOT ...................................... ...

2018

[68] [68]

double negative recovery

(21) The latter term is dominated by noise, which disrupts the local neighborhood structure of the latent space, making it difficult for the decoderDto learn a stable inverse mapping. Consequently, the decoder fails to accurately reconstruct the images, leading to a noticeable degradation in overall reconstruction quality. Translation and Mirror Transform...