pith.

arXiv: 2605.22060 · v1 · pith:ADSQBC5U · submitted 2026-05-21 · 💻 cs.CR · cs.AI

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Pith reviewed 2026-05-22 05:50 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords model stealing · knowledge distillation · text-to-image generation · adversarial perturbation · API defense · copyright protection

The pith

WaveGuard injects frequency-aware perturbations into text-to-image outputs to block unauthorized distillation into substitute models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses model stealing from closed-weight text-to-image services that release synthetic images through query APIs. Attackers collect these outputs in volume and train private student models to replicate the original capabilities without accessing weights. WaveGuard counters this by generating and adding structured perturbations in one forward pass under a user-chosen budget. The perturbations are designed to keep images visually intact for ordinary viewers yet markedly less effective when used as training data for distillation. Experiments focus on WikiArt-style synthetic outputs and report gains in protection efficiency alongside explicit control over how visible the changes remain.

Core claim

WaveGuard is a single-pass, generator-based framework that employs a frequency-aware perturbation generator to embed imperceptible, structured noise into released synthetic images, thereby lowering their value as training material for unauthorized student models while preserving perceptual quality for legitimate users under an explicit perturbation budget.

What carries the argument

A frequency-aware perturbation generator that produces structured, budget-constrained perturbations tuned to degrade distillation performance.
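What "budget-constrained" means in practice is not spelled out in this summary. The Python below is a minimal sketch of one reading: it assumes an L∞ budget ε on images scaled to [0, 1] and uses tanh to squash a stand-in for the generator's raw output into [-1, 1] before scaling. This mirrors the Δ = ε · Δ_norm form quoted in the figure legends. The function name, the tanh normalization, and the clipping step are illustrative assumptions, not WaveGuard's published implementation.

```python
import numpy as np


def apply_budgeted_perturbation(image: np.ndarray, raw: np.ndarray, eps: float) -> np.ndarray:
    """Add an L-infinity-bounded perturbation to an image with values in [0, 1].

    Hypothetical sketch: `raw` stands in for an unconstrained generator output.
    """
    if image.shape != raw.shape:
        raise ValueError("image and raw generator output must have the same shape")
    delta_norm = np.tanh(raw)                # squash to [-1, 1] elementwise (Delta_norm)
    delta = eps * delta_norm                 # Delta = eps * Delta_norm, so |Delta| <= eps
    return np.clip(image + delta, 0.0, 1.0)  # clipping can only shrink each change


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((64, 64, 3))              # stand-in "released" image
    raw = rng.normal(size=x.shape)           # stand-in generator output
    x_protected = apply_budgeted_perturbation(x, raw, eps=8 / 255)
    print("max per-pixel change:", np.abs(x_protected - x).max())
```

Because |tanh| ≤ 1 and clipping to [0, 1] can only shrink the change at each pixel, the protected image never differs from the original by more than ε per pixel, whatever the generator outputs. This is one simple way for a single-pass generator to respect a hard budget without an iterative projection step.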

If this is right

  • Protected images retain visual fidelity sufficient for benign viewing and downstream use.
  • Protection scales efficiently to large-volume output release compared with prior defenses.
  • Users retain explicit control over the magnitude of added perturbations.
  • The method delivers a favorable balance of protection strength, image quality, and computational cost under WikiArt-related distillation scenarios.
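How an explicit budget translates into visibility can be worked out under one assumption: the budget is an L∞ bound ε on pixel values in [0, 1]. The paper's exact definition and units are not given in this summary, as the referee's second minor comment notes. Clipping cannot enlarge any per-pixel change, so the worst-case distortion seen by a benign viewer is bounded as follows:

```latex
\|x' - x\|_\infty \le \epsilon
\;\Rightarrow\;
\mathrm{MSE}(x, x') = \frac{1}{n}\sum_{i=1}^{n} (x'_i - x_i)^2 \le \epsilon^2
\;\Rightarrow\;
\mathrm{PSNR}(x, x') = 10 \log_{10} \frac{1}{\mathrm{MSE}(x, x')} \ge -20 \log_{10} \epsilon
```

For ε = 8/255, a value chosen here only for illustration, the bound gives PSNR ≥ 20 log10(255/8) ≈ 30.07 dB.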

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • API providers could integrate the protection step directly into their generation pipeline without requiring changes to the underlying model weights.
  • The same frequency-aware approach might transfer to other generative modalities where synthetic outputs are released through public queries.
  • Widespread adoption would raise the cost for attackers attempting to replicate commercial image-generation capabilities via distillation.

Load-bearing premise

Frequency-aware perturbations injected under a user-specified budget will substantially reduce the usefulness of protected images as training data for student models while maintaining perceptual utility for benign viewers.

What would settle it

Train a student model on a large corpus of WaveGuard-protected images released by the target service and measure whether its performance on standard evaluation tasks reaches or exceeds the level achieved by an identical student trained on the same volume of unprotected images.

Figures

Figures reproduced from arXiv: 2605.22060 by Hongyuan Zhang, Sida Huang, Xuelong Li, Yilan Gao.

Figure 1. Unauthorized distillation from released synthetic images and WaveGuard protection. (Left) An attacker queries a closed-weight generative service, collects released synthetic images, and trains a substitute model to imitate the teacher. (Right) WaveGuard applies bounded, structured perturbations before release, reducing downstream imitation while preserving visual fidelity.
Figure 2. Training pipeline and generator data flow. (a) Overview of the WaveGuard training pipeline. The generator predicts bounded adversarial perturbations and adds them to the original image to produce the protected output. (b) Data flow of low-frequency (LF) and high-frequency (HF) features in the frequency-aware injection path of the generator. (c) Grayscale visualization of the LL subband and high-frequency …
Figure 3. Budget-controlled fidelity–protection trade-off. Each point corresponds to a different perturbation budget.
… and Textual Inversion (TI) students. The key pattern is consistent across metrics: WaveGuard achieves the best visual fidelity among the compared defenses while still providing substantial protection over the no-protection baseline. In particular, Mist obtains the strongest protection on the DreamB…
Figure 5. Visual and frequency-domain analysis of protected images. Compared with iterative baselines, WaveGuard shows weaker overall perturbation energy and less visually disruptive frequency spread, which is consistent with its stronger fidelity metrics.
… 5.5 Ablation Study. We further analyze the design of our frequency module in …
Figure 6. Qualitative illustration of student training. A student initialized from a public checkpoint can be fine-tuned on teacher outputs to imitate the teacher's style.
… denotes wavelet unpooling, $\mathrm{HF}_k$/$\mathrm{LL}_k$ denote the high-/low-frequency branches at stage $k$, and "skip", "proj", and "ref" denote skip fusion, channel projection, and feature refinement, respectively. The perturbation is computed as $\Delta = \epsilon \cdot \Delta_{\mathrm{norm}}$, and …
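The figure legends refer to an LL subband, high-frequency branches, and wavelet unpooling. As an illustrative sketch only, the Python below implements a one-level orthonormal 2D Haar transform, the simplest wavelet, on a single-channel array with even height and width. It then uses that transform to split a signal into low- and high-frequency parts that sum back to the original. This summary does not state which wavelet WaveGuard uses, how many stages it has, or how its generator consumes the subbands, so the sketch models none of that. The function names are hypothetical.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One-level orthonormal 2D Haar transform of an (H, W) array with even H and W.

    Returns (LL, LH, HL, HH), each of shape (H/2, W/2). Naming of LH/HL varies
    between libraries; here LH differences rows and HL differences columns.
    """
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency (scaled local average)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row
    hl = (a - b + c - d) / 2.0  # left column minus right column
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Inverse of haar_dwt2: the per-block 4x4 transform is symmetric and orthonormal."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def split_low_high(x: np.ndarray):
    """Split x into a low-frequency part (LL only) and a high-frequency part (LH, HL, HH)."""
    ll, lh, hl, hh = haar_dwt2(x)
    zeros = np.zeros_like(ll)
    low = haar_idwt2(ll, zeros, zeros, zeros)
    high = haar_idwt2(zeros, lh, hl, hh)
    return low, high  # low + high reconstructs x, by linearity of the transform
```

Because the transform is orthonormal, it preserves energy. The same split can therefore be used to ask how a perturbation's energy divides between LL and the detail subbands.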
Original abstract

Closed-weight generative services are increasingly deployed through query-based APIs, where users can obtain generated outputs while model parameters remain inaccessible. However, such deployment does not prevent model stealing: an attacker can repeatedly query the service, collect large volumes of released synthetic images, and use them as training data for a private substitute model. This query-output-driven process enables unauthorized knowledge distillation and capability replication without direct access to the original weights. To mitigate this threat, a practical defense should preserve the visual fidelity of released images, provide explicit control over perturbation magnitude, and scale efficiently to large-volume output release. We present WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified perturbation budget. WaveGuard employs a frequency-aware perturbation generator to inject structured, imperceptible perturbations that maintain perceptual utility for benign viewers while reducing the usefulness of protected images as training data for unauthorized student models. Extensive experiments under WikiArt-related synthetic-output distillation settings show that WaveGuard achieves a favorable efficacy–fidelity–efficiency trade-off, with explicit imperceptibility control and substantial gains in protection efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes WaveGuard, a single-pass generator-based defense that injects frequency-aware perturbations into synthetic images released by text-to-image APIs. Under a user-specified perturbation budget, the method aims to degrade the utility of these images as training data for unauthorized student models via knowledge distillation while preserving perceptual fidelity for benign users. Experiments on WikiArt-related synthetic-output distillation settings are reported to demonstrate a favorable efficacy-fidelity-efficiency trade-off with explicit imperceptibility control.

Significance. If the empirical claims hold under realistic threat models, the work addresses a timely security issue for closed-weight generative services by providing a practical, controllable defense against query-based model stealing. The single-pass design and efficiency gains for large-volume releases would be valuable contributions to the literature on protecting deployed generative models.

major comments (2)
  1. [Experiments] Experiments section: the reported efficacy-fidelity trade-off is evaluated only against non-adaptive student models trained directly on protected outputs. No results are shown for adaptive attackers who could learn a removal mapping (e.g., a small denoiser or frequency filter) from protected/unprotected pairs and preprocess the training corpus before distillation. This leaves open whether the protection generalizes beyond the assumed threat model.
  2. [Method] Method description: the frequency-aware perturbation generator is presented as reducing usefulness for student models, but the paper provides no analysis or ablation showing that the injected perturbations survive common preprocessing steps that an adaptive attacker might apply.
minor comments (2)
  1. [Abstract] Abstract: quantitative results, specific metrics, and baseline comparisons for the claimed trade-off are not reported, making it difficult to assess the magnitude of the gains.
  2. Notation: the perturbation budget is described as user-specified but its precise definition and units are not clarified early in the text.
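The adaptive attacker in the first major comment learns a removal mapping from protected/unprotected pairs. A minimal stand-in for the "small denoiser" mentioned there is a linear k × k filter plus bias, fitted by least squares on single-channel images and sketched in Python below. This is a deliberately simple substitute: a real attacker would more likely train a convolutional network, and every function name here is hypothetical.

```python
import numpy as np


def extract_patches(img: np.ndarray, k: int) -> np.ndarray:
    """Stack every fully inside k x k patch of a 2D image as one row (row-major order)."""
    h, w = img.shape
    r = k // 2
    return np.array([
        img[i - r:i + r + 1, j - r:j + r + 1].ravel()
        for i in range(r, h - r)
        for j in range(r, w - r)
    ])


def _with_bias(patches: np.ndarray) -> np.ndarray:
    """Append a constant column so the fitted filter can also learn an offset."""
    return np.hstack([patches, np.ones((patches.shape[0], 1))])


def fit_linear_remover(protected_imgs, clean_imgs, k: int = 3) -> np.ndarray:
    """Least-squares fit of a k x k filter (+ bias) mapping protected patches to clean centers."""
    r = k // 2
    X = np.vstack([_with_bias(extract_patches(p, k)) for p in protected_imgs])
    y = np.concatenate([c[r:c.shape[0] - r, r:c.shape[1] - r].ravel() for c in clean_imgs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # shape (k * k + 1,)


def apply_linear_remover(img: np.ndarray, coef: np.ndarray, k: int = 3) -> np.ndarray:
    """Filter the interior of img with the fitted coefficients; the border is left unchanged."""
    r = k // 2
    h, w = img.shape
    out = img.astype(np.float64).copy()
    out[r:h - r, r:w - r] = (_with_bias(extract_patches(img, k)) @ coef).reshape(h - 2 * r, w - 2 * r)
    return out
```

A constant additive offset is removed exactly by such a filter, because the bias term absorbs it. Whether WaveGuard's image-dependent, frequency-structured perturbations can be undone anywhere near as cheaply is the open question the referee raises.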

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the reported efficacy-fidelity trade-off is evaluated only against non-adaptive student models trained directly on protected outputs. No results are shown for adaptive attackers who could learn a removal mapping (e.g., a small denoiser or frequency filter) from protected/unprotected pairs and preprocess the training corpus before distillation. This leaves open whether the protection generalizes beyond the assumed threat model.

    Authors: We acknowledge that the current experiments focus on the non-adaptive threat model described in the paper, where attackers use protected images directly for distillation. To address this concern, we will add new experiments in the revised manuscript that evaluate against adaptive attackers. These will include training a lightweight removal network (e.g., a small denoiser or frequency filter) on protected/unprotected image pairs and measuring the resulting distillation performance after preprocessing. This will provide empirical evidence on the generalization of WaveGuard's protection. revision: yes

  2. Referee: [Method] Method description: the frequency-aware perturbation generator is presented as reducing usefulness for student models, but the paper provides no analysis or ablation showing that the injected perturbations survive common preprocessing steps that an adaptive attacker might apply.

    Authors: We agree that an analysis of perturbation survival under preprocessing is important for a complete evaluation. In the revision, we will add an ablation study examining the effects of common preprocessing operations (such as denoising, frequency-domain filtering, and standard data augmentations) on both perceptual quality and downstream distillation utility. This will show whether the frequency-aware perturbations can be removed without compromising image fidelity. revision: yes
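Frequency-domain filtering, one of the preprocessing operations listed in the response above, can be sketched as an ideal FFT low-pass filter. The Python below assumes grayscale float images and a hard radial cutoff in cycles per pixel; both are illustrative choices, not the paper's protocol. It also reports what fraction of the perturbation's energy survives the filter. Function names are hypothetical.

```python
import numpy as np


def fft_lowpass(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Ideal low-pass filter: keep only spatial frequencies with radius <= cutoff.

    `image` is a 2D grayscale array; `cutoff` is in cycles per pixel
    (each axis spans frequencies from -0.5 to 0.5).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency of each FFT row
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency of each FFT column
    keep = np.sqrt(fx ** 2 + fy ** 2) <= cutoff  # radially symmetric hard mask
    return np.real(np.fft.ifft2(np.fft.fft2(image) * keep))


def surviving_energy_fraction(clean: np.ndarray, protected: np.ndarray, cutoff: float) -> float:
    """Share of the perturbation's energy that remains after filtering both images."""
    delta = protected - clean
    filtered_delta = fft_lowpass(protected, cutoff) - fft_lowpass(clean, cutoff)
    return float(np.sum(filtered_delta ** 2) / np.sum(delta ** 2))
```

A low surviving fraction only says that the filter strips most of the perturbation. Whether the filtered corpus restores distillation utility, and at what cost to fidelity, still has to be measured by training students on it.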

Circularity Check

0 steps flagged

No circularity: protection efficacy shown via external distillation experiments, not by construction

full rationale

The paper introduces WaveGuard as a generator-based perturbation method with frequency-aware injection under a user budget. Claims of reduced usefulness for student models rest on empirical WikiArt distillation trials rather than any self-definitional equation, fitted parameter renamed as prediction, or self-citation chain. No load-bearing derivations appear; the central efficacy-fidelity trade-off is externally falsifiable via the reported experiments and does not reduce to the method's own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The method relies on the unproven premise that frequency-domain perturbations can selectively degrade distillation performance without degrading human perception, plus the assumption that the generator can be trained or designed to respect a user budget while achieving both goals.

free parameters (1)
  • perturbation budget
    User-specified magnitude limit that controls the strength of injected perturbations.
axioms (1)
  • Domain assumption: frequency-aware perturbations can be made imperceptible to humans yet harmful to model training.
    Central premise invoked to justify the generator design and claimed trade-off.
invented entities (1)
  • Frequency-aware perturbation generator (no independent evidence)
    Purpose: to produce structured perturbations in a single pass that protect against distillation.
    New component introduced by the framework; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5725 in / 1190 out tokens · 29461 ms · 2026-05-22T05:50:15.553102+00:00 · methodology

