pith. sign in

arxiv: 2508.06656 · v2 · submitted 2025-08-08 · 💻 cs.CV

ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering

Pith reviewed 2026-05-18 23:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords watermarkingautoregressive image generationvisual token clusteringrobustness to perturbationsVQ-VAEimage attributiondetection
0
0 comments X

The pith

Clustering similar visual tokens into shared color sets creates a watermark signal that survives perturbations in autoregressive image generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard token-level watermarking loses detectability under image edits in autoregressive models, but grouping similar tokens into the same red or green set preserves the statistical bias for detection. A sympathetic reader would care because this offers a way to mark and later attribute AI-generated images reliably even after common changes like noise or compression, without extra training or quality loss. The approach works both with simple clustering and with a fine-tuned predictor, and experiments show gains over baselines while keeping verification fast.

Core claim

By clustering visual tokens produced by the VQ-VAE and assigning entire clusters to the same watermark color set, the next-token prediction bias creates a persistent signal that remains detectable after perturbations and regeneration attacks, unlike direct per-token schemes.

What carries the argument

Visual token clustering that places similar tokens into the same red or green set to bias autoregressive prediction toward the watermark color.

If this is right

  • Watermarked autoregressive images remain attributable after standard perturbations and regeneration attempts.
  • Image quality metrics stay comparable to unmarked outputs.
  • Verification time matches lightweight post-hoc methods.
  • Both training-free clustering and fine-tuned predictors outperform direct token watermarking and concurrent baselines.
  • The method applies across different VQ-VAE tokenizers without retraining the generator.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The clustering principle could transfer to watermarking other token-sequence generators such as video or audio models.
  • Examining how cluster stability interacts with different attack strengths might reveal further robustness limits.
  • Integrating this approach into existing image forensics pipelines could strengthen attribution for mixed generation sources.

Load-bearing premise

Grouping similar visual tokens into the same color set keeps the watermark bias detectable under perturbations even when the clustering is imperfect or varies across tokenizers.

What would settle it

A statistical test showing no excess of one color cluster in watermarked images after Gaussian noise or JPEG compression, compared with unmarked images, would show the robustness gain does not hold.

Figures

Figures reproduced from arXiv: 2508.06656 by Andreas M\"uller, Asja Fischer, Denis Lukovnikov, Erwin Quiring.

Figure 1
Figure 1. Figure 1: Watermarking for autoregressive models by biasing to [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of our proposed cluster-based red-green watermarking for AR models. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Empirical TPR@FPR=1% under different perturbations for LlamaGen (GPT-L) and RAR-XL for different numbers of clusters [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Distribution of green token ratio measured across 2.000 [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Examples of visual perturbations. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: FIDs across models, amount of clusters, penalty [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Ablation of detection characteristics across models, perturbations, amount of clusters, penalty [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visual examples of unwatermarked and watermarked images generated with [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Visual examples of unwatermarked and watermarked images generated with [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Visual examples of unwatermarked and watermarked images generated with [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
read the original abstract

In-generation watermarking for latent diffusion models has recently shown high robustness in marking generated images for easier detection and attribution. However, its application to autoregressive (AR) image models is underexplored. Autoregressive models generate images by autoregressively predicting a sequence of visual tokens that are then decoded into pixels using a VQ-VAE decoder. Inspired by KGW watermarking for large language models, we examine token-level watermarking schemes that bias the next-token prediction based on prior tokens. We find that a direct transfer of these schemes works in principle, but the detectability of the watermarks decreases considerably under common image perturbations. As a remedy, we propose a watermarking approach based on visual token clustering, which assigns similar tokens to the same set (red or green). We investigate token clustering in a training-free setting, as well as in combination with a more accurate fine-tuned token or cluster predictor. Overall, our experiments show that cluster-based watermarks greatly improve robustness against perturbations and regeneration attacks while preserving image quality, outperforming a set of baselines and concurrent works. Moreover, our methods offer fast verification runtime, comparable to lightweight post-hoc watermarking techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes ClusterMark, a token-level watermarking scheme for autoregressive image generators that groups similar visual tokens into the same red or green set (via training-free or fine-tuned clustering) to bias next-token sampling, extending KGW-style methods from LLMs. It claims that this clustering yields substantially higher robustness to common image perturbations and regeneration attacks than direct per-token watermarking or baselines, while preserving generation quality and enabling fast verification.

Significance. If the robustness gains hold under the reported conditions, the work would address an underexplored gap in in-generation watermarking for AR image models (as opposed to latent diffusion). The empirical demonstration of improved detectability via cluster-based partitioning could inform practical attribution techniques, particularly if the method generalizes across VQ-VAE tokenizers.

major comments (1)
  1. [Experiments / robustness evaluation] The central claim of markedly improved robustness rests on the premise that tokens assigned to the same cluster remain sufficiently consistent after pixel-space perturbations and re-encoding. The manuscript does not report any direct measurement of cluster-label agreement (e.g., fraction of tokens retaining the same red/green assignment) on original versus perturbed images passed back through the VQ-VAE, nor does it ablate whether the observed gains survive when cluster assignments are recomputed on the perturbed image. Without such verification, the advantage over naïve per-token schemes cannot be isolated from potential confounds in cluster stability.
minor comments (2)
  1. [§4 / experimental setup] The experimental section should specify the exact perturbation strengths (e.g., noise variance, compression quality factors) and report statistical significance (p-values or confidence intervals) for the robustness metric improvements over baselines.
  2. [Method / clustering variants] Clarify whether the fine-tuned cluster predictor is trained on the same VQ-VAE tokenizer used at inference or on a held-out set, and include an ablation on clustering accuracy versus watermark detectability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address the single major comment below and will incorporate the suggested analysis into the revised manuscript to strengthen the empirical support for our claims.

read point-by-point responses
  1. Referee: The central claim of markedly improved robustness rests on the premise that tokens assigned to the same cluster remain sufficiently consistent after pixel-space perturbations and re-encoding. The manuscript does not report any direct measurement of cluster-label agreement (e.g., fraction of tokens retaining the same red/green assignment) on original versus perturbed images passed back through the VQ-VAE, nor does it ablate whether the observed gains survive when cluster assignments are recomputed on the perturbed image. Without such verification, the advantage over naïve per-token schemes cannot be isolated from potential confounds in cluster stability.

    Authors: We agree that directly quantifying cluster stability under perturbations would provide valuable additional evidence and help isolate the contribution of our clustering approach. In the revised manuscript we will add (i) measurements of the fraction of tokens that retain the same red/green cluster assignment after common perturbations followed by re-encoding through the VQ-VAE, and (ii) an ablation in which cluster assignments are recomputed on the perturbed images before watermark detection. These results will be reported alongside the existing robustness tables to clarify whether the observed gains derive primarily from cluster consistency. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of clustering-based watermarking stands independently

full rationale

The paper introduces a visual token clustering approach for watermarking autoregressive image generators, drawing inspiration from KGW but implementing and testing the method directly. The central claims rest on experimental results comparing robustness under perturbations and regeneration attacks, with no equations, derivations, or self-citations that reduce the reported gains to fitted parameters, self-definitions, or prior author results by construction. Cluster assignment is presented as a design choice validated empirically rather than assumed or fitted in a way that forces the outcome. The derivation chain is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from self-citations as load-bearing premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach depends on the domain assumption that visual tokens from a VQ-VAE exhibit meaningful similarity structure that can be exploited for clustering without additional training in the base case.

axioms (1)
  • domain assumption Visual tokens produced by the VQ-VAE decoder exhibit sufficient similarity structure to allow meaningful clustering into red/green sets.
    Invoked when the paper states that similar tokens are assigned to the same set.

pith-pipeline@v0.9.0 · 5749 in / 1171 out tokens · 37775 ms · 2026-05-18T23:44:03.009423+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    Combined dwt-dct digital image watermarking

    Ali Al-Haj. Combined dwt-dct digital image watermarking. Journal of computer science, 3(9):740–746, 2007. 5, 7

  2. [2]

    Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, and William T. Freeman. Maskgit: Masked generative image transformer. In Proc. of IEEE Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 11315–11325, June 2022. 1, 2

  3. [3]

    Freeman, Michael Ru- binstein, Yuanzhen Li, and Dilip Krishnan

    Huiwen Chang, Han Zhang, Jarred Barber, Aaron Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Patrick Murphy, William T. Freeman, Michael Ru- binstein, Yuanzhen Li, and Dilip Krishnan. Muse: Text- to-image generation via masked generative transformers. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Bar- bara Engelhardt, Sivan Sabato, and Jonat...

  4. [4]

    Undetectable watermarks for language models

    Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Proc. of the An- nual ACM Workshop on Computational Learning Theory (COLT), volume 210 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR, 2024. URL https: //proceedings.mlr.press/v210/christ24a. html. 2

  5. [6]

    WMAdapter: Adding watermark control to latent dif- fusion models, 2024

    Hai Ci, Yiren Song, Pei Yang, Jinheng Xie, and Mike Zheng Shou. WMAdapter: Adding watermark control to latent dif- fusion models, 2024. 7

  6. [7]

    RingID: Rethinking tree-ring watermarking for enhanced multi-key identification

    Hai Ci, Pei Yang, Yiren Song, and Mike Zheng Shou. RingID: Rethinking tree-ring watermarking for enhanced multi-key identification. In Aleˇs Leonardis, Elisa Ricci, Ste- fan Roth, Olga Russakovsky, Torsten Sattler, and G¨ul Varol, editors, Proc. of the European Conference on Computer Vi- sion (ECCV), pages 338–354, Cham, 2024. Springer Nature Switzerland....

  7. [8]

    Digital Watermarking and Steganography

    Ingemar Cox, Matthew Miller, Jeffrey Bloom, Jessica Fridrich, and Ton Kalker. Digital Watermarking and Steganography. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2 edition, 2007. 7

  8. [9]

    Fluid: Scaling autoregressive text-to-image gen- erative models with continuous tokens

    Lijie Fan, Tianhong Li, Siyang Qin, Yuanzhen Li, Chen Sun, Michael Rubinstein, Deqing Sun, Kaiming He, and Yong- long Tian. Fluid: Scaling autoregressive text-to-image gen- erative models with continuous tokens. InInternational Con- ference on Learning Representations (ICLR), 2025. 1, 2

  9. [10]

    The stable signature: Rooting watermarks in latent diffusion models

    Pierre Fernandez, Guillaume Couairon, Herv ´e J ´egou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In Proc. of IEEE Conference on Computer Vision and Pattern Recogni- tion (CVPR), 2023. 1, 7

  10. [11]

    An unde- tectable watermark for generative image models

    Sam Gunn, Xuandong Zhao, and Dawn Song. An unde- tectable watermark for generative image models. In Inter- national Conference on Learning Representations (ICLR) ,

  11. [12]

    Gans trained by a two time-scale update rule converge to a local nash equilib- rium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. In I. Guyon, U. V on Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Ad- vances in Neural Information Proccessing Systems (NIPS) , volume 30, 2017. 5

  12. [13]

    Watermarking autoregressive image generation, 2025

    Nikola Jovanovi ´c, Ismail Labiad, Tom ´aˇs Sou ˇcek, Martin Vechev, and Pierre Fernandez. Watermarking autoregressive image generation, 2025. 8

  13. [14]

    A watermark for large language models

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Proc. of Int. Conference on Machine Learning (ICML), pages 17061–17084. PMLR, 2023. 1, 2

  14. [15]

    On the reliabil- ity of watermarks for large language models

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliabil- ity of watermarks for large language models. InInternational Conference on Learning Representations (ICLR), 2024. 2

  15. [16]

    Robust distortion-free watermarks for lan- guage models

    Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for lan- guage models. Transactions on Machine Learning Research (TMLR), 2024. 2

  16. [17]

    Fu, Christopher Re, and David W

    Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y . Fu, Christopher Re, and David W. Romero. Hmar: Efficient hierarchical masked auto-regressive image generation. In Proc. of IEEE Confer- ence on Computer Vision and Pattern Recognition (CVPR) , pages 2535–2544, June 2025. 1, 2

  17. [18]

    Robust watermarking using generative pri- ors against image editing: From benchmarking to advances

    Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. Robust watermarking using generative pri- ors against image editing: From benchmarking to advances. In International Conference on Learning Representations (ICLR), 2025. URL https : / / openreview . net / forum?id=16O8GCm8Wn. 7

  18. [19]

    Generative ai misuse: A taxonomy of tactics and insights from real-world data, 2024

    Nahema Marchal, Rachel Xu, Rasmi Elasmar, Iason Gabriel, Beth Goldberg, and William Isaac. Generative ai misuse: A taxonomy of tactics and insights from real-world data, 2024. 1

  19. [20]

    K. A. Navas, Mathews Cheriyan Ajay, M. Lekshmi, Tampy S. Archana, and M. Sasikumar. Dwt-dct-svd based watermarking. In 2008 3rd International Conference on Communication Systems Software and Middleware and Workshops (COMSWARE ’08), pages 271–274, 2008. doi: 10.1109/COMSW A.2008.4554423. 5 9

  20. [21]

    A robust semantics- based watermark for large language model against para- phrasing

    Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. A robust semantics- based watermark for large language model against para- phrasing. In Findings of the Association for Computational Linguistics: NAACL, pages 613–625, 2024. 2, 8

  21. [22]

    High-resolution image syn- thesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image syn- thesis with latent diffusion models. In Proc. of IEEE Confer- ence on Computer Vision and Pattern Recognition (CVPR) , pages 10684–10695, June 2022. 5, 11

  22. [23]

    ImageNet Large Scale Visual Recognition Challenge

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, San- jeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition chal- lenge. International Journal of Computer Vision (IJCV), 115 (3):211–252, 2015. doi: 10.1007/s11263-015-0816-y. 4

  23. [24]

    Autoregressive model beats diffusion: Llama for scalable image generation, 2024

    Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, and Zehuan Yuan. Autoregressive model beats diffusion: Llama for scalable image generation, 2024. 1, 2, 4

  24. [25]

    Stegastamp: Invisible hyperlinks in physical photographs

    Matthew Tancik, Ben Mildenhall, and Ren Ng. Stegastamp: Invisible hyperlinks in physical photographs. In Proc. of IEEE Conference on Computer Vision and Pattern Recog- nition (CVPR), pages 2117–2126, 2020. 7

  25. [26]

    HART: Efficient visual generation with hybrid autoregressive transformer

    Haotian Tang, Yecheng Wu, Shang Yang, Enze Xie, Junsong Chen, Junyu Chen, Zhuoyang Zhang, Han Cai, Yao Lu, and Song Han. HART: Efficient visual generation with hybrid autoregressive transformer. In International Conference on Learning Representations (ICLR), 2025. 1, 2

  26. [27]

    Visual autoregressive modeling: Scalable im- age generation via next-scale prediction

    Keyu Tian, Yi Jiang, Zehuan Yuan, BINGYUE PENG, and Liwei Wang. Visual autoregressive modeling: Scalable im- age generation via next-scale prediction. InAdvances in Neu- ral Information Proccessing Systems (NeurIPS), 2024. 1, 2

  27. [28]

    Training-free watermarking for autoregressive image gener- ation, 2025

    Yu Tong, Zihao Pan, Shuai Yang, and Kaiyang Zhou. Training-free watermarking for autoregressive image gener- ation, 2025. 5, 8, 11

  28. [29]

    Emu3: Next-token prediction is all you need, 2024

    Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, Bowen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, and Zhongyuan Wang. Emu3: Next-token prediction is all you need, 2024. 1, 2

  29. [30]

    Bridging continuous and discrete tokens for autoregressive visual generation

    Yuqing Wang, Zhijie Lin, Yao Teng, Yuanzhi Zhu, Shuhuai Ren, Jiashi Feng, and Xihui Liu. Bridging continuous and discrete tokens for autoregressive visual generation. InProc. of the IEEE/CVF International Conference on Computer Vi- sion (ICCV), 2025. 1, 2

  30. [31]

    Tree-Ring watermarks: Invisible fingerprints for diffusion images

    Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-Ring watermarks: Invisible fingerprints for diffusion images. In Advances in Neural Information Proc- cessing Systems (NeurIPS), 2023. 1, 6, 7, 8

  31. [32]

    A resilient and accessible distribution- preserving watermark for large language models

    Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. A resilient and accessible distribution- preserving watermark for large language models. In Proc. of Int. Conference on Machine Learning (ICML), pages 53443– 53470. PMLR, 2024. 2

  32. [33]

    A watermark for auto-regressive image gener- ation models, 2025

    Yihan Wu, Xuehao Cui, Ruibo Chen, Georgios Milis, and Heng Huang. A watermark for auto-regressive image gener- ation models, 2025. 8

  33. [34]

    Flexible and secure watermarking for latent diffusion model

    Cheng Xiong, Chuan Qin, Guorui Feng, and Xinpeng Zhang. Flexible and secure watermarking for latent diffusion model. In Proc. ACM Internation Conference on Multime- dia (ACM MM), MM ’23, page 1668–1676, New York, NY , USA, 2023. Association for Computing Machinery. ISBN 9798400701085. doi: 10.1145/3581783.3612448. 7

  34. [35]

    Gpt-imgeval: A comprehensive benchmark for diagnosing gpt4o in image generation, 2025

    Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Sheng- hai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, and Li Yuan. Gpt-imgeval: A comprehensive benchmark for diagnosing gpt4o in image generation, 2025. 1

  35. [36]

    Gaussian Shading: Prov- able performance-lossless image watermarking for diffusion models

    Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weim- ing Zhang, and Nenghai Yu. Gaussian Shading: Prov- able performance-lossless image watermarking for diffusion models. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 1, 7, 8

  36. [37]

    Robust multi-bit natural language watermarking through in- variant features

    KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. Robust multi-bit natural language watermarking through in- variant features. In Proc. of the Annual Meeting of the As- sociation for Computational Linguistics (ACL), pages 2092– 2115, 2023. 2

  37. [38]

    Scaling autoregressive models for content-rich text-to-image generation

    Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gun- jan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yin- fei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. Scaling autoregressive models for content-rich text-to-image generation. Transactions on Machine Learn- ing Research, 202...

  38. [39]

    Randomized autoregressive visual generation,

    Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, and Liang- Chieh Chen. Randomized autoregressive visual generation,

  39. [40]

    Robust invisible video watermark- ing with attention, 2019

    Kevin Alex Zhang, Lei Xu, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Robust invisible video watermark- ing with attention, 2019. 5, 7

  40. [41]

    Invisible image watermarks are provably removable using generative AI

    Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, and Lei Li. Invisible image watermarks are provably removable using generative AI. In Advances in Neural Information Proccessing Systems (NeurIPS) ,

  41. [42]

    URL https://openreview.net/forum?id= 7hy5fy2OC6. 5, 11

  42. [43]

    Hidden: Hiding data with deep networks

    Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. Hidden: Hiding data with deep networks. In Proc. of the Eu- ropean Conference on Computer Vision (ECCV), pages 657– 672, 2018. 7 10 A. Full Experimental Setup and Results A.1. Image Generation Settings Table 3 shows detailed generation settings for the models (LlamaGen2 and RAR3) used in this pape...