pith.

arxiv: 2605.16720 · v1 · pith:MMZPTCDL · new · submitted 2026-05-16 · 💻 cs.CV · cs.LG

Compositional Adversarial Training for Robust Visual Watermarking

Pith reviewed 2026-05-19 21:49 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords robust visual watermarking · compositional adversarial training · Gumbel-Softmax · min-max optimization · attack composition · watermark robustness · image watermarking · video watermarking

The pith

Training visual watermarks against learned sequences of attacks produces higher robustness than random augmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard training for visual watermarks relies on random post-processing augmentations, but these rarely sample the combinations of attacks that actually remove the embedded message. The paper instead casts robustness as a min-max problem over structured compositional transformations and solves it with a learned sequential adversary. The adversary watches the current watermarked image and picks attack families step by step to maximize detection failure, using straight-through Gumbel-Softmax selection plus entropy regularization so gradients flow end-to-end. When this adversary is used during training, the resulting watermarks retain more message capacity under both single-step and multi-step attacks and generalize better to images and videos outside the training distribution.
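To make the coverage argument concrete, here is a back-of-the-envelope sketch. It assumes, hypothetically (the summary does not give the paper's numbers), K attack families composed over T steps, with the random-augmentation baseline drawing every step uniformly and independently. It counts the ordered compositions and gives the probability that one specific composition is never drawn in n samples.

```python
def num_compositions(num_families: int, steps: int) -> int:
    """Number of ordered attack sequences when each step picks one of K families."""
    return num_families ** steps


def miss_probability(num_families: int, steps: int, draws: int) -> float:
    """Probability that one specific composition is never sampled in `draws`
    independent uniform draws (the random-augmentation baseline)."""
    p_hit = 1.0 / num_compositions(num_families, steps)
    return (1.0 - p_hit) ** draws


# Hypothetical setting: 10 attack families, two-step compositions.
K, T = 10, 2
print(num_compositions(K, T))                 # 100 ordered pairs
print(round(miss_probability(K, T, 100), 3))  # 0.366
```

Even with as many draws as there are compositions, about a third of all specific pairs are expected to be missed entirely, and the gap grows quickly with K or T. That is the under-coverage a learned adversary is meant to address.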

Core claim

Formulating watermark robustness as a min-max problem over a structured space of compositional transformations, and solving it with a sequential differentiable adversary that selects attacks via Gumbel-Softmax, yields watermarks whose capacity improves over matched-budget random-augmentation baselines by up to 63.5 percent in single-step attack settings and 13.0 percent in the compositional setting, with the largest gains on hard composed attacks and out-of-distribution evaluations.
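One way to write the min-max objective down is sketched below. The notation is ours, not the paper's (the summary does not reproduce its equations): θ collects the embedder E and decoder D parameters, a_k is attack family k, π_φ is the adversary's controller policy, H is entropy, and β is an entropy weight. The indexing of x_t follows the Figure 3 caption.

```latex
\begin{aligned}
&\min_{\theta}\ \max_{\phi}\ \mathbb{E}_{x,m}\Big[\mathcal{L}_{\mathrm{msg}}\big(D_\theta(x_T),\, m\big)\Big], \\
&\quad x_0 = E_\theta(x, m), \qquad x_{t+1} = a_{k_t}(x_t), \qquad k_t \sim \pi_\phi(\cdot \mid x_t), \qquad t = 0, \dots, T-1, \\
&\text{adversary step: ascend } \mathcal{L}_{\mathrm{msg}}\big(D_\theta(x_T),\, m\big) + \beta \sum_{t=0}^{T-1} \mathcal{H}\big(\pi_\phi(\cdot \mid x_t)\big).
\end{aligned}
```

Placing the entropy bonus only in the adversary's update keeps the watermark's objective a pure message-recovery loss while discouraging the policy from collapsing onto one family.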

What carries the argument

Compositional Adversarial Training (CAT), which trains a sequential adversary to select and compose attack families step-by-step using straight-through Gumbel-Softmax for end-to-end differentiability and entropy regularization to prevent mode collapse.
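The selection step can be illustrated in NumPy. This is a minimal sketch of straight-through Gumbel-Softmax for a single step with K attack logits, not the authors' implementation. Because NumPy has no autograd, the straight-through backward pass is written out explicitly: the forward pass uses the hard one-hot choice, and gradients are routed through the soft relaxation. The logits and per-family losses are hypothetical values.

```python
import numpy as np


def st_gumbel_softmax(logits, tau, rng):
    """Forward pass of straight-through Gumbel-Softmax.

    Returns the hard one-hot selection (used to pick the attack) and the
    relaxed sample y_soft = softmax((logits + g) / tau) that defines the gradient.
    """
    g = rng.gumbel(size=logits.shape)  # Gumbel(0, 1) noise
    z = (logits + g) / tau
    y_soft = np.exp(z - z.max())
    y_soft /= y_soft.sum()
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft


def st_backward(y_soft, tau, grad_y):
    """Backward pass: use d y_soft / d logits in place of d y_hard / d logits.

    The Jacobian of softmax(z / tau) w.r.t. the logits is (diag(y) - y y^T) / tau.
    """
    jac = (np.diag(y_soft) - np.outer(y_soft, y_soft)) / tau
    return jac.T @ grad_y


rng = np.random.default_rng(0)
logits = np.array([0.5, 1.0, -0.2])  # hypothetical scores for 3 attack families
y_hard, y_soft = st_gumbel_softmax(logits, tau=0.5, rng=rng)

# For a surrogate loss L = sum_k y[k] * loss_k, dL/dy[k] = loss_k.
# loss_k stands for the decoder's message loss after attack k (hypothetical values).
per_family_loss = np.array([0.2, 0.9, 0.4])
grad_logits = st_backward(y_soft, tau=0.5, grad_y=per_family_loss)
```

The resulting gradient is grad_k = y_k (loss_k - Σ_j y_j loss_j) / τ. Gradient ascent on the logits therefore raises the probability of every family that hurts decoding more than the current average, even families that were not sampled in the forward pass. This is one concrete reading of "aggregating gradient information across attack families."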

If this is right

  • Watermark capacity rises by up to 63.5 percent under single-step attacks and 13.0 percent under compositional attacks.
  • The largest gains appear on hard composed attacks and on out-of-distribution image and video benchmarks.
  • In the autoregressive setting, true-positive rate at a one-percent false-positive rate improves by 12 percent on average on difficult geometric transformations.
  • The framework acts as a plug-in that improves existing models such as VideoSeal and PixelSeal without changing their architecture.
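The TPR@FPR=1% metric can be computed as in the sketch below, assuming higher detector scores mean "watermarked." The threshold is chosen so that at most 1% of clean, unwatermarked images score above it, and TPR is the fraction of watermarked (and possibly attacked) images that still exceed it. The score arrays are hypothetical, and this summary does not specify the exact detection statistic used for the autoregressive watermark.

```python
import numpy as np


def tpr_at_fpr(neg_scores, pos_scores, target_fpr=0.01):
    """True-positive rate at the threshold whose false-positive rate is <= target_fpr.

    neg_scores: detector scores on unwatermarked images (negatives).
    pos_scores: detector scores on watermarked, possibly attacked images (positives).
    """
    neg = np.sort(np.asarray(neg_scores, dtype=float))
    # Largest number of negatives allowed to score above the threshold.
    allowed = int(np.floor(target_fpr * len(neg)))
    # Threshold at the (allowed + 1)-th largest negative; detection is "score > threshold".
    threshold = neg[len(neg) - allowed - 1]
    return float(np.mean(np.asarray(pos_scores, dtype=float) > threshold))


neg = np.arange(100) / 100                        # hypothetical clean scores
print(tpr_at_fpr(neg, [0.985, 0.5, 1.0, 0.99]))   # 0.75
```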

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same sequential-adversary idea could be tested on other media such as audio or text watermarks where attack compositions also matter.
  • Extending the attack sequence length or adding more attack families may produce further gains provided the entropy term continues to discourage collapse.
  • The results suggest that any robustness training problem currently solved with random sampling might benefit from replacing it with a learned compositional adversary.

Load-bearing premise

The learned sequential adversary with Gumbel-Softmax selection can reliably cover the combinatorial space of realistic attack pipelines without missing critical compositions or collapsing to one attack mode.
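A T-step rollout of such a sequential adversary might look like the following sketch. The three attack functions, the image-statistics "controller," and the shapes are hypothetical stand-ins: the paper uses a recurrent controller over real differentiable attacks. The sketch shows only the structure. At each step it observes x_t, scores the families, samples one with the Gumbel-max trick (the same hard choice the forward pass of straight-through Gumbel-Softmax makes, since temperature does not change the argmax), applies it, and records the policy entropy that an entropy regularizer would reward.

```python
import numpy as np


# Hypothetical attack families on a grayscale image in [0, 1] (H x W array).
def brighten(x):
    return np.clip(x + 0.1, 0.0, 1.0)


def box_blur(x):
    # 3-tap horizontal box blur with edge padding.
    p = np.pad(x, ((0, 0), (1, 1)), mode="edge")
    return (p[:, :-2] + p[:, 1:-1] + p[:, 2:]) / 3.0


def down_up(x):
    # 2x horizontal nearest-neighbour down/up-sampling (assumes even width).
    return np.repeat(x[:, ::2], 2, axis=1)


ATTACKS = [brighten, box_blur, down_up]


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def toy_controller(x, W):
    """Hypothetical stand-in for the recurrent controller: logits from image statistics."""
    feats = np.array([x.mean(), x.std(), 1.0])
    return W @ feats


def rollout(x, W, steps, rng):
    """Apply `steps` attacks in sequence; return final image, choices, mean policy entropy."""
    chosen, entropies = [], []
    for _ in range(steps):
        logits = toy_controller(x, W)
        p = softmax(logits)
        entropies.append(float(-(p * np.log(p + 1e-12)).sum()))
        k = int(np.argmax(logits + rng.gumbel(size=logits.shape)))  # Gumbel-max sample
        x = ATTACKS[k](x)
        chosen.append(k)
    return x, chosen, float(np.mean(entropies))


rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 8))            # stands in for a watermarked image
W = rng.normal(size=(len(ATTACKS), 3))   # hypothetical controller weights
x_T, chosen, mean_entropy = rollout(x0, W, steps=2, rng=rng)
# The adversary would ascend: message_loss(x_T) + beta * mean_entropy
```

With β = 0, nothing stops the controller from placing all its mass on a single family. That is the mode collapse this premise depends on avoiding.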

What would settle it

Evaluate CAT-trained and random-augmentation watermarks on a fixed test suite of rare composed attack pipelines never seen in training. If the CAT version showed no capacity gain, or lower detection rates, the central claim would not hold.
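One way to build such a suite so that its pipelines are guaranteed to be unseen is sketched below. It assumes some attack families are withheld from training entirely and keeps only ordered compositions that contain at least one withheld family. The family names are hypothetical.

```python
from itertools import product

# Hypothetical attack-family names; the paper's actual families are not listed here.
TRAIN_FAMILIES = ["jpeg", "crop", "blur", "brightness"]
HELDOUT_FAMILIES = ["median_filter", "perspective"]


def heldout_suite(train, heldout, steps=2):
    """All ordered `steps`-long pipelines that use at least one held-out family,
    so no pipeline in the suite can be sampled during training."""
    families = train + heldout
    return [seq for seq in product(families, repeat=steps)
            if any(f in heldout for f in seq)]


suite = heldout_suite(TRAIN_FAMILIES, HELDOUT_FAMILIES)
print(len(suite))  # 6**2 - 4**2 = 20
```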

Figures

Figures reproduced from arXiv: 2605.16720 by Andrew Xu, Anirudh Satheesh, Furong Huang, Georgios Milis, Heng Huang, Michael-Andrei Panaitescu-Liess, Zikui Cai.

Figure 1. Overview of Compositional Adversarial Training (CAT) for visual watermarking. CAT improves the overall bit accuracy by 2.2% and capacity by 17.0% for single-step and compositional attacks. (Corresponding author: Anirudh Satheesh, anirudhsatheesh.com, anirudhs@terpmail.umd.edu.)

Figure 2. Random augmentation creates unstable training due to inefficient augmentation allocations, whereas …

Figure 3. Conceptual overview of the proposed training pipeline. The embedder writes message m into image x to produce the watermarked image x_0. The adversary then repeatedly observes the current image x_t, uses a recurrent controller to produce logits, selects an attack family via straight-through Gumbel-Softmax, and applies differentiable attacks to obtain x_{t+1}. After T steps, the final attacked image x_T is passed…

Figure 4. CAT substantially accelerates convergence for both PixelSeal and VideoSeal, and this advantage …

Figure 5. CAT improves PixelSeal training across payload sizes. Validation bit error rate over training is shown for 32-, 64-, 128-, and 256-bit payloads under no augmentation (gray), random augmentation (blue), and CAT (orange). CAT consistently drives the error lower than random augmentation and remains effective as payload increases, whereas random augmentation plateaus at substantially higher error.

Figure 6. Continuous attack sweeps for autoregressive watermark robustness.

Figure 7. ROC curves for clean autoregressive watermark detection.

Figure 8. Token-match histograms under clean evaluation.

Figure 9. Forward transfer from single-step training to two-step attack compositions.

Figure 10. Backward transfer from compositional training to single-step attacks.

Figure 11. Additional qualitative results. (Appendix H, Qualitative Results: Figures 11–15 present additional qualitative examples for the image watermarking experiments discussed in the main paper.)

Figure 12. Additional qualitative results.

Figure 13. Additional qualitative results.

Figure 14. Additional qualitative results.

Figure 15. Additional qualitative results.
Original abstract

Robust watermarking is typically trained with random post-processing augmentation, but random sampling under-covers the combinatorial space of realistic attack pipelines and rarely encounters the rare compositions that actually break detection. This leads to unstable training and poor sample efficiency. We instead formulate watermark robustness as a min-max problem over a structured space of compositional transformations. We propose Compositional Adversarial Training (CAT), a plug-in framework that learns a sequential differentiable adversary that observes the current watermarked image and selects an attack family at each step to maximally disrupt message recovery. CAT combines a straight-through Gumbel-Softmax attack selection with entropy regularization, allowing the backward pass to be end-to-end differentiable and aggregate gradient information across attack families, yielding faster, smoother convergence without collapsing to a single attack mode. We evaluate CAT on post-generation watermarks VideoSeal 0.0, VideoSeal 1.0, and PixelSeal and in-generation WMAR under both single-step and two-step attack suites, on in-distribution and multiple out-of-distribution image and video benchmarks. CAT consistently outperforms random-augmentation baselines trained with the same augmentation budget, with the largest gains on hard composed attacks and OOD evaluations, improving overall watermark capacity by up to 63.5% in the single-step attack setting and 13.0% in the compositional setting. In the autoregressive setting, CAT improves the TPR@FPR=1% by 12% on average on difficult geometric transformations. These results show that robust visual watermarking benefits from training against adaptive compositional adversaries rather than independent random corruptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Compositional Adversarial Training (CAT) for robust visual watermarking. It models watermark robustness as a min-max problem over compositional transformations and proposes a plug-in framework with a sequential differentiable adversary that uses straight-through Gumbel-Softmax selection combined with entropy regularization to avoid mode collapse. The approach is tested on VideoSeal 0.0, VideoSeal 1.0, PixelSeal, and WMAR under single-step and two-step attacks on in- and out-of-distribution image and video datasets. Against random-augmentation baselines with a matched budget, it reports capacity gains of up to 63.5% (single-step) and 13.0% (compositional), along with an average 12% improvement in TPR@FPR=1% on difficult geometric transformations in the autoregressive (WMAR) setting.

Significance. Should the empirical findings prove robust, this work offers a promising direction for training watermark detectors against realistic, composed attack pipelines that random sampling often misses. By learning an adaptive adversary, it potentially improves sample efficiency and stability in training robust watermarks, which is relevant for protecting digital media. The consistent gains across multiple watermarking methods and settings, including OOD, highlight the practical value if the adversary's exploration of the attack space is adequately demonstrated.

major comments (2)
  1. The central assumption that the Gumbel-Softmax sequential selection with entropy regularization reliably covers the combinatorial space of attack compositions without mode collapse is not sufficiently validated. No ablation studies on the entropy regularization strength or reports on the diversity of selected attack sequences (e.g., frequency of each family) are provided. This is critical because if collapse occurs, the method reduces to standard adversarial training against a subset of attacks, undermining the claimed superiority over random-augmentation baselines with the same budget.
  2. The reported improvements lack error bars, statistical significance tests, or details on the exact attack compositions used in the suites. This makes it difficult to assess the reliability of the 63.5% and 13.0% capacity gains, especially given potential sensitivity to unstated implementation choices in the adversary training.
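The diversity statistics requested in comment 1 are straightforward to log. Here is a minimal sketch, assuming the adversary's choices are recorded as a list of family indices. It reports per-family selection frequency and the normalized entropy of the empirical distribution: 1.0 means all K families are used uniformly, and values near 0 indicate collapse onto one family. The logged choices are hypothetical.

```python
import numpy as np


def selection_stats(choices, num_families):
    """Per-family selection frequency and normalized entropy in [0, 1]."""
    counts = np.bincount(np.asarray(choices, dtype=int), minlength=num_families)
    freq = counts / counts.sum()
    nz = freq[freq > 0]                      # 0 * log 0 is taken as 0
    entropy = -(nz * np.log(nz)).sum()
    return freq, float(entropy / np.log(num_families))


# Hypothetical logs of selected families (K = 4).
freq, h = selection_stats([0, 1, 2, 3, 0, 1, 2, 3], num_families=4)
print(round(h, 3))   # 1.0: uniform use of all families
freq, h = selection_stats([2, 2, 2, 2, 2, 2, 2, 1], num_families=4)
print(round(h, 3))   # 0.272: near-collapse onto family 2
```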
minor comments (2)
  1. The manuscript should specify the number of attack families and the maximum number of steps in the sequential adversary to allow reproducibility.
  2. Include figures or tables showing example attack compositions or selection probabilities over training to enhance clarity on the learned policy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address each major comment below and outline revisions to strengthen the manuscript's clarity and empirical support.

  1. Referee: The central assumption that the Gumbel-Softmax sequential selection with entropy regularization reliably covers the combinatorial space of attack compositions without mode collapse is not sufficiently validated. No ablation studies on the entropy regularization strength or reports on the diversity of selected attack sequences (e.g., frequency of each family) are provided. This is critical because if collapse occurs, the method reduces to standard adversarial training against a subset of attacks, undermining the claimed superiority over random-augmentation baselines with the same budget.

    Authors: We appreciate this observation. The entropy regularization is explicitly introduced in Section 3.2 to promote diversity across attack families during sequential selection and to mitigate mode collapse, enabling gradient aggregation over the full combinatorial space. However, we agree that direct empirical validation via ablations and diversity statistics is currently absent and would better substantiate the claims. In the revised manuscript we will add (i) an ablation varying the entropy coefficient and reporting its effect on capacity and convergence, and (ii) histograms or tables showing the empirical frequency of each attack family selected by the adversary throughout training. These additions will confirm that the learned policy explores the space more effectively than random sampling under the same budget. revision: yes

  2. Referee: The reported improvements lack error bars, statistical significance tests, or details on the exact attack compositions used in the suites. This makes it difficult to assess the reliability of the 63.5% and 13.0% capacity gains, especially given potential sensitivity to unstated implementation choices in the adversary training.

    Authors: We concur that additional statistical rigor and implementation details would improve interpretability. In the revision we will (i) report mean capacity gains together with standard deviations computed over at least five independent training runs, (ii) include paired statistical significance tests (e.g., Wilcoxon signed-rank) between CAT and the matched-budget random-augmentation baselines, and (iii) provide an appendix table enumerating the precise attack families, parameters, and composition rules used in both the single-step and two-step suites. These changes will allow readers to evaluate the robustness of the reported gains. revision: yes
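The paired test proposed in response 2 has a practical constraint worth planning for. With n = 5 paired runs, an exact two-sided Wilcoxon signed-rank test cannot reach p < 0.05 even if CAT wins every run, because the smallest attainable p-value is 2/2^5 = 0.0625. The sketch below enumerates the exact null distribution, assuming no zero or tied differences. The per-seed differences are hypothetical.

```python
from itertools import product


def wilcoxon_exact_two_sided(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value by enumerating all 2^n sign flips.

    Assumes no zero differences and no ties in |diffs| (so ranks are 1..n).
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w_obs = sum(r for r, d in zip(ranks, diffs) if d > 0)  # W+ statistic
    # Under H0 each rank is equally likely to carry a positive or negative sign.
    null = [sum(r for r, s in zip(range(1, n + 1), signs) if s)
            for signs in product([False, True], repeat=n)]
    p_ge = sum(w >= w_obs for w in null) / len(null)
    p_le = sum(w <= w_obs for w in null) / len(null)
    return min(1.0, 2 * min(p_ge, p_le))


# Hypothetical per-seed capacity differences (CAT minus random augmentation), five seeds.
diffs = [0.8, 1.1, 0.3, 1.6, 0.5]
print(wilcoxon_exact_two_sided(diffs))  # 0.0625: every seed favours CAT
```

In practice `scipy.stats.wilcoxon` runs the same test. The enumeration here only makes the n = 5 floor visible. Six paired runs would lower the minimum two-sided p-value to 2/64 ≈ 0.031.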

Circularity Check

0 steps flagged

No circularity: empirical gains from CAT vs. matched-budget random baselines are externally validated

full rationale

The paper defines watermark robustness as a min-max problem over compositional transformations and solves it via the CAT procedure (straight-through Gumbel-Softmax selection plus entropy regularization). Reported improvements (63.5% single-step capacity, 13% compositional, 12% TPR@FPR=1%) are measured on held-out in-distribution and OOD image/video benchmarks against random-augmentation baselines trained with identical budget. No equation, prediction, or central claim reduces by construction to a fitted input, self-definition, or self-citation chain; the evaluation remains falsifiable on external data and does not rename known results or smuggle ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into modeling assumptions; standard differentiability of image transformations and attack families is presupposed, with no explicit free parameters or invented entities named.

axioms (1)
  • domain assumption The space of realistic attack pipelines can be structured as sequential compositions of differentiable transformation families.
    Invoked in the min-max formulation of watermark robustness over compositional transformations.

pith-pipeline@v0.9.0 · 5843 in / 1222 out tokens · 29807 ms · 2026-05-19T21:49:34.038254+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 6 internal anchors

  1. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  2. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
  3. A closer look at robustness to L-infinity and spatial perturbations and their composition. arXiv preprint arXiv:2210.02577.
  4. Population based augmentation: Efficient learning of augmentation policy schedules. International Conference on Machine Learning, 2019.
  5. TrivialAugment: Tuning-free yet state-of-the-art data augmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  6. Perceptual adversarial robustness: Defense against unseen threat models. arXiv preprint arXiv:2006.12655.
  7. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  8. Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP), 2017.
  9. Adversarial training and robustness for multiple perturbations. Advances in Neural Information Processing Systems.
  10. Adversarial robustness against the union of multiple perturbation models. International Conference on Machine Learning, 2020.
  11. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. International Conference on Machine Learning, 2020.
  12. SAT: Improving adversarial training via curriculum-based loss smoothing. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security.
  13. Towards compositional adversarial robustness: Generalizing adversarial training to composite semantic perturbations. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  14. AutoAugment: Learning augmentation strategies from data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  15. Differentiable automatic data augmentation. European Conference on Computer Vision, 2020.
  16. Direct differentiable augmentation search. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  17. Adversarial AutoAugment. arXiv preprint arXiv:1912.11188.
  18. Watermarking autoregressive image generation. arXiv preprint arXiv:2506.16349.
  19. Robustness of AI-image detectors: Fundamental limits and practical attacks. arXiv preprint arXiv:2310.00076.
  20. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460.
  21. DCT-based watermark recovering without resorting to the uncorrupted original image. Proceedings of the International Conference on Image Processing, 1997.
  22. Wavelet transform based watermark for digital images. Optics Express, 1998.
  23. Improved wavelet-based watermarking through pixel-wise masking. IEEE Transactions on Image Processing, 2001.
  24. HiDDeN: Hiding data with deep networks. Proceedings of the European Conference on Computer Vision (ECCV).
  25. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications, 2020.
  26. Digital watermarking via inverse gradient attention. 2022 9th International Conference on Behavioural and Social Computing (BESC), 2022.
  27. MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. Proceedings of the 29th ACM International Conference on Multimedia.
  28. The stable signature: Rooting watermarks in latent diffusion models. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  29. TrustMark: Robust watermarking and watermark removal for arbitrary resolution images. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  30. Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. arXiv preprint arXiv:2305.20030.
  31. EditGuard: Versatile image watermarking for tamper localization and copyright protection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  32. InvisMark: Invisible and robust watermarking for AI-generated image provenance. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025.
  33. Robust watermarking using generative priors against image editing: From benchmarking to advances. arXiv preprint arXiv:2410.18775, 2024.
  34. Robust invisible video watermarking with attention. arXiv preprint arXiv:1909.01285.
  35. Video Seal: Open and efficient video watermarking. arXiv preprint arXiv:2412.09492.
  36. RoMark: A robust watermarking system using adversarial training. arXiv preprint arXiv:1910.01221.
  37. DVMark: A deep multiscale framework for video watermarking. IEEE Transactions on Image Processing.
  38. VStegNET: Video steganography network using spatio-temporal features and micro-bottleneck. BMVC.
  39. VHNet: A video hiding network with robustness to video coding. Journal of Information Security and Applications, 2023.
  40. StegaVideo: Robust high-resolution video steganography with temporal and edge guidance. Pacific Conference on Computer Graphics and Applications (PG’24), Huangshan, CN. EG, Eindhoven, NL.
  41. Robust and compatible video watermarking via spatio-temporal enhancement and multiscale pyramid attention. IEEE Transactions on Circuits and Systems for Video Technology, 2024.
  42. ItoV: Efficiently adapting deep learning-based image watermarking to video watermarking. 2023 International Conference on Culture-Oriented Science and Technology (CoST), 2023.
  43. ROBIN: Robust and invisible watermarks for diffusion models with adversarial optimization. Advances in Neural Information Processing Systems.
  44. InvZW: Invariant feature learning via noise-adversarial training for robust image zero-watermarking. arXiv preprint arXiv:2506.20370.
  45. Invisible image watermarks are provably removable using generative AI. Advances in Neural Information Processing Systems.
  46. Invisible watermarks: Attacks and robustness. arXiv preprint arXiv:2412.12511.
  47. WAVES: Benchmarking the robustness of image watermarks. International Conference on Machine Learning, 2024.
  48. Pixel Seal: Adversarial-only training for invisible image and video watermarking. arXiv preprint arXiv:2512.16874.
  49. Categorical reparameterization with Gumbel-Softmax. International Conference on Learning Representations.
  50. Erasing the invisible: A stress-test challenge for image watermarks. 2024.
  51. DINOv3. arXiv preprint arXiv:2508.10104.
  52. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning, 2018.
  53. Segment anything. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  54. NTIRE 2017 challenge on single image super-resolution: Dataset and study. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  55. Low bitrate image compression with discretized Gaussian mixture likelihoods. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.
  56. A compression objective and a cycle loss for neural image compression. CVPR Workshops.
  57. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems.
  58. Movie Gen: A cast of media foundation models. arXiv preprint arXiv:2410.13720.
  59. SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714.
  60. Taming transformers for high-resolution image synthesis. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  61. Randomized autoregressive visual generation. Proceedings of the IEEE/CVF International Conference on Computer Vision.