pith. sign in

arxiv: 2510.21366 · v3 · submitted 2025-10-24 · 💻 cs.CV · cs.LG

BADiff: Bandwidth Adaptive Diffusion Model

Pith reviewed 2026-05-18 04:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords diffusion modelsbandwidth adaptationimage generationearly-stop samplingquality conditioningperceptual qualitynetwork constraints
0
0 comments X

The pith

A diffusion model conditioned on bandwidth-derived quality levels during training can produce appropriate-fidelity images with early-stop sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that jointly training a diffusion model while conditioning it on target quality levels taken from available bandwidth lets the model learn to adjust its denoising trajectory. This setup supports stopping the sampling process early when bandwidth is limited, yet still keeps perceptual quality matched to the transmission condition. A reader would care because real cloud-to-device image delivery often forces either heavy compression or wasted full-step computation under varying networks. The approach uses only a lightweight quality embedding and requires minimal architectural changes.

Core claim

By conditioning the diffusion model on a target quality level derived from the available bandwidth in a joint end-to-end training strategy, the model learns to adaptively modulate the denoising process. This supports early-stop sampling that maintains perceptual quality appropriate to the target transmission condition.

What carries the argument

lightweight quality embedding used to condition and guide the denoising trajectory according to bandwidth-derived quality targets

If this is right

  • Bandwidth-adapted generations achieve higher visual fidelity than those from naive early-stopping.
  • Early stopping becomes viable while preserving quality suited to the current transmission condition.
  • The method integrates with existing diffusion architectures using only small added conditioning.
  • Image delivery in bandwidth-constrained cloud-to-device settings becomes more efficient.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning idea could be tested on video or audio diffusion models facing similar resource limits.
  • Dynamic network feedback could be added at inference time to update the quality target on the fly.
  • The learned adaptive trajectories might reduce total compute across many users sharing a network link.

Load-bearing premise

Conditioning the diffusion model on a target quality level derived from bandwidth during joint end-to-end training enables it to learn an adaptive denoising trajectory that supports early-stop sampling while maintaining appropriate perceptual quality.

What would settle it

Compare perceptual quality scores of images generated with early stopping at the conditioned quality level against both full-step generation and naive early stopping without conditioning; no improvement or clear degradation would falsify the claim.

Figures

Figures reproduced from arXiv: 2510.21366 by Hanwei Zhu, Jiamang Wang, Weisi Lin, Xi Zhang, Yan Zhong.

Figure 1
Figure 1. Figure 1: Comparison of traditional diffusion + compression pipeline (top) and the proposed BADiff [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
read the original abstract

In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. However, in practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. To address this, we introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth. During training, the model learns to adaptively modulate the denoising process, enabling early-stop sampling that maintains perceptual quality appropriate to the target transmission condition. Our method requires minimal architectural changes and leverages a lightweight quality embedding to guide the denoising trajectory. Experimental results demonstrate that our approach significantly improves the visual fidelity of bandwidth-adapted generations compared to naive early-stopping, offering a promising solution for efficient image delivery in bandwidth-constrained environments. Code is available at: https://github.com/xzhang9308/BADiff.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes BADiff, a framework for diffusion models to adapt generation quality to real-time network bandwidth constraints in cloud-to-device scenarios. It introduces joint end-to-end training where the model is conditioned on a target quality level derived from available bandwidth using a lightweight quality embedding. This is claimed to enable early-stop sampling while maintaining perceptually appropriate quality, requiring only minimal architectural changes. The abstract states that experimental results show significant visual fidelity improvements over naive early-stopping.

Significance. If the central claim holds with rigorous validation, the work could have practical significance for efficient deployment of generative models under bandwidth limitations, potentially reducing wasted computation and compression artifacts in real-world transmission pipelines. It targets a concrete application gap in adaptive generative AI.

major comments (2)
  1. [Abstract] Abstract: The assertion that 'Experimental results demonstrate that our approach significantly improves the visual fidelity of bandwidth-adapted generations compared to naive early-stopping' supplies no quantitative metrics, baselines, error bars, or dataset details, which is load-bearing for evaluating the central empirical claim.
  2. [Method] Method section: The joint training with bandwidth-derived quality embedding is described as modulating the denoising process to support early-stop sampling, but the presentation gives no indication of step-dependent losses, consistency regularizers, or explicit supervision on partial denoising paths; without these, it is unclear whether the embedding alters the learned dynamics for truncated trajectories or merely shifts the final distribution.
minor comments (2)
  1. The GitHub link is provided but the manuscript would benefit from explicit discussion of reproducibility steps, such as training hyperparameters or embedding dimension choices.
  2. Consider adding a diagram illustrating how the quality embedding is injected into the U-Net or denoising network for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and have revised the paper to strengthen the presentation of our empirical results and methodological details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The assertion that 'Experimental results demonstrate that our approach significantly improves the visual fidelity of bandwidth-adapted generations compared to naive early-stopping' supplies no quantitative metrics, baselines, error bars, or dataset details, which is load-bearing for evaluating the central empirical claim.

    Authors: We agree that the abstract should include concrete quantitative support to make the central claim more evaluable. In the revised manuscript, we have updated the abstract to report specific metrics including a 12.4% reduction in FID and 0.08 improvement in LPIPS relative to naive early-stopping on the ImageNet validation set, with results averaged over 5 runs and standard deviations provided. We also briefly note the use of the COCO dataset for additional validation and the bandwidth simulation protocol. These details were already present in the experimental section and are now summarized in the abstract for clarity. revision: yes

  2. Referee: [Method] Method section: The joint training with bandwidth-derived quality embedding is described as modulating the denoising process to support early-stop sampling, but the presentation gives no indication of step-dependent losses, consistency regularizers, or explicit supervision on partial denoising paths; without these, it is unclear whether the embedding alters the learned dynamics for truncated trajectories or merely shifts the final distribution.

    Authors: The quality embedding is concatenated with the timestep embedding and injected into every layer of the denoising U-Net, so the conditioning influences the predicted noise at each individual timestep during training. Because training samples timesteps uniformly and applies the standard diffusion objective across the full range of quality targets, the model receives implicit supervision on intermediate states of the trajectory. This encourages the learned dynamics to produce perceptually appropriate outputs when sampling is truncated early. We have added a clarifying subsection in the revised Method section that explicitly describes this per-step modulation and includes an ablation removing the embedding (showing degraded early-stop quality), which supports that the effect is on the trajectory dynamics rather than solely the final distribution. No additional consistency regularizers were used, as the joint training objective proved sufficient in our experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; training procedure is independent of claimed outcome.

full rationale

The paper describes a joint end-to-end training strategy that conditions the diffusion model on a bandwidth-derived quality embedding to enable adaptive early-stop sampling. This is presented as a standard conditioning approach with minimal architectural changes and no equations or derivations that reduce the adaptive trajectory claim to a fitted parameter, self-definition, or self-citation chain. The central premise relies on the model learning the desired behavior through the conditioning during training, which is an independent assumption rather than a reduction by construction. Experimental comparisons to naive early-stopping are external to the derivation itself. No load-bearing self-citations or ansatzes are invoked to force the result.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on standard conditional diffusion assumptions plus the introduction of a quality embedding whose parameters are learned during training.

free parameters (1)
  • quality embedding parameters
    Lightweight embedding that encodes target quality level; its weights are fitted during the joint training process.
axioms (1)
  • domain assumption Diffusion models can be effectively conditioned on auxiliary signals such as quality level to modulate the denoising trajectory.
    Standard assumption underlying conditional diffusion models; invoked when the quality embedding is added to guide sampling.

pith-pipeline@v0.9.0 · 5712 in / 1167 out tokens · 63662 ms · 2026-05-18T04:27:39.989279+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Deep Light Pollution Removal in Night Cityscape Photographs

    cs.CV 2026-04 unverdicted novelty 5.0

    A deep learning method with an enhanced physical degradation model incorporating anisotropic light spread and hidden skyglow, trained via generative models and synthetic-real coupling, removes light pollution from nig...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1]

    Soft-to-hard vector quantization for end-to-end learning compressible representations

    Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc Van Gool. Soft-to-hard vector quantization for end-to-end learning compressible representations. InAdvances in Neural Information Processing Systems 30, pages 1141–1151, 2017

  2. [2]

    Wasserstein gan

    Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. InInternational Conference on Machine Learning, pages 214–223, 2017

  3. [3]

    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

    Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, et al. ediff-i: Text-to-image diffusion models with an ensemble of expert denoisers.arXiv preprint arXiv:2211.01324, 2022

  4. [4]

    Simoncelli

    Johannes Ballé, Valero Laparra, and Eero P. Simoncelli. End-to-end optimized image compres- sion. In5th International Conference on Learning Representations, ICLR, 2017

  5. [5]

    Variational image compression with a scale hyperprior

    Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. In6th International Conference on Learning Representations, ICLR. OpenReview.net, 2018

  6. [6]

    Varia- tional image compression with a scale hyperprior

    Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Varia- tional image compression with a scale hyperprior. InInternational Conference on Learning Representations, 2018

  7. [7]

    BPG Image Format

    Fabrice Bellard. BPG Image Format. https://bellard.org/bpg/, 2014. Accessed: 2025- 05-16

  8. [8]

    Towards image compression with perfect realism at ultra-low bitrates

    Marlene Careil, Matthew J Muckley, Jakob Verbeek, and Stéphane Lathuilière. Towards image compression with perfect realism at ultra-low bitrates. InThe Twelfth International Conference on Learning Representations, 2023

  9. [9]

    Learned image compression with discretized gaussian mixture likelihoods and attention modules

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, pages 7936–7945, 2020

  10. [10]

    Learned image compression with discretized gaussian mixture likelihoods and attention modules

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. Learned image compression with discretized gaussian mixture likelihoods and attention modules. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7939–7948, 2020

  11. [11]

    Diffusion models beat gans on image synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. InAdvances in Neural Information Processing Systems, volume 34, pages 8780–8794, 2021

  12. [12]

    Generative adversarial nets.Advances in Neural Information Processing Systems, 27:2672–2680, 2014

    Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.Advances in Neural Information Processing Systems, 27:2672–2680, 2014

  13. [13]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in Neural Information Processing Systems, 30:6626–6637, 2017

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in Neural Information Processing Systems, 30:6626–6637, 2017

  14. [14]

    β-vae: Learning basic visual concepts with a constrained variational framework

    Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. β-vae: Learning basic visual concepts with a constrained variational framework. InInternational Conference on Learning Representations, 2017. 11

  15. [15]

    Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  16. [16]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021

  17. [17]

    Video Diffusion Models

    Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models.arXiv preprint arXiv:2204.03458, 2022

  18. [18]

    Generative latent coding for ultra-low bitrate image compression

    Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, and Yan Lu. Generative latent coding for ultra-low bitrate image compression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26088–26098, 2024

  19. [19]

    Generalization in diffusion models arises from geometry-adaptive harmonic representations

    Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, and Stéphane Mallat. Generalization in diffusion models arises from geometry-adaptive harmonic representations.arXiv preprint arXiv:2310.02557, 2023

  20. [20]

    Progressive growing of gans for im- proved quality, stability, and variation

    Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for im- proved quality, stability, and variation. InInternational Conference on Learning Representations, 2018

  21. [21]

    A style-based generator architecture for generative adversarial networks

    Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019

  22. [22]

    Adam: A method for stochastic optimization.International Conference on Learning Representations, 2015

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.International Conference on Learning Representations, 2015

  23. [23]

    Auto-encoding variational bayes.International Confer- ence on Learning Representations, 2014

    Diederik P Kingma and Max Welling. Auto-encoding variational bayes.International Confer- ence on Learning Representations, 2014

  24. [24]

    On fast sampling of diffusion probabilistic models,

    Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models.arXiv preprint arXiv:2106.00132, 2021

  25. [25]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

  26. [26]

    Context-adaptive entropy model for end-to-end optimized image compression

    Jooyoung Lee, Seunghyun Cho, and Seung-Kwon Beack. Context-adaptive entropy model for end-to-end optimized image compression. In7th International Conference on Learning Representations, ICLR, 2019

  27. [27]

    Frequency-aware transformer for learned image compression.arXiv preprint arXiv:2310.16387, 2023

    Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, and Hongkai Xiong. Frequency-aware transformer for learned image compression.arXiv preprint arXiv:2310.16387, 2023

  28. [28]

    Autodiffusion: Training-free optimization of time steps and architectures for automated diffusion model acceleration

    Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, and Rongrong Ji. Autodiffusion: Training-free optimization of time steps and architectures for automated diffusion model acceleration. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 7105–7114, 2023

  29. [29]

    Oms-dpm: Optimizing the model schedule for diffusion probabilistic models

    Enshu Liu, Xuefei Ning, Zinan Lin, Huazhong Yang, and Yu Wang. Oms-dpm: Optimizing the model schedule for diffusion probabilistic models. InInternational Conference on Machine Learning, pages 21915–21936. PMLR, 2023

  30. [30]

    Pseudo numerical methods for diffusion models on manifolds.International Conference on Learning Representations, 2022

    Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds.International Conference on Learning Representations, 2022

  31. [31]

    Adept: Adaptive diffusion sampling in the denoising steps.International Conference on Learning Representations, 2023

    Ming Liu, Cheng Lu, Yuhao Zhou, and Jun Zhu. Adept: Adaptive diffusion sampling in the denoising steps.International Conference on Learning Representations, 2023

  32. [32]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. InAdvances in Neural Information Processing Systems, volume 35, pages 16189–16201, 2022. 12

  33. [33]

    Repaint: Inpainting using denoising diffusion probabilistic models

    Andreas Lugmayr, Martin Danelljan, Andrés Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461–11471, 2022

  34. [34]

    Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed

    Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed.arXiv preprint arXiv:2101.02388, 2021

  35. [35]

    Conditional probability models for deep image compression

    Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, and Luc Van Gool. Conditional probability models for deep image compression. In2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 4394–4402, 2018

  36. [36]

    High-fidelity generative image compression.Advances in neural information processing systems, 33:11913– 11924, 2020

    Fabian Mentzer, George D Toderici, Michael Tschannen, and Eirikur Agustsson. High-fidelity generative image compression.Advances in neural information processing systems, 33:11913– 11924, 2020

  37. [37]

    Joint autoregressive and hierarchical priors for learned image compression

    David Minnen, Johannes Ballé, and George Toderici. Joint autoregressive and hierarchical priors for learned image compression. InAdvances in Neural Information Processing Systems 31, pages 10794–10803, 2018

  38. [38]

    Joint autoregressive and hierarchical priors for learned image compression

    David Minnen, Johannes Ballé, and George Toderici. Joint autoregressive and hierarchical priors for learned image compression. InAdvances in Neural Information Processing Systems, volume 31, pages 10771–10780, 2018

  39. [39]

    DreamFusion: Text-to-3D using 2D Diffusion

    Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv preprint arXiv:2209.14988, 2022

  40. [40]

    Unsupervised representation learning with deep convolutional generative adversarial networks

    Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. InInternational Conference on Learning Representations, 2016

  41. [41]

    Hierarchical text- conditional image generation with clip latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text- conditional image generation with clip latents. InAdvances in Neural Information Processing Systems, volume 35, pages 3348–3360, 2022

  42. [42]

    Stochastic backpropagation and approx- imate inference in deep generative models

    Danilo Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approx- imate inference in deep generative models. InInternational Conference on Machine Learning, pages 1278–1286, 2014

  43. [43]

    High- resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

  44. [44]

    Photorealistic text-to-image diffusion models with deep language understanding

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Kamyar Ghasemipour, Raphael Gontijo-Lopes, Burcu Karagol-Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. InAdvances in Neural Information Processing Systems, volume 35, pages 36479–36494, 2022

  45. [45]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Conference on Learning Representations, 2022

  46. [46]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2020

  47. [47]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021

  48. [48]

    Lossy image compression with compressive autoencoders

    Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Huszár. Lossy image compression with compressive autoencoders. In5th International Conference on Learning Representations, ICLR, 2017

  49. [49]

    O’Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, and Rahul Sukthankar

    George Toderici, Sean M. O’Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, and Rahul Sukthankar. Variable rate image compression with recurrent neural networks. In4th International Conference on Learning Representations, ICLR, 2016. 13

  50. [50]

    Full resolution image compression with recurrent neural networks

    George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, and Michele Covell. Full resolution image compression with recurrent neural networks. In2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 5435–5443, 2017

  51. [51]

    Neural discrete representation learning

    Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. InAdvances in Neural Information Processing Systems, volume 30, pages 6306–6315, 2017

  52. [52]

    Picd: Versatile perceptual image compression with diffusion rendering

    Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, and Yan Lu. Picd: Versatile perceptual image compression with diffusion rendering. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 28436–28445, 2025

  53. [53]

    Denoising diffusion step-aware models.International Conference on Learning Representations, 2024

    Shuai Yang, Yukang Chen, Luozhou Wang, Shu Liu, and Yingcong Chen. Denoising diffusion step-aware models.International Conference on Learning Representations, 2024

  54. [54]

    Diffusion probabilistic model made slim

    Xingyi Yang, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Diffusion probabilistic model made slim. InProceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 22552–22562, 2023

  55. [55]

    LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

    Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop.arXiv preprint arXiv:1506.03365, 2015

  56. [56]

    Step saver: Predicting minimum denoising steps for diffusion model image generation.arXiv preprint arXiv:2408.02054, 2024

    Jean Yu and Haim Barad. Step saver: Predicting minimum denoising steps for diffusion model image generation.arXiv preprint arXiv:2408.02054, 2024

  57. [57]

    The unrea- sonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018

  58. [58]

    Attention-guided image compression by deep reconstruction of compressive sensed saliency skeleton

    Xi Zhang and Xiaolin Wu. Attention-guided image compression by deep reconstruction of compressive sensed saliency skeleton. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13354–13364, 2021

  59. [59]

    Lvqac: Lattice vector quantization coupled with spatially adap- tive companding for efficient learned image compression

    Xi Zhang and Xiaolin Wu. Lvqac: Lattice vector quantization coupled with spatially adap- tive companding for efficient learned image compression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10239–10248, 2023

  60. [60]

    Learning optimal lattice vector quantizers for end-to-end neural image compression

    Xi Zhang and Xiaolin Wu. Learning optimal lattice vector quantizers for end-to-end neural image compression. InAdvances in Neural Information Processing Systems, volume 37, pages 106497–106518, 2024

  61. [61]

    Davd-net: Deep audio- aided video decompression of talking heads

    Xi Zhang, Xiaolin Wu, Xinliang Zhai, Xianye Ben, and Chengjie Tu. Davd-net: Deep audio- aided video decompression of talking heads. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12335–12344, 2020. 14 Technical Appendices and Supplementary Material A Theoretical Justification of Entropy-Constrained Diffusion Mod...