ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization

Chengbin Liang; Hao Cao; Jungong Han; Wenqi Guo; Zhijin Qin

arXiv: 2603.02897 · v2 · pith:QOILC5SS · submitted 2026-03-03 · 💻 cs.CV


Hao Cao, Chengbin Liang, Wenqi Guo, Zhijin Qin, Jungong Han

Pith reviewed 2026-05-25 07:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords: generative image compression, residual vector quantization, progressive transmission, lightweight neural codec, perceptual quality, image coding, vector quantization


The pith

ProGIC uses residual vector quantization to build a lightweight generative image compressor that produces progressive bitstreams and runs over ten times faster than MS-ILLM on GPUs while delivering comparable perceptual quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ProGIC as a compact codec for generative image compression that relies on residual vector quantization to encode image data in successive stages. Each stage refines the previous residual, yielding a bitstream from which partial reconstructions can be formed at different quality levels. The design pairs this mechanism with a slim backbone of depthwise-separable convolutions and small attention modules to keep the model small enough for both GPU and CPU hardware. A reader would care if the approach truly delivers comparable perceptual results at lower bitrates together with major speed gains, because existing generative compressors have been too large for flexible or low-resource use.
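The page gives no channel widths or kernel sizes for ProGIC's backbone, so the following is only a generic illustration of why depthwise-separable convolutions keep a model small. It counts weights for a standard k×k convolution and for a depthwise k×k convolution followed by a 1×1 pointwise convolution; the width 192 and kernel size 3 are hypothetical values, not the paper's.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Weights (plus optional biases) of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def dw_separable_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    c, k = 192, 3  # hypothetical width and kernel size, chosen only for illustration
    std = conv_params(c, c, k)
    sep = dw_separable_params(c, c, k)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```

For these hypothetical values the script prints `standard: 331968, separable: 38976, ratio: 8.5x`; as the channel count grows, the ratio approaches k² (9 for 3×3 kernels).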

Core claim

ProGIC attains compression performance comparable to previous methods by encoding residuals stage by stage with separate codebooks in residual vector quantization, which produces a coarse-to-fine reconstruction and a progressive bitstream. Its lightweight backbone enables over 10 times faster encoding and decoding than MS-ILLM on GPUs, and it achieves bitrate savings of up to 57.57 percent on DISTS and 58.83 percent on LPIPS versus MS-ILLM on the Kodak dataset.
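This page does not say how the quoted bitrate savings were computed. A common convention in learned compression, and the one associated with Bjontegaard's method in the reference list ([4]), is the BD-rate: the average log-rate gap between two rate-metric curves over their overlapping metric range. The sketch below assumes that convention with a cubic fit; it is not the paper's evaluation code, and `bd_rate` is a hypothetical helper name.

```python
import numpy as np


def bd_rate(rate_ref, metric_ref, rate_test, metric_test) -> float:
    """Bjontegaard delta rate (%) of `test` relative to `ref`.

    Fits log(rate) as a cubic polynomial in the quality metric for each curve
    (at least 4 points per curve) and averages the gap over the overlapping
    metric range. Negative values mean `test` needs fewer bits at equal quality.
    """
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))
    m_ref = np.asarray(metric_ref, dtype=float)
    m_test = np.asarray(metric_test, dtype=float)

    p_ref = np.polyfit(m_ref, log_ref, 3)
    p_test = np.polyfit(m_test, log_test, 3)

    lo = max(m_ref.min(), m_test.min())
    hi = min(m_ref.max(), m_test.max())
    if hi <= lo:
        raise ValueError("metric ranges do not overlap")

    # Integrate each fitted log-rate curve over the shared metric interval.
    i_ref = np.polyint(p_ref)
    i_test = np.polyint(p_test)
    area_ref = np.polyval(i_ref, hi) - np.polyval(i_ref, lo)
    area_test = np.polyval(i_test, hi) - np.polyval(i_test, lo)

    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)
```

Because the integral runs along the metric axis, the same function applies to lower-is-better metrics such as DISTS or LPIPS. Under this sign convention, a 57.57 percent saving would appear as a return value of about -57.57.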

What carries the argument

Residual vector quantization, in which a sequence of vector quantizers encodes successive residuals each with its own codebook so that the codewords sum to a progressive reconstruction.
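In standard notation (the symbols are ours, not necessarily the paper's), let z be a latent vector, C_k the codebook of stage k, and K the number of stages. Greedy RVQ encoding and the partial reconstruction after k stages can then be written as:

```latex
\begin{aligned}
\mathbf{r}_0 &= \mathbf{z}, \\
\mathbf{q}_k &= \operatorname*{arg\,min}_{\mathbf{c} \in \mathcal{C}_k} \lVert \mathbf{r}_{k-1} - \mathbf{c} \rVert_2^2, \qquad k = 1, \dots, K, \\
\mathbf{r}_k &= \mathbf{r}_{k-1} - \mathbf{q}_k, \\
\hat{\mathbf{z}}^{(k)} &= \sum_{j=1}^{k} \mathbf{q}_j = \mathbf{z} - \mathbf{r}_k .
\end{aligned}
```

Decoding only the first k index streams yields the coarser reconstruction from the first k stages, which is what makes a truncated bitstream usable. With fixed-length indices, each stage adds log2|C_k| bits per latent vector. The paper's actual rate may differ, since Figure 13 compares rate-distortion curves with and without entropy coding.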

If this is right

Partial bitstreams allow image previews before the full file arrives, supporting flexible transmission.
The compact backbone permits practical use on CPU-only devices in addition to GPUs.
Encoding and decoding run more than ten times faster than MS-ILLM on GPUs.
Bitrate reductions reach 57.57 percent on DISTS and 58.83 percent on LPIPS relative to MS-ILLM on Kodak images.
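The progressive-preview idea can be illustrated with a toy NumPy sketch. It covers greedy RVQ encoding, decoding from only the first k index streams, and the fixed-length bit cost per stage. The codebooks here are fit with plain k-means on synthetic Gaussian vectors. ProGIC learns its codebooks jointly with a neural encoder and decoder, and this sketch does not attempt that. All function names are hypothetical.

```python
import numpy as np


def nearest(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codeword (squared Euclidean distance) for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x: np.ndarray, n_codes: int, iters: int, rng: np.random.Generator) -> np.ndarray:
    """Plain Lloyd's k-means, used here as a simple stand-in for codebook training."""
    codebook = x[rng.choice(len(x), size=n_codes, replace=False)].copy()
    for _ in range(iters):
        assign = nearest(x, codebook)
        for c in range(n_codes):
            members = x[assign == c]
            if len(members):  # empty clusters keep their previous codeword
                codebook[c] = members.mean(axis=0)
    return codebook


def train_rvq(z: np.ndarray, n_stages: int, n_codes: int, iters: int = 10, seed: int = 0) -> list:
    """Fit one codebook per stage on the residual left over by the previous stages."""
    rng = np.random.default_rng(seed)
    residual = z.copy()
    codebooks = []
    for _ in range(n_stages):
        cb = kmeans(residual, n_codes, iters, rng)
        residual = residual - cb[nearest(residual, cb)]
        codebooks.append(cb)
    return codebooks


def rvq_encode(z: np.ndarray, codebooks: list) -> list:
    """Greedy RVQ: each stage quantizes what the earlier stages failed to capture."""
    residual = z.copy()
    indices = []
    for cb in codebooks:
        idx = nearest(residual, cb)
        indices.append(idx)  # one index stream per stage, transmitted in order
        residual = residual - cb[idx]
    return indices


def rvq_decode(indices: list, codebooks: list, n_stages=None) -> np.ndarray:
    """Sum the codewords of the first n_stages streams (all of them if None)."""
    n = len(indices) if n_stages is None else n_stages
    out = np.zeros((len(indices[0]), codebooks[0].shape[1]))
    for idx, cb in zip(indices[:n], codebooks[:n]):
        out += cb[idx]
    return out


def fixed_length_bpp(n_vectors: int, n_stages: int, n_codes: int, n_pixels: int) -> float:
    """Bits per pixel if every stage index is sent with log2(n_codes) bits (no entropy coding)."""
    return n_vectors * n_stages * float(np.log2(n_codes)) / n_pixels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.standard_normal((2000, 4))  # synthetic stand-in for latent vectors
    books = train_rvq(z, n_stages=4, n_codes=16)
    idx = rvq_encode(z, books)
    for k in range(len(books) + 1):  # k = 0 means nothing has been received yet
        mse = np.mean((z - rvq_decode(idx, books, k)) ** 2)
        print(f"stages received: {k}  mse: {mse:.4f}")
```

On the data the codebooks were fit to, the total error cannot grow from one stage to the next. Each k-means codeword is the mean of its last assigned cluster, so on aggregate it is no farther than the zero vector, and nearest-codeword reassignment can only lower the error further. As a hypothetical cost example, take a 768×512 image with a 16× downsampled latent grid (1,536 vectors) and 1,024-entry codebooks. Each stage would then cost about 0.039 bpp before any entropy coding.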

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Progressive output may reduce perceived latency in streaming applications where bandwidth varies over time.
The same staged quantization structure could be tested on video frames to check whether temporal residuals yield similar efficiency.
Smaller model size may lower memory footprint enough for on-device compression in mobile cameras.
Direct measurement of power draw during encoding would test whether the speed gain also reduces energy use.

Load-bearing premise

The reported perceptual metric improvements and speedups hold when measured on the Kodak dataset against the single baseline MS-ILLM.

What would settle it

A side-by-side test on a separate dataset such as CLIC or DIV2K that finds no bitrate savings on DISTS or LPIPS and no speed advantage would show the performance claims do not generalize.

Figures

Figures reproduced from arXiv: 2603.02897 by Chengbin Liang, Hao Cao, Jungong Han, Wenqi Guo, Zhijin Qin.

**Figure 2.** Conceptual illustration of the motivation behind ProGIC. The original image vector is approximated by a base vector plus a …

**Figure 3.** (a) Overview of the proposed ProGIC. Each down-/up-sampling stage consists of a stack of …

**Figure 4.** Feature modulation in an FFN: at each progressive de…

**Figure 5.** Rate-distortion performance on the Kodak, Tecnick, DIV2K, and CLIC2020-Professional datasets, evaluated with LPIPS and …

**Figure 6.** Visualization of reconstructed images from different methods on Kodak. Values denote DISTS / bpp. Lower DISTS indicates …

**Figure 7.** R–D performance compared with the progressive …

**Figure 8.** R–D performance with different codebook numbers.

**Figure 9.** Effect of different weighting ratios p for training. The top-right zoom highlights the low-bitrate region. The bottom-right zoom highlights the high-bitrate region.

**Figure 10.** Visualization of progressive image transmission with …

**Figure 11.** Reconstruction quality comparison across different …

**Figure 12.** Entropy of different codebook usages in ProGIC.

**Figure 13.** Rate-distortion curves with and without entropy cod…

**Figure 14.** Visualization by t-SNE of latent features. The top-left …

**Figure 15.** In the event of a forest fire, ProGIC enables rapid response by transmitting images over a satellite short message link, assuming …

**Figure 16.** Different BPP ranges achieved by varying the number …

**Figure 17.** Rate-distortion performance on the Kodak, Tecnick, DIV2K, and CLIC2020-Professional datasets, evaluated with PSNR, MS…

**Figure 18.** Visualization of reconstructed images from different methods on Tecnick. Values denote DISTS / BPP. Lower DISTS indicates …

**Figure 19.** Visualization of reconstructed images from different methods on Tecnick. Values denote DISTS / BPP. Lower DISTS indicates …

**Figure 20.** Visualization of reconstructed images from different methods on DIV2K. Values denote DISTS / BPP. Lower DISTS indicates …

**Figure 21.** Visualization of reconstructed images from different methods on DIV2K. Values denote DISTS / BPP. Lower DISTS indicates …

**Figure 22.** Visualization of reconstructed images from different methods on CLIC 2020. Values denote DISTS / BPP. Lower DISTS …

**Figure 23.** Visualization of reconstructed images from different methods on CLIC 2020. Values denote DISTS / BPP. Lower DISTS …

Original abstract

Recent advances in generative image compression (GIC) have delivered remarkable improvements in perceptual quality. However, many GICs rely on large-scale and rigid models, which severely constrain their utility for flexible transmission and practical deployment in low-bitrate scenarios. To address these issues, we propose Progressive Generative Image Compression (ProGIC), a compact codec built on residual vector quantization (RVQ). In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook. The resulting codewords sum to a coarse-to-fine reconstruction and a progressive bitstream, enabling previews from partial data. We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPUs and CPU-only devices. Experimental results show that ProGIC attains comparable compression performance compared with previous methods. It achieves bitrate savings of up to 57.57% on DISTS and 58.83% on LPIPS compared to MS-ILLM on the Kodak dataset. Beyond perceptual quality, ProGIC enables progressive transmission for flexibility, and also delivers over 10 times faster encoding and decoding compared with MS-ILLM on GPUs for efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ProGIC pairs RVQ with a depthwise-separable backbone for progressive fast GIC, but the big reported gains sit on a single baseline and one dataset with no visible controls.


The main takeaway is that this paper builds a compact progressive generative codec around residual vector quantization and a lightweight backbone of depthwise-separable convolutions plus small attention blocks. It claims over 10x faster encoding and decoding than MS-ILLM on GPUs, substantial bitrate savings on DISTS and LPIPS on Kodak, and support for partial-bitstream previews. That combination of speed, progressive transmission, and reported perceptual numbers is the concrete thing on offer.

The architecture itself is the new piece: RVQ is known, but its specific pairing with this backbone for both CPU and GPU practicality, together with the progressive property, does not appear in the cited prior work. The paper does a reasonable job of explaining how the successive residual quantizers produce a coarse-to-fine sum that yields a progressive stream without extra machinery. That part is straightforward engineering and addresses a real deployment need for flexible low-bitrate use. The focus on keeping the model small enough for CPU inference is also useful and often missing from heavier GIC designs.

The soft spots are exactly where the stress-test note flags them. All the headline numbers come from Kodak against only MS-ILLM, with no information on training-set overlap, no error bars, no ablations of the separable convolutions or attention, and no training details or equations in the abstract. If the full paper supplies those controls and they are clean, the claims become more credible; if not, the deltas could be sensitive to protocol choices or to variance in the generative decoder.

This work is aimed at engineers who need a deployable progressive codec rather than theorists looking for new frameworks. A reader building systems for edge or low-resource settings would find the architecture choices worth examining. It deserves a serious referee to verify the experimental setup and check whether the performance numbers hold once the missing controls are in place.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes ProGIC, a lightweight generative image compression codec based on residual vector quantization (RVQ) paired with a backbone of depthwise-separable convolutions and small attention blocks. The approach produces a progressive bitstream enabling coarse-to-fine reconstruction from partial data. Experiments claim comparable or superior perceptual performance to prior GIC methods, with bitrate savings of up to 57.57% on DISTS and 58.83% on LPIPS versus MS-ILLM on Kodak, plus >10× faster encoding/decoding on GPUs.

Significance. If the empirical claims hold after addressing controls, the work supplies a deployable, CPU/GPU-friendly GIC solution that adds progressive transmission without sacrificing efficiency, filling a gap between high-capacity generative codecs and practical constraints in low-bitrate settings.

major comments (3)

Section 4 (Experimental results): The headline bitrate savings (57.57% DISTS, 58.83% LPIPS vs. MS-ILLM on Kodak) are load-bearing for the 'comparable or superior' claim, yet the text does not state whether MS-ILLM numbers were reproduced under the identical evaluation protocol or taken from the original paper; without this, the deltas cannot be treated as robust evidence.
Section 4.1 (Datasets and training): No information is supplied on training-set composition or whether Kodak images were excluded from training, which directly affects the validity of the reported generalization on perceptual metrics.
Section 4.3 (Ablation and efficiency): The >10× speedup claim is presented without error bars, run-to-run variance, or details on hardware normalization, leaving open whether the efficiency advantage exceeds typical generative-decoder variability.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a single sentence stating the approximate parameter count or FLOPs of the lightweight backbone to ground the 'lightweight' descriptor.
[Section 3] Notation for the RVQ stages (e.g., how residuals are defined across quantizers) is introduced without an accompanying equation; adding a compact definition would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions where needed to improve the rigor and transparency of the experimental reporting.


Referee: Section 4 (Experimental results): The headline bitrate savings (57.57% DISTS, 58.83% LPIPS vs. MS-ILLM on Kodak) are load-bearing for the 'comparable or superior' claim, yet the text does not state whether MS-ILLM numbers were reproduced under the identical evaluation protocol or taken from the original paper; without this, the deltas cannot be treated as robust evidence.

Authors: We appreciate this observation on ensuring fair and reproducible comparisons. The MS-ILLM baseline results were obtained by running the official implementation under the exact same evaluation protocol used for ProGIC, including identical test images from Kodak, metric computation pipelines, and bit-rate sampling. We will revise the manuscript in Section 4 to explicitly state this reproduction procedure and any relevant implementation details. revision: yes
Referee: Section 4.1 (Datasets and training): No information is supplied on training-set composition or whether Kodak images were excluded from training, which directly affects the validity of the reported generalization on perceptual metrics.

Authors: We agree that details on the training data are necessary to validate generalization claims. The model was trained exclusively on a subset of the ImageNet training set with no overlap to the Kodak images. We will update Section 4.1 to include the precise training-set composition, size, preprocessing steps, and explicit confirmation that Kodak images were excluded from training. revision: yes
Referee: Section 4.3 (Ablation and efficiency): The >10× speedup claim is presented without error bars, run-to-run variance, or details on hardware normalization, leaving open whether the efficiency advantage exceeds typical generative-decoder variability.

Authors: We acknowledge the value of statistical reporting for efficiency claims. The >10× speedup was measured on a fixed GPU hardware configuration, and we will revise Section 4.3 to report error bars from multiple independent runs, include run-to-run variance, and specify the exact hardware and normalization procedure used for the timing measurements. revision: yes
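The rebuttal commits to reporting error bars for the speedup. One minimal way to produce them is sketched here, assuming the encoder and decoder can each be wrapped in a zero-argument callable. The sketch discards warm-up runs, times repeated calls, and reports the mean and run-to-run standard deviation. On a GPU the caller must pass a synchronization function (for example the framework's device-synchronize call), because kernel launches are asynchronous. `benchmark` and `speedup` are hypothetical helper names, not part of any codec's API.

```python
import math
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, repeats: int = 20,
              sync: Callable[[], None] = lambda: None) -> dict:
    """Time `fn` over repeated runs; returns mean, std, min and max in seconds."""
    for _ in range(warmup):  # discard warm-up runs (lazy initialization, caches, autotuning)
        fn()
    sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued device work so it is counted in this run
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times),
        "std": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min": min(times),
        "max": max(times),
        "n": len(times),
    }


def speedup(baseline: dict, candidate: dict) -> tuple[float, float]:
    """Ratio of mean times (baseline / candidate) with a first-order uncertainty.

    Assumes the two sets of runs are independent and uses the spread of
    individual runs, not the standard error of the mean.
    """
    ratio = baseline["mean"] / candidate["mean"]
    rel = math.sqrt((baseline["std"] / baseline["mean"]) ** 2
                    + (candidate["std"] / candidate["mean"]) ** 2)
    return ratio, ratio * rel
```

Reporting the hardware, batch size, input resolution, and number of repeats alongside the result would address the normalization question raised in the referee report.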

Circularity Check

0 steps flagged

No derivation chain; performance claims are purely empirical comparisons


The manuscript describes an engineering proposal (RVQ-based progressive codec + depthwise-separable backbone) whose central assertions are bitrate savings and speedups measured on Kodak versus MS-ILLM. No equations, first-principles derivations, fitted parameters presented as predictions, or self-citation chains that justify uniqueness are present in the abstract or described structure. The work is self-contained against external benchmarks; the reader's 2.0 assessment is consistent with the lack of any load-bearing self-referential step.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no mathematical derivations, so the ledger is empty. All performance numbers are treated as empirical claims whose grounding cannot be audited from the given text.



Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 2 internal anchors

[1] Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 126–135, 2017.
[2] Nicola Asuni, Andrea Giachetti, et al. TESTIMAGES: a large-scale archive for testing visual devices and basic image processing algorithms. In STAG: Smart Tools and Applications in Computer Graphics, pages 63–70, 2014.
[3] Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. In International Conference on Learning Representations (ICLR), 2018.
[4] Gisle Bjontegaard. Calculation of average PSNR differences between RD-curves. ITU-T SG16, Doc. VCEG-M33, 2001.
[5] Yochai Blau and Tomer Michaeli. Rethinking lossy compression: The rate-distortion-perception tradeoff. In Proceedings of the 36th International Conference on Machine Learning, pages 675–685. PMLR, 2019.
[6] Marlene Careil, Matthew J Muckley, Jakob Verbeek, and Stéphane Lathuilière. Towards image compression with perfect realism at ultra-low bitrates. In International Conference on Learning Representations (ICLR), 2023.
[7] Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7939–7948, 2020.
[8] CLIC. Workshop and challenge on learned image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[9] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (CVPR), pages 248–255. IEEE, 2009.
[10] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[11] Keyan Ding, Kede Ma, Shiqi Wang, and Eero P Simoncelli. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5):2567–2581, 2020.
[12] Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, and Luc Van Gool. Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 221–231, 2019.
[13] Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12873–12883, 2021.
[14] Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, and Li Song. Linear attention modeling for learned image compression. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7623–7632, 2025.
[15] Runsen Feng, Zongyu Guo, Weiping Li, and Zhibo Chen. NVTC: Nonlinear vector transform coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6101–6110, 2023.
[16] Qifan Fu, Huiqiang Xie, Zhijin Qin, Gregory Slabaugh, and Xiaoming Tao. Vector quantized semantic communication system. IEEE Wireless Communications Letters, 12(6):982–986, 2023.
[17] Junlong Gao, Zhimeng Huang, Qi Mao, Siwei Ma, and Chuanmin Jia. Exploring multimodal knowledge for image compression via large foundation models. IEEE Transactions on Image Processing, 34:5904–5919, 2025.
[18] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[19] V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
[20] Jinpei Guo, Yifei Ji, Zheng Chen, Kai Liu, Min Liu, Wang Rao, Wenbo Li, Yong Guo, and Yulun Zhang. OSCAR: One-step diffusion codec across multiple bit-rates. In Conference on Neural Information Processing Systems (NeurIPS), 2025.
[21] Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, and Yan Wang. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5718–5727, …
[22] Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, and Yan Wang. PO-ELIC: Perception-oriented efficient learned image coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1764–1769, 2022.
[23] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
[24] Ali Hojjat, Janek Haberer, and Olaf Landsiedel. ProgDTD: Progressive learned image compression with double-tail-drop training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1130–1139, 2023.
[25] Seungmin Jeon, Kwang Pyo Choi, Youngo Park, and Chang-Su Kim. Context-based trit-plane coding for progressive image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14348–14357, 2023.
[26] Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu. Towards practical real-time neural video compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12543–12552, 2025.
[27] Wei Jiang, Jiayu Yang, Yongqi Zhai, Feng Gao, and Ronggang Wang. MLIC++: Linear complexity multi-reference entropy modeling for learned image compression. ACM Trans. Multimedia Comput. Commun. Appl., 21(5), 2025.
[28] R.A. King and N.M. Nasrabadi. Image coding using vector quantization in the transform domain. Pattern Recognition Letters, 1(5):323–329, 1983.
[29] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[30] Kodak Lossless True Color Image Suite. http://r0k.us/graphics/kodak/, 1993.
[31] Oltjon Kodheli, Eva Lagunas, Nicola Maturo, Shree Krishna Sharma, Bhavani Shankar, Jesus Fabian Mendoza Montoya, Juan Carlos Merlano Duncan, Danilo Spano, Symeon Chatzinotas, Steven Kisseleff, Jorge Querol, Lei Lei, Thang X. Vu, and George Goussetis. Satellite communications in the new space era: A survey and future challenges. IEEE Communications Sur…
[32] Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, and Kundan Kumar. High-fidelity audio compression with improved RVQGAN. Advances in Neural Information Processing Systems, 36:27980–27993, 2023.
[33] Jae-Han Lee, Seungmin Jeon, Kwang Pyo Choi, Youngo Park, and Chang-Su Kim. DPICT: Deep progressive image compression using trit-planes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16113–16122, 2022.
[34] Anqi Li, Feng Li, Yuxi Liu, Runmin Cong, Yao Zhao, and Huihui Bai. Once-for-all: Controllable generative image compression with dynamic granularity adaptation. In International Conference on Learning Representations (ICLR), …
[35] Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, and Wenjun Zhang. MISC: Ultra-low bitrate image semantic compression driven by large multimodal model. IEEE Transactions on Image Processing, 34:335–349, 2024.
[36] Yuqi Li, Haotian Zhang, Li Li, and Dong Liu. Learned image compression with hierarchical progressive context modeling. In The Twentieth IEEE/CVF International Conference on Computer Vision, 2025.
[37] Zhiyuan Li, Yanhui Zhou, Hao Wei, Chenyang Ge, and Jingwen Jiang. Towards extreme image compression with latent feature guidance and diffusion prior. IEEE Transactions on Circuits and Systems for Video Technology, 35(1):888–899, …
[38] Jingbo Lu, Leheng Zhang, Xingyu Zhou, Mu Li, Wen Li, and Shuhang Gu. Learned image compression with dictionary-based entropy model. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12850–12859, …
[39] Qi Mao, Tinghan Yang, Yinuo Zhang, Zijian Wang, Meng Wang, Shiqi Wang, Libiao Jin, and Siwei Ma. Extreme image compression using fine-tuned VQGANs. In 2024 Data Compression Conference (DCC), pages 203–212. IEEE, …
[40] G Nigel N Martin. Range encoding: an algorithm for removing redundancy from a digitised message. In Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording, 1979.
[41] Fabian Mentzer, George D Toderici, Michael Tschannen, and Eirikur Agustsson. High-fidelity generative image compression. Advances in neural information processing systems, 33:11913–11924, 2020.
[42] Matthew J Muckley, Alaaeldin El-Nouby, Karen Ullrich, Hervé Jégou, and Jakob Verbeek. Improving statistical fidelity for neural image compression with implicit local likelihood models. In International Conference on Machine Learning (ICML), pages 25426–25443. PMLR, 2023.
[43] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010.
[44] Linfeng Qi, Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, and Yan Lu. Generative latent coding for ultra-low bitrate image and video compression. IEEE Transactions on Circuits and Systems for Video Technology, 35(10):10500–10515, 2025.
[45]

Gener- ating diverse high-fidelity images with vq-vae-2.Advances in neural information processing systems, 32, 2019

Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. Gener- ating diverse high-fidelity images with vq-vae-2.Advances in neural information processing systems, 32, 2019. 4

[46]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

[47]

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[48]

Zhang Tianyu, Luo Xin, Li Li, and Liu Dong. StableCodec: Taming one-step diffusion for extreme image compression. In International Conference on Computer Vision (ICCV).

[49]

Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.

[50]

VTM-23.10. https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/, 2025. Accessed: 2025-06-05.

[51]

Gregory K Wallace. The JPEG still picture compression standard. Communications of the ACM, 34(4):30–44, 1991.

[52]

Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring CLIP for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2555–2563, 2023.

[53]

Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multi-scale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, pages 1398–1402. IEEE, 2003.

[54]

Hao Xu, Xiaolin Wu, and Xi Zhang. Multirate neural image compression with adaptive lattice vector quantization. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7633–7642, 2025.

[55]

Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, and Yan Lu. One-step diffusion-based image compression with semantic distillation. In Conference on Neural Information Processing Systems (NeurIPS), 2025.

[56]

Naifu Xue, Zhaoyang Jia, Jiahao Li, Bin Li, Yuan Zhang, and Yan Lu. DLF: Extreme image compression with dual-generative latent fusion. In International Conference on Computer Vision (ICCV), 2025.

[57]

Ruihan Yang and Stephan Mandt. Lossy image compression with conditional diffusion models. Advances in Neural Information Processing Systems, 36:64971–64995, 2023.

[58]

Yibo Yang, Justus C Will, and Stephan Mandt. Progressive compression with universally quantized diffusion models. In International Conference on Learning Representations (ICLR), 2025.

[59]

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. SoundStream: An end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021.

[60]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.

[61]

Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, and Heng Tao Shen. Unified multivariate Gaussian mixture for efficient neural image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17612–17621, 2022.

