arxiv: 2605.23323 · v1 · pith:JG6D3QNSnew · submitted 2026-05-22 · 📡 eess.IV · cs.CV

Efficient Learned Image Compression without Entropy Coding

Pith reviewed 2026-05-25 03:02 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords learned image compression · vector quantization · entropy coding free · autoregressive transform · low latency compression · image coding

The pith

EF-LIC removes statistical and correlation redundancy in learned image compression without using entropy coding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EF-LIC as a multi-rate framework that replaces sequential entropy coding with two mechanisms to produce compact image representations at low latency. Unconstrained vector quantization is introduced and shown to drive index distributions toward the maximum-entropy bound, thereby minimizing statistical redundancy. A context-conditioned autoregressive transform then reparameterizes the latents to reduce their mutual dependencies. Theoretical analysis establishes that this pair removes correlation redundancy as effectively as entropy-coded learned image compression, and experiments confirm comparable rate-distortion curves together with substantially higher encoding and decoding speeds.

Core claim

EF-LIC generates compact representations by unconstrained vector quantization, whose index distribution approaches the maximum-entropy bound, and a context-conditioned autoregressive transform that reparameterizes latents to reduce dependency, allowing removal of both statistical and correlation redundancy without entropy coding while matching the performance of entropy-coded learned image compression.

What carries the argument

Unconstrained vector quantization paired with a context-conditioned autoregressive transform, which together eliminate the need for entropy coding by driving index distributions to maximum entropy and directly reducing latent inter-dependencies.
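
To make the fixed-length idea concrete, here is a minimal sketch of a generic nearest-neighbour vector quantizer whose indices are written with a constant number of bits, so no entropy coder is needed. It is an illustration under stated assumptions, not the paper's quantizer: the function names, the toy 2-D codebook, and the string bitstream are all hypothetical, and the context-conditioned transform is omitted.

```python
import math

def nearest_index(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared L2 distance)."""
    best, best_dist = 0, float("inf")
    for k, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best, best_dist = k, dist
    return best

def encode(latents, codebook):
    """Quantize each latent vector and emit every index with the same bit width."""
    bits = max(1, math.ceil(math.log2(len(codebook))))
    indices = [nearest_index(v, codebook) for v in latents]
    bitstream = "".join(format(i, f"0{bits}b") for i in indices)
    return bitstream, bits

def decode(bitstream, bits, codebook):
    """Slice the bitstream into fixed-width indices and look up the codewords."""
    return [codebook[int(bitstream[p:p + bits], 2)]
            for p in range(0, len(bitstream), bits)]

# Toy data (hypothetical): four codewords, three latent vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.7)]

bitstream, bits = encode(latents, codebook)
recon = decode(bitstream, bits, codebook)
```

Because every index has the same width, decoding is a matter of slicing and table lookup, which can run in parallel. This is the property the speed claims rest on; whether the bits are well spent depends on how uniform the index distribution is.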

If this is right

  • EF-LIC achieves up to 67.86 percent bitrate reduction over MS-ILLM on the Kodak dataset under the LPIPS metric.
  • Encoding runs more than three times faster and decoding more than five times faster than entropy-coded baselines.
  • Compression performance remains comparable to the entropy-coding variant of the same architecture.
  • The approach supports multiple rates within a single trained model.
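
With fixed-length indices, the bitrate at each rate setting follows from simple counting. The sketch below assumes the header size quoted in the extracted appendix text further down (28 bits for H and W, 4 bits for the rate key q); the index-grid size, the number of codebooks, and the codebook size are hypothetical placeholders, and the paper's actual layout with several quantizers is not reproduced.

```python
import math

# 28 bits for H and W plus 4 bits for the rate key q, as stated in the
# extracted appendix text; everything else below is an assumed example.
HEADER_BITS = 28 + 4

def fixed_rate_bpp(height, width, grids):
    """Bits per pixel for fixed-length indices.

    grids: list of (num_positions, num_codebooks, codebook_size) tuples,
    one per quantized tensor (hypothetical structure for illustration).
    """
    payload = sum(n * m * math.log2(k) for n, m, k in grids)
    return (payload + HEADER_BITS) / (height * width)

# Hypothetical example: a 512x768 image, a 32x48 index grid,
# codebooks of size 1024 (10 bits per index).
positions = 32 * 48
bpp_low = fixed_rate_bpp(512, 768, [(positions, 1, 1024)])   # one codebook
bpp_high = fixed_rate_bpp(512, 768, [(positions, 5, 1024)])  # five codebooks
```

Switching the number of codebooks changes the rate in known, discrete steps, which is one plausible reading of how a single model can serve several bitrates.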

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Real-time or low-power devices could adopt the method where sequential entropy coding creates unacceptable latency.
  • The same redundancy-removal pattern might extend to video or point-cloud compression without requiring changes to the entropy stage.
  • If the index distribution truly saturates the entropy bound, further gains would require improving the transform rather than the quantizer.

Load-bearing premise

Unconstrained vector quantization produces an index distribution that approaches the maximum-entropy bound, removing statistical redundancy without entropy coding.
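
Why this premise carries so much weight can be seen from a standard information-theoretic identity. The notation below (index variable $I$, codebook size $K$, index probabilities $p_k$, uniform distribution $u$) is assumed for illustration and is not taken from the paper.

```latex
H(I) \;=\; -\sum_{k=1}^{K} p_k \log_2 p_k \;\le\; \log_2 K,
\qquad \text{with equality iff } p_k = \tfrac{1}{K} \text{ for all } k .

% A fixed-length code spends \log_2 K bits per index, so the redundancy
% left on the table per index is
\log_2 K - H(I) \;=\; \sum_{k=1}^{K} p_k \log_2 \frac{p_k}{1/K}
\;=\; D_{\mathrm{KL}}\!\left(p \,\|\, u\right) \;\ge\; 0 .
```

If the index distribution is close to uniform, this gap is close to zero and an entropy coder has little left to gain on the marginal statistics. Dependencies between indices are a separate matter, which is what the autoregressive transform is meant to address.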

What would settle it

Compute the empirical entropy of the index sequences produced by the unconstrained vector quantizer on held-out images and compare it to log2 of the codebook size; a large gap would falsify the claim that statistical redundancy is removed.
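
A minimal sketch of that check, using only the standard library. The function name and the two synthetic index sequences are hypothetical; a real test would use indices produced by the trained quantizer on held-out images, and the plug-in estimate is biased low when the sample is small relative to the codebook size.

```python
import math
from collections import Counter

def normalized_entropy(indices, codebook_size):
    """Empirical (plug-in) entropy of the indices divided by log2(codebook_size).

    A value near 1.0 means the fixed-length code wastes almost nothing;
    a value well below 1.0 means statistical redundancy remains.
    """
    counts = Counter(indices)
    total = len(indices)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(codebook_size)

# Synthetic stand-ins for quantizer output (codebook size 8).
uniform = [i % 8 for i in range(800)]             # every index equally often
skewed = [0] * 700 + [i % 8 for i in range(100)]  # index 0 dominates

ratio_uniform = normalized_entropy(uniform, 8)
ratio_skewed = normalized_entropy(skewed, 8)
```

The uniform sequence reaches the bound, while the skewed one falls far short of it. A per-codebook version of this ratio could be compared against the normalized codebook entropy that the Figure 7 caption describes.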

Figures

Figures reproduced from arXiv: 2605.23323 by Hao Cao, Jungong Han, Wenqi Guo, Zhijin Qin.

Figure 1: (a) EF-LIC is the proposed method, which achieves high performance and low decoding latency. EF-LIC-s is its lightweight variant. (b) Comparison of EF-LIC with its variants. “UQ+EC” denotes typical LIC with uniform quantization (UQ), context modeling, and entropy coding. “VQ” is the baseline method without inter-latent decorrelation. “VQ+EC” denotes context modeling and entropy coding for discrete VQ indi…
Figure 2: (a) Left: a VQ-only baseline that is fast but less efficient due to missing inter-latent decorrelation. (b) Middle: a typical entropy-coded LIC pipeline, where the context model f_CM outputs conditional probabilities for AE and AD. (c) Right: the proposed EF-LIC, which applies a context-conditional transform to produce low-correlation latents and uses unconstrained VQ to remove redundancy.
Figure 3: R–D performance on the Kodak, Tecnick, DIV2K, and CLIC2020 datasets, evaluated with LPIPS and DISTS vs. BPP. Curves closer to the origin indicate better compression performance.
Adjacent text (4. Experiments, 4.1. Experimental Setup): We follow the common practice (Ballé et al., 2018; Jia et al., 2025) and set f_y = 16 and f_z = 64. Since N = 4, we set K_1 = 1024, K_2 = 512, K_3 = 256, K_4 = 128, K_z = 1024. This is an empirica…
Figure 4: Visual comparison on Kodak. Numbers are LPIPS/BPP. Lower LPIPS is better.
Figure 5: (a) Implementation details of EF-LIC, which largely follow DCVC-RT (Jia et al., 2025). The quantizer is realized as a set of RVQ modules with different numbers of codebooks, denoted by m. A rate-selection key determines which quantizer is used for a given inference. (b) RVQ architecture, following (Kumar et al., 2023). (c) DC block architecture, following (Jia et al., 2025).
Adjacent text: Using $\sum_{i=1}^{N} n_i \log K_i = R'$ and…
Figure 6: R-D performance on the Kodak dataset, evaluated with LPIPS vs. BPP. Curves closer to the lower-left are better.
Figure 7: Normalized codebook entropy for each codebook in Q1–Q4 and Qz, where there are 5 codebooks in each RVQ. Each bar reports 1 − ∆H for the corresponding quantizer. A higher bar denotes less statistical redundancy.
Adjacent text: with the conclusions in Theorem 3.1 and Equation (7). In addition, the quantizer for the latents y exhibits high codebook utilization, whereas the hyperprior quantizer Qz for z shows low utilizatio…
Figure 8: Qualitative results of EF-LIC at different bitrates on Kodak. The bitrate increases from left to right.
Adjacent text: aligns more closely with mainstream evaluation paradigms in recent works (Qi et al., 2025; Xue et al., 2025b; Zhang et al., 2025; Xue et al., 2025a), it deviates from the configuration we previously reported in the rebuttal. D.7. More Visualization Results. In this section, we provide additional visualiz…
Figure 9: R–D performance on the Kodak, Tecnick, DIV2K, and CLIC2020 datasets, evaluated with PSNR, MS-SSIM, FID and KID vs. BPP.
Figure 10: R–D performance on the Kodak, Tecnick, DIV2K, and CLIC2020 datasets, evaluated with NIQE, MUSIQ and CLIP-IQA vs. BPP.
Figure 11: Visual comparison on Tecnick (Asuni et al., 2014). Numbers are LPIPS/BPP. Lower values indicate better visual quality and higher compression.
[Panel labels on this page: Original Image, VTM-23.10, MS-ILLM, RDEIC, DiffEIC, EF-LIC (Ours).]
Figure 12: Visual comparison on DIV2K (Agustsson & Timofte, 2017). Numbers are LPIPS/BPP. Lower values indicate better visual quality and higher compression.
Figure 13: Visual comparison on Tecnick (Asuni et al., 2014). Numbers are LPIPS/BPP. Lower values indicate better visual quality and higher compression.
[Panel labels on this page: Original Image, MS-ILLM, RDEIC, DiffEIC, EF-LIC (Ours).]
Figure 14: Visual comparison on CLIC 2020 (CLIC, 2020). Numbers are LPIPS/BPP. Lower values indicate better visual quality and higher compression.
Original abstract

Entropy coding is widely used in typical learned image compression (LIC) that converts latents into a compact bitstream. However, entropy coding is typically sequential and becomes the coding latency bottleneck. To overcome it, we present Entropy-Coding Free Learned Image Compression (EF-LIC), a multi-rate framework that generates compact representation by removing statistical and correlation redundancy with low coding latency. First, we introduce unconstrained vector quantization and prove that its index distribution approaches the maximum-entropy bound, yielding minimal statistical redundancy. Second, we propose a context-conditioned autoregressive transform that directly reparameterizes the latents to reduce inter-dependency. Theoretical analysis shows that EF-LIC can remove correlation redundancy as effectively as typical LIC with entropy coding, leading to comparable compression performance. Experiments show EF-LIC achieves up to 67.86% bitrate reduction over MS-ILLM on Kodak with LPIPS. Ablation studies further show EF-LIC matches the compression performance of its entropy-coding based variant while achieving over $3\times$ faster encoding and $5\times$ faster decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Entropy-Coding Free Learned Image Compression (EF-LIC), a multi-rate framework for learned image compression that avoids entropy coding. It introduces unconstrained vector quantization, for which it claims to prove that the index distribution approaches the maximum-entropy bound (thereby removing statistical redundancy), and a context-conditioned autoregressive transform to reparameterize latents and remove correlation redundancy. Theoretical analysis asserts that this combination matches the redundancy removal of standard entropy-coded LIC. Experiments claim up to 67.86% bitrate reduction versus MS-ILLM on Kodak under LPIPS, with >3× faster encoding and >5× faster decoding, plus ablations showing parity with an entropy-coded variant.

Significance. If the central theoretical claims hold under the stated conditions, the result would be significant for low-latency learned compression, as it removes the sequential entropy-coding bottleneck while preserving rate-distortion performance. The reported speedups and bitrate gains would be practically relevant for real-time applications. The work would also contribute a concrete demonstration that vector quantization plus autoregressive reparameterization can substitute for entropy coding if the max-entropy bound is attained.

major comments (2)
  1. [Theoretical Analysis] Theoretical Analysis section: the manuscript states that it proves unconstrained vector quantization yields an index distribution approaching the maximum-entropy bound, but supplies neither the derivation steps nor the explicit conditions (latent statistics, training dynamics, rate constraints) under which the bound is reached. This proof is load-bearing for the claim that statistical redundancy is removed equivalently to entropy coding.
  2. [Experiments] Experiments section (Kodak results): the 67.86% bitrate reduction over MS-ILLM under LPIPS is reported without accompanying error bars, multiple random seeds, or explicit controls for baseline hyperparameter matching; this undermines confidence that the gain survives standard statistical checks and is not an artifact of post-hoc selection.
minor comments (1)
  1. [Abstract] The abstract invokes both a 'proof' and a 'theoretical analysis' equating the two redundancy-removal mechanisms; these should be cross-referenced to specific numbered equations or lemmas in the main text for clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Theoretical Analysis] Theoretical Analysis section: the manuscript states that it proves unconstrained vector quantization yields an index distribution approaching the maximum-entropy bound, but supplies neither the derivation steps nor the explicit conditions (latent statistics, training dynamics, rate constraints) under which the bound is reached. This proof is load-bearing for the claim that statistical redundancy is removed equivalently to entropy coding.

    Authors: We acknowledge that the Theoretical Analysis section presented the result at a high level without full derivation steps or explicit conditions. In the revision we will expand this section to include the complete mathematical derivation, specifying the assumptions on latent statistics (e.g., i.i.d. Gaussian-like marginals after normalization), training dynamics (unconstrained VQ with uniform initialization and rate-regularized loss), and rate constraints under which the index distribution provably converges to the maximum-entropy bound. This will make explicit how statistical redundancy removal matches that of entropy coding. revision: yes

  2. Referee: [Experiments] Experiments section (Kodak results): the 67.86% bitrate reduction over MS-ILLM under LPIPS is reported without accompanying error bars, multiple random seeds, or explicit controls for baseline hyperparameter matching; this undermines confidence that the gain survives standard statistical checks and is not an artifact of post-hoc selection.

    Authors: We agree that reporting variability across random seeds and explicit baseline controls would increase confidence in the results. We will rerun the Kodak experiments with at least five random seeds, include error bars (mean ± std) for the reported bitrate reductions, and add a supplementary table listing the exact hyperparameter settings used for MS-ILLM (taken from its public implementation) to document fair matching. These additions will be incorporated in the revised Experiments section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical claims presented as independent

The abstract asserts a proof that unconstrained VQ index distribution approaches the maximum-entropy bound and that the context-conditioned autoregressive transform removes correlation redundancy equivalently to entropy-coded LIC. No equations, self-citations, or reductions to fitted inputs are visible in the provided text that would make any prediction equivalent to its inputs by construction. The derivation chain is therefore treated as self-contained theoretical analysis supported by external experiments, consistent with a non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only; the ledger is populated from statements that must be true for the central claim to hold. No free parameters or invented entities are named in the provided text; the two axioms listed below are domain assumptions inferred from the abstract's claims.

axioms (2)
  • domain assumption Unconstrained vector quantization produces an index distribution that approaches the maximum-entropy bound.
    Stated as proved in the abstract; required for the claim that statistical redundancy is removed without entropy coding.
  • domain assumption The context-conditioned autoregressive transform removes inter-dependency as effectively as entropy coding.
    Central to the theoretical equivalence claim; stated in the "Theoretical analysis" sentence of the abstract.

pith-pipeline@v0.9.0 · 5712 in / 1504 out tokens · 20723 ms · 2026-05-25T03:02:45.008954+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 3 internal anchors

  1. [1]

    Agustsson, E. and Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 126–135,

  2. [2]

    Bégaint, J., Racapé, F., Feltman, S., and Pushparaja, A. CompressAI: A PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029,

  3. [3]

    Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding

    Duda, J. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540,

  4. [4]

    Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114,

  5. [5]

    Muckley, M. J., El-Nouby, A., Ullrich, K., Jégou, H., and Verbeek, J. Improving statistical fidelity for neural image compression with implicit local likelihood models. In Proceedings of International Conference on Machine Learning (ICML), pp. 25426–25443. PMLR,

  6. [6]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556,

  7. [7]

    Wallace, G. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics, 38(1):xviii–xxxiv,

  8. [8]

    Wang, Z., Simoncelli, E. P., and Bovik, A. C. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pp. 1398–1402. IEEE,

  9. [9]

    • Section B describes the model implementation and bitstream packing methods

    Appendix. In the appendix, we provide the following:
      • Section A provides proofs of Theorems 3.1, 3.3 and 3.5.
      • Section B describes the model implementation and bitstream packing methods.
      • Section C presents additional experimental details, including the exact settings of competing methods a...

  10. [10]

    To support multiple bitrates, we use a set of independent RVQ modules, where each RVQ uses a different number of codebooks m

    We implement RVQ following (Kumar et al., 2023), as illustrated in Figure 5b. To support multiple bitrates, we use a set of independent RVQ modules, where each RVQ uses a different number of codebooks m. In the main text, we set m ∈ {1, 2, 3, 4, 5} to cover a sufficiently wide bitrate range. At inference time, in addition to the input image, the model takes ...

  11. [11]

    The header takes 28 bits for H and W and 4 bits for q, which is negligible compared to the overall bitrate

    For transmission, we prepend a header containing H, W, and q, where H×W is the input image resolution and q is the rate-selection parameter. The header takes 28 bits for H and W and 4 bits for q, which is negligible compared to the overall bitrate. Given a fixed model, the mapping from H×W to the index grid h×w is deterministic, and the number of codeb...

  12. [12]

    UQ+EC” and “VQ+EC

    measures the average bitrate difference between two methods over a specified quality range. We compute BD-rate as the area between the two R–D curves after interpolating them with a monotonic piecewise cubic Hermite interpolating polynomial (PCHIP). A negative BD-rate indicates that the proposed method achieves the same quality at a lower bitrate than...

  13. [13]

    Best results are in bold

    Table 5. Comparison of BD-rate on the Kodak, Tecnick, DIV2K, and CLIC 2020 datasets evaluated under DISTS. Best results are in bold. Second-best are underlined.

    Method        BD-rate (DISTS)
                  Kodak      Tecnick    DIV2K      CLIC2020
    HiFiC         90.08%     99.67%     100.76%    124.45%
    Control-GIC   34.18%     67.12%     62.09%     110.76%
    MS-ILLM       0.00%      0.00%      0.00%      0.00%
    DiffEIC       -33.79%    23.68%     15.78%     59.91%
    ...

  14. [14]

    Enc./Dec

    100.18 200.03 6820.28 1065.81 8.03% OneDC (Xue et al., 2025a) 100.50 235.03 7142.91 1406.42 0.00% EF-LIC (Ours) 17.62 13.72 279.61 35.74 -3.33% Table 7. Comparison of GPU runtimes (ms) and memory (GB) for image encoding and decoding across different resolutions. Enc./Dec. denote encoding/decoding times. Mem. denotes memory usage. Best results are in bold. Se...

  15. [15]

    We compare against the stronger entropy coding implementations in DCVC-RT (Jia et al., 2025)

    because it has been widely adopted in most LIC (Ballé et al., 2018; Minnen et al., 2018; He et al., 2022a; Feng et al., 2025; Li et al., 2025b). We compare against the stronger entropy coding implementations in DCVC-RT (Jia et al., 2025). The results are also included in Table

  16. [16]

    in the main paper, we provide R–D curves measured by PSNR, MS-SSIM (Wang et al., 2003), FID (Heusel et al., 2017), KID (Bińkowski et al., 2018), NIQE (Mittal et al., 2013), MUSIQ (Ke et al., 2021), and CLIP-IQA (Wang et al.,

  17. [17]

    can produce visually realistic images, but their content can differ substantially from the original images, which leads to a large FID in our evaluation. Since our goal is image compression rather than image generation, preserving fidelity to the original content is essential, and we therefore primarily report LPIPS and DISTS in the main paper. Follow...

  18. [18]

    This protocol crops images into 256×256 non-overlapping patches to significantly augment the sample size, thereby ensuring a more accurate and robust calculation of both metrics

    and KID (Bińkowski et al., 2018). This protocol crops images into 256×256 non-overlapping patches to significantly augment the sample size, thereby ensuring a more accurate and robust calculation of both metrics. While this approach ...

  19. [19]

    Although the qualitative results of different models on high-resolution images appear similar, we include them to demonstrate that EF-LIC also functions correctly on high-resolution inputs. [Plot residue: PSNR vs. BPP on Kodak ...]