ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

Feng Ding; Haisheng Fu; Jie Liang; Jingning Han; Runyu Yang; Siyu Zhu; Xiaoxiao Li; Zhenman Fang

arXiv: 2606.00111 · v1 · pith:KAQIZAOZnew · submitted 2026-05-27 · 📡 eess.IV · cs.CV · cs.LG

Haisheng Fu, Runyu Yang, Feng Ding, Siyu Zhu, Jie Liang, Xiaoxiao Li, Zhenman Fang, Jingning Han

Pith reviewed 2026-06-29 09:53 UTC · model grok-4.3

keywords: learned image compression · wavelet transform · transformer attention · entropy modeling · channel-wise · rate-distortion · autoregressive coding

The pith

Channel-wise wavelet transforms in transformer attention and entropy coding improve learned image compression performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes incorporating channel-wise wavelet transforms into both the transformer attention mechanism and the entropy-modeling stage of a learned image compression model. It keeps windowed spatial self-attention but computes the Q/K/V projections on features that have been wavelet-transformed along the channel dimension, then maps the attention output back with the inverse transform. For entropy coding, a channel-wise wavelet packet decomposition produces four equal-sized subbands that feed a slice-based autoregressive model. The abstract reports BD-rate reductions of 17.82%, 19.15%, and 22.56% on Kodak, CLIC Professional Validation, and Tecnick. A reader would care because it suggests how wavelet sparsity can be used in hybrid CNN-transformer models for more efficient image coding.

Core claim

The proposed Channel-wise Wavelet-Domain Transformer Attention (ChWDTA) computes Q/K/V on channel-wise wavelet-transformed features while preserving windowed attention, and the channel-wise wavelet packet (ChWP) decomposition supports eight-slice autoregressive entropy modeling (four subbands, two slices each). The reported BD-rate figures are -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively.

What carries the argument

The Channel-wise Wavelet-Domain Transformer Block (ChWDTB), which applies a channel-wise wavelet transform before the Q/K/V projections and the inverse transform to the attention output.
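The transform, attend, invert pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a one-level orthonormal Haar transform (the abstract does not name the wavelet), a single attention head, one window, and plain Python lists. The names `haar_forward`, `haar_inverse`, and `wavelet_domain_window_attention` are hypothetical.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length channel vector."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low + high

def haar_inverse(c):
    """Inverse of haar_forward (perfect reconstruction up to rounding)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(c) // 2
    out = []
    for lo, hi in zip(c[:half], c[half:]):
        out.extend([(lo + hi) * s, (lo - hi) * s])
    return out

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]

def wavelet_domain_window_attention(tokens, wq, wk, wv):
    """Self-attention over one window; each token is a list of C channel values.

    Assumption: Q/K/V are plain linear maps applied to the Haar coefficients.
    """
    coeffs = [haar_forward(t) for t in tokens]  # channel-wise transform per token
    q = [matvec(wq, c) for c in coeffs]
    k = [matvec(wk, c) for c in coeffs]
    v = [matvec(wv, c) for c in coeffs]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        attn = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        mixed = [sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(len(v[0]))]
        out.append(haar_inverse(mixed))  # back to the original channel domain
    return out

# Toy window: two identical 4-channel tokens, identity projections.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
window = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
result = wavelet_domain_window_attention(window, identity, identity, identity)
print(result)
```

With identity projections and identical tokens the output reproduces the input, which only checks that the forward and inverse transforms cancel. The spatial tokenization is untouched: the transform acts on each token's channel vector independently.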

If this is right

The scheme achieves BD-rate reductions of roughly 18% to 23% on Kodak, CLIC Professional Validation, and Tecnick.
Using a single slice per subband retains most of the gains at lower complexity.
Wavelet transforms provide an advantage in CNN-transformer based learned image compression.
Channel-wise wavelet packet decomposition fits well with slice-based autoregressive entropy models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar wavelet integrations could be tested in video compression frameworks.
The sparsification of channel covariance might reduce attention computation costs in other domains.
Further splits or different wavelet types could be explored for additional gains.

Load-bearing premise

That the channel-wise wavelet transform applied before Q/K/V projections and the four-subband wavelet-packet split will improve rate-distortion performance on the chosen test sets without introducing reconstruction artifacts or requiring extensive hyper-parameter retuning.

What would settle it

Removing the wavelet transforms from the ChWDTA and ChWP modules and re-measuring BD-rate on the Kodak, CLIC, and Tecnick sets; loss of the reported savings would support the claim's dependence on the wavelet components.
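For readers who want to run that comparison, the BD-rate metric itself can be approximated as follows. This is a simplified sketch: the standard Bjøntegaard calculation fits a cubic polynomial to log-rate as a function of PSNR, whereas this version uses piecewise-linear interpolation to stay dependency-free. The function name `bd_rate_linear` and the rate-distortion points are made up for illustration and are not taken from the paper.

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be increasing and x within range."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")

def bd_rate_linear(anchor, test, samples=1000):
    """anchor/test: lists of (bitrate, psnr) points.

    Returns the average bitrate change of `test` relative to `anchor`, in
    percent, over the PSNR range both curves share. Negative means savings.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_r = [p[1] for p in a], [math.log(p[0]) for p in a]
    t_q, t_r = [p[1] for p in t], [math.log(p[0]) for p in t]
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")
    total = 0.0
    for i in range(samples):
        # Midpoint rule over the shared PSNR interval.
        q = lo + (hi - lo) * (i + 0.5) / samples
        total += _interp(q, t_q, t_r) - _interp(q, a_q, a_r)
    return (math.exp(total / samples) - 1.0) * 100.0

# Synthetic curves: the test codec needs 20% fewer bits at every quality level.
anchor_curve = [(0.2, 30.0), (0.4, 33.0), (0.8, 36.0), (1.6, 39.0)]
test_curve = [(r * 0.8, q) for r, q in anchor_curve]
print(bd_rate_linear(anchor_curve, test_curve))
```

On the synthetic curves the result is about -20, matching the constant 20% bitrate saving built into the data. Running the same function on a wavelet-free ablation against the full model would show how much of the reported saving depends on the wavelet components.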

Figures

Figures reproduced from arXiv: 2606.00111 by Feng Ding, Haisheng Fu, Jie Liang, Jingning Han, Runyu Yang, Siyu Zhu, Xiaoxiao Li, Zhenman Fang.

**Figure 2.** The overall architecture of the proposed scheme. The details of the ChWDTB and the ChARM module are shown in Fig. 3 and Fig. 4, respectively.

**Figure 3.** Details of the ChWDTB block. ChWDMSA denotes Channel ...

**Figure 4.** The ChARM entropy-coding model. (a) After a two-level wavelet packet (WP) decomposition, the latent representation ...

**Figure 5.** Evaluation results of different methods on the Kodak dataset in terms of PSNR and MS-SSIM.

**Figure 6.** Evaluation results of different methods on the Tecnick and CLIC Professional Validation datasets in terms of PSNR.

**Figure 7.** Representative channel-correlation visualization for ChWDTB at ChWDMSA #0 on a test image. The first two panels show the selected channel ...

**Figure 8.** Entropy-slice visualization of the two-level ChWP transform on Kodim 08 from Kodak dataset. The first panel shows the slice-level mean absolute ...

Original abstract

State-of-the-art learned image compression (LIC) schemes are increasingly based on hybrid CNN-transformer architectures. To further improve rate-distortion performance, we introduce channel-wise wavelet transforms into both the transformer and entropy-coding components. First, we propose a channel-wise wavelet-domain transformer attention (ChWDTA) mechanism. ChWDTA keeps the efficient windowed spatial self-attention used in modern LIC backbones, but computes the Q/K/V projections on channel-wise wavelet-transformed features before mapping the attention output back with the inverse transform. The resulting Channel-wise Wavelet-Domain Transformer Block (ChWDTB) therefore preserves the spatial tokenization pattern of windowed attention while sparsifying the channel covariance seen by the attention projections. Second, in the entropy-coding stage, we introduce a channel-wise wavelet packet (ChWP) decomposition that produces four equal-sized subbands, which better fit channel-wise slice-based autoregressive entropy modeling. When each channel-wise subband is divided into two slices, we use eight slices for entropy coding. With this configuration, the proposed scheme obtains BD-rate reductions of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively. Even when each channel-wise subband is coded as a single slice, the scheme still retains most of the coding gains with lower complexity. The results confirm the advantage of introducing wavelet transform in CNN-transformer-based LIC schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper adds channel-wise wavelets to the attention and entropy stages of a hybrid LIC model and reports BD-rate reductions of roughly 18% to 23%, but the experiments do not isolate whether the wavelet is what produces those gains.

The main contribution is the ChWDTA block, which applies a channel-wise wavelet transform before the Q/K/V projections inside windowed spatial attention and inverts it afterward, plus the ChWP entropy split that turns the channels into four subbands and codes them with eight slices in total. The abstract attributes neither piece to prior LIC work, so the specific pairing appears to be new.

The approach keeps the efficient spatial windowing that current transformer backbones rely on while trying to decorrelate the channel dimension. The reported BD-rate numbers on Kodak, CLIC Professional Validation, and Tecnick are large enough to notice, and the abstract notes that even the single-slice-per-subband version keeps most of the gain at lower cost.

The soft spot is exactly the one the stress-test note flags. Channels in these models are ordered by learned filter responses rather than spatial or temporal locality, so there is no obvious reason a wavelet basis should decorrelate them better than a 1x1 convolution or another orthogonal transform. The abstract supplies no covariance plots, no ablation that replaces the wavelet with a different channel mixer, and no test that holds slice count fixed while swapping the transform. The gains could therefore come from the extra slices, added capacity, or training differences instead of the stated mechanism. Without those controls the central claim stays unverified.
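The dependence on channel ordering can be made concrete with a toy measurement. The sketch below is an illustration on synthetic data, not an analysis of the paper's model: it computes the share of covariance energy that sits off the diagonal before and after a one-level Haar transform along the channel axis. The helper names (`covariance`, `offdiag_fraction`, `before_after`) and the two synthetic channel layouts are assumptions made for the example.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length channel vector."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low + high

def covariance(samples):
    """samples: list of channel vectors. Returns the C x C sample covariance."""
    n, c = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(c)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
             for j in range(c)] for i in range(c)]

def offdiag_fraction(cov):
    """Share of squared covariance energy that lies off the diagonal."""
    total = sum(v * v for row in cov for v in row)
    diag = sum(cov[i][i] ** 2 for i in range(len(cov)))
    return 0.0 if total == 0 else (total - diag) / total

def before_after(samples):
    before = offdiag_fraction(covariance(samples))
    after = offdiag_fraction(covariance([haar_forward(s) for s in samples]))
    return before, after

# Two uncorrelated synthetic signals observed over four positions.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]
# Layout 1: correlated channels are adjacent (a, a, b, b).
neighbours = [[a[n], a[n], b[n], b[n]] for n in range(4)]
# Layout 2: the same channels, interleaved (a, b, a, b).
interleaved = [[a[n], b[n], a[n], b[n]] for n in range(4)]

print(before_after(neighbours))
print(before_after(interleaved))
```

When correlated channels are adjacent, the off-diagonal share drops from about 0.5 to 0. When the same channels are interleaved it stays at about 0.5. That is the control the abstract leaves out: the benefit of a fixed wavelet basis depends on how the learned channels happen to be ordered, so swapping in a random orthogonal transform or a 1x1 convolution is a fair comparison.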

This is for people already working on CNN-transformer hybrids for compression who want to experiment with wavelet mixing in attention or entropy. A reader looking for a ready-to-try block design might find the description useful even if the performance attribution needs more work.

I would send it to peer review. The technique is distinct and the numbers are big enough that referees should see the full implementation and any additional ablations.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Channel-wise Wavelet-Domain Transformer Attention (ChWDTA) and Channel-wise Wavelet Packet (ChWP) decomposition for learned image compression. ChWDTA applies a channel-wise wavelet transform before Q/K/V projections in windowed spatial self-attention while preserving the spatial tokenization pattern; ChWP splits channels into four subbands for improved slice-based autoregressive entropy modeling (yielding eight slices). The central claim is that these yield BD-rate reductions of -17.82% on Kodak, -19.15% on CLIC Professional Validation, and -22.56% on Tecnick, confirming the advantage of wavelet transforms in CNN-transformer LIC backbones.

Significance. If the reported gains are shown to arise specifically from the wavelet mechanisms rather than capacity or training differences, the work would provide a concrete demonstration of how wavelet bases can be inserted into transformer attention and entropy modules to improve rate-distortion performance while retaining efficient windowed spatial attention.

major comments (2)

[Abstract] The BD-rate reductions (-17.82%, -19.15%, -22.56%) are stated without any experimental protocol, baseline references, ablation studies, training details, or error bars. This renders the central empirical claim unverifiable and prevents assessment of whether the gains are attributable to ChWDTA/ChWP.
[Abstract] The claim that the channel-wise wavelet transform sparsifies channel covariance (and thereby improves attention) is presented without supporting analysis; no covariance visualizations, comparisons against random orthogonal transforms or 1x1 convolutions, or ablations isolating the wavelet from the eight-slice change are supplied. Because learned channels lack inherent spatial locality, the a-priori advantage of a wavelet basis remains unestablished.

minor comments (1)

[Abstract] The transition from four subbands to eight slices for entropy coding would benefit from an explicit diagram or equation clarifying the slicing procedure.
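One plausible reading of the four-subband, eight-slice procedure is sketched below. It is an interpretation of the abstract and the Figure 4 caption, not the paper's code: it assumes a Haar filter, treats each channel as a single scalar (in the real model each channel is a feature map), and cuts each subband into contiguous slices. The names `haar_split`, `wavelet_packet_two_level`, and `make_slices` are hypothetical.

```python
import math

def haar_split(x):
    """One-level orthonormal Haar split of an even-length vector into (low, high)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def wavelet_packet_two_level(channels):
    """Split a channel vector (length divisible by 4) into four equal subbands.

    Unlike a plain two-level DWT, which re-splits only the low band, the
    wavelet packet re-splits both bands, so all four outputs have length C/4.
    """
    low, high = haar_split(channels)
    ll, lh = haar_split(low)
    hl, hh = haar_split(high)
    return [ll, lh, hl, hh]

def make_slices(subbands, slices_per_subband):
    """Cut each subband into contiguous, equal-length slices (an assumption)."""
    slices = []
    for band in subbands:
        step = len(band) // slices_per_subband
        for i in range(slices_per_subband):
            slices.append(band[i * step:(i + 1) * step])
    return slices

# Toy latent with 16 channels, one scalar per channel.
latent = [1.0] * 16
subbands = wavelet_packet_two_level(latent)
eight = make_slices(subbands, 2)  # two slices per subband
four = make_slices(subbands, 1)   # single slice per subband
print(len(subbands), len(eight), len(four))
```

The equal subband sizes are what distinguish the packet decomposition here: a conventional two-level DWT would yield bands of size C/2, C/4, and C/4, which divide less evenly into slices for a channel-wise autoregressive model.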

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below, clarifying that experimental details appear in the main text while agreeing that additional supporting analyses would strengthen the claims.

Referee: [Abstract] The BD-rate reductions (-17.82%, -19.15%, -22.56%) are stated without any experimental protocol, baseline references, ablation studies, training details, or error bars. This renders the central empirical claim unverifiable and prevents assessment of whether the gains are attributable to ChWDTA/ChWP.

Authors: The abstract is kept concise per standard practice. Full experimental protocol (training on Vimeo-90K, Adam optimizer, etc.), baselines (Cheng et al. 2020, Minnen et al. 2020, and recent transformer LIC methods), ablation studies (including the single-slice ChWP variant that retains most gains), and training details are provided in Sections 3 and 4. Results are reported as BD-rate averages; error bars can be added in revision. The single-slice experiment helps isolate contributions from the entropy modeling component.

Revision: partial

Referee: [Abstract] The claim that the channel-wise wavelet transform sparsifies channel covariance (and thereby improves attention) is presented without supporting analysis; no covariance visualizations, comparisons against random orthogonal transforms or 1x1 convolutions, or ablations isolating the wavelet from the eight-slice change are supplied. Because learned channels lack inherent spatial locality, the a-priori advantage of a wavelet basis remains unestablished.

Authors: The design draws on the established decorrelating property of wavelets, applied channel-wise before Q/K/V to reduce redundancy in the projections while preserving windowed spatial attention. We acknowledge that the manuscript does not include covariance visualizations, comparisons to random orthogonal bases or 1x1 convolutions, or an ablation that fully isolates the wavelet from the slice count change. We will add these analyses (covariance heatmaps and controlled ablations) in the revised version to directly substantiate the claim.

Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper introduces ChWDTA and ChWP architectural components and reports empirical BD-rate gains on external standard test sets (Kodak, CLIC Professional Validation, Tecnick). No equations, self-citations, or claims reduce the reported gains to quantities fitted on the same data, self-defined quantities, or load-bearing prior self-work. The derivation consists of proposed transforms evaluated against independent benchmarks, making the result self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical architecture paper; no explicit free parameters, axioms, or invented entities are introduced beyond standard neural-network training assumptions.

Reference graph

Works this paper leans on

46 extracted references · 9 canonical work pages · 2 internal anchors

[1] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image compression with discretized gaussian mixture likelihoods and attention modules,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 7939–7948.

[2] H. Fu, F. Liang, J. Lin, B. Li, M. Akbari, J. Liang, G. Zhang, D. Liu, C. Tu, and J. Han, “Learned image compression with gaussian-laplacian-logistic mixture model and concatenated residual modules,” IEEE Transactions on Image Processing, vol. 32, pp. 2063–2076, 2023.

[3] J. Liu, H. Sun, and J. Katto, “Learned image compression with mixed transformer-cnn architectures,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 14 388–14 397.

[4] H. Li, S. Li, W. Dai, C. Li, J. Zou, and H. Xiong, “Frequency-aware transformer for learned image compression,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=HKGQDDTuvZ

[5] D. Feng, Z. Cheng, S. Wang, R. Wu, H. Hu, G. Lu, and L. Song, “Linear attention modeling for learned image compression,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1–10.

[6] J. Lu, L. Zhang, X. Zhou, M. Li, W. Li, and S. Gu, “Learned image compression with dictionary-based entropy model,” in 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 12 850–12 859.

[7] Y. Li, H. Zhang, L. Li, and D. Liu, “Learned image compression with hierarchical progressive context modeling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025, pp. 18 834–18 843.

[8] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” in International Conference on Learning Representations, 2018, pp. 1–23.

[9] D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” in Advances in Neural Information Processing Systems, 2018, pp. 10 794–10 803.

[10] W. Jiang, J. Yang, Y. Zhai, P. Ning, F. Gao, and R. Wang, “Mlic: Multi-reference entropy model for learned image compression,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7618–7627.

[11] D. He, Z. Yang, W. Peng, R. Ma, H. Qin, and Y. Wang, “Elic: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 5718–5727.

[12] S. Qin, J. Wang, Y. Zhou, B. Chen, T. Luo, B. An, T. Dai, S. Xia, and Y. Wang, “Mambavc: Learned visual compression with selective state spaces,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15413

[13] D. He, Y. Zheng, B. Sun, Y. Wang, and H. Qin, “Checkerboard context model for efficient learned image compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021, pp. 14 771–14 780.

[14] Y. Qian, M. Lin, X. Sun, Z. Tan, and R. Jin, “Entroformer: A transformer-based entropy model for learned image compression,” in International Conference on Learning Representations, May 2022.

[15] H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1247–1263, 2022.

[16] U. B. Sahin and F. Kamisli, “Image compression with learned lifting-based dwt and learned tree-based entropy models,” Multimedia Systems, vol. 29, no. 6, pp. 3369–3384, 2023.

[17] D. Xue, H. Ma, L. Li, D. Liu, and Z. Xiong, “aiwave: Volumetric image compression with 3-d trained affine wavelet-like transform,” IEEE Transactions on Medical Imaging, vol. 42, no. 3, pp. 606–618, 2023.

[18] H. Fu, J. Liang, Z. Fang, J. Han, F. Liang, and G. Zhang, “Weconvene: Learned image compression with wavelet-domain convolution and entropy model,” in European Conference on Computer Vision (ECCV), 2024, pp. 37–53.

[19] D. Minnen and S. Singh, “Channel-wise autoregressive entropy models for learned image compression,” in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3339–3343.

[20] J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” in International Conference on Learning Representations, 2017.

[21] H. Fu, F. Liang, J. Liang, B. Li, G. Zhang, and J. Han, “Asymmetric learned image compression with multi-scale residual block, importance scaling, and post-quantization filtering,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 8, pp. 4309–4321, 2023.

[22] M. Akbari, J. Liang, J. Han, and C. Tu, “Learned multi-resolution variable-rate image compression with octave-based residual blocks,” IEEE Transactions on Multimedia, vol. 23, pp. 3013–3021, Mar. 2021.

[23] M. Li, K. Zhang, J. Li, W. Zuo, R. Timofte, and D. Zhang, “Learning context-based nonlocal entropy modeling for image compression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 3, pp. 1132–1145, 2023.

[24] H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1247–1263, 2022.

[25] S. Qin, J. Wang, Y. Zhou, B. Chen, T. Luo, B. An, T. Dai, S.-T. Xia, and Y. Wang, “Cassic: Towards content-adaptive state-space models for learned image compression,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025, pp. 15 727–15 736.

[26] W. Jiang, J. Yang, Y. Zhai, F. Gao, and R. Wang, “MLIC++: Linear complexity multi-reference entropy modeling for learned image compression,” 2023, accepted to the ICML 2023 Neural Compression Workshop and ACM TOMM 2025. [Online]. Available: https://arxiv.org/abs/2307.15421

[27] W. Jiang, Y. Zhai, J. Yang, F. Gao, and R. Wang, “MLICv2: Enhanced multi-reference entropy modeling for learned image compression,” [Online]. Available: https://arxiv.org/abs/2504.19119

[28] K. Tatwawadi, P. Rahimzadeh, Z. Sun, Z. Chen, Z. Yang, S. Nair, D. Hasteer, and O. Rippel, “What matters in practical learned image compression,” 2026. [Online]. Available: https://arxiv.org/abs/2605.05148

[29] T. Blard, P. Philippe, T. Ladune, X. Jiang, and O. Déforges, “Spatial competition for low-complexity learned image compression,” 2026, accepted to ICIP 2026. [Online]. Available: https://arxiv.org/abs/2605.13243

[30] T. Ladune, P. Philippe, P. Jaffuer, T. Blard, S. Kervadec, F. Henry, and G. Clare, “Cool-chic 5.0: Faster encoding and inter-feature entropy modeling for overfitted image compression,” 2026. [Online]. Available: https://arxiv.org/abs/2605.02726

[31] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The jpeg 2000 still image compression standard,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001.

[32] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, ser. The Springer International Series in Engineering and Computer Science. New York, NY: Springer, 2002, vol. 642.

[33] A. Zafari, A. Khoshkhahtinat, P. Mehta, M. S. Ebrahimi Saadabadi, M. Akyash, and N. M. Nasrabadi, “Frequency disentangled features in neural image compression,” in 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 2815–2819.

[34] F. Pakdaman and M. Gabbouj, “Channel-wise feature decorrelation for enhanced learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2403.10936

[35] H. Li, S. Li, W. Dai, M. Cao, N. Kan, C. Li, J. Zou, and H. Xiong, “On disentangled training for nonlinear transform in learned image compression,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=U67J0QNtzo

[36] H. Soltani and E. Ghasemi, “Bi-level spatial and channel-aware transformer for learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2408.03842

[37] H. Xu, B. Hai, Y. Tang, and Z. He, “Window-based channel attention for wavelet-enhanced learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2409.14090

[38] H. Fu, J. Liang, F. Liang, Z. Fang, G. Zhang, and J. Han, “3dm-weconvene: Learned image compression with 3d multi-level wavelet-domain convolution and entropy model,” 2025. [Online]. Available: https://arxiv.org/abs/2504.04658

[39] R. Coifman and M. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713–718, 1992.

[40] W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186–200, 1996.

[41] R. Zou, C. Song, and Z. Zhang, “The devil is in the details: Window-based attention for image compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 17 492–17 501.

[42] T. D. Ivan Krasin, N. Alldrin, A. Veit, S. Abu-El-Haija, S. Belongie, D. Cai, Z. Feng, V. Ferrari, V. Gomes, A. Gupta, D. Narayanan, C. Sun, G. Chechik, and K. Murphy, “Openimages: A public dataset for large-scale multi-label and multi-class image classification.” https://github.com/openimages, 2016.

[43] Kodak PhotoCD dataset, http://r0k.us/graphics/k...

[44] N. Asuni and A. Giachetti, “TESTIMAGES: a Large-scale Archive for Testing Visual Devices and Basic Image Processing Algorithms.” The Eurographics Association, 2014.

[45] G. Toderici, R. Timofte, J. Balle, E. Agustsson, N. Johnston, and F. Mentzer, “2021 workshop and challenge on learned image compression (clic).” [Online]. Available: http://www.compression.cc

[46] Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, 2003, pp. 1398–1402.

[1] [1]

Learned image com- pression with discretized gaussian mixture likelihoods and attention modules,

Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image com- pression with discretized gaussian mixture likelihoods and attention modules,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 7939–7948

2020

[2] [2]

Learned image compression with gaussian- laplacian-logistic mixture model and concatenated residual modules,

H. Fu, F. Liang, J. Lin, B. Li, M. Akbari, J. Liang, G. Zhang, D. Liu, C. Tu, and J. Han, “Learned image compression with gaussian- laplacian-logistic mixture model and concatenated residual modules,” IEEE Transactions on Image Processing, vol. 32, pp. 2063–2076, 2023

2063

[3] [3]

Learned image compression with mixed transformer-cnn architectures,

J. Liu, H. Sun, and J. Katto, “Learned image compression with mixed transformer-cnn architectures,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 14 388–14 397

2023

[4] [4]

Frequency- aware transformer for learned image compression,

H. Li, S. Li, W. Dai, C. Li, J. Zou, and H. Xiong, “Frequency- aware transformer for learned image compression,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=HKGQDDTuvZ

2024

[5] [5]

Lin- ear attention modeling for learned image compression,

D. Feng, Z. Cheng, S. Wang, R. Wu, H. Hu, G. Lu, and L. Song, “Lin- ear attention modeling for learned image compression,” inIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1–10

2025

[6] [6]

Learned image compression with dictionary-based entropy model,

J. Lu, L. Zhang, X. Zhou, M. Li, W. Li, and S. Gu, “Learned image compression with dictionary-based entropy model,” in2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 12 850–12 859

2025

[7] [7]

Learned image compression with hierarchical progressive context modeling,

Y . Li, H. Zhang, L. Li, and D. Liu, “Learned image compression with hierarchical progressive context modeling,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025, pp. 18 834–18 843

2025

[8] [8]

Variational image compression with a scale hyperprior,

J. Ball ´e, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” inInternational Conference on Learning Representations, 2018, pp. 1–23

2018

[9] [9]

Joint autoregressive and hierarchical priors for learned image compression,

D. Minnen, J. Ball ´e, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” inAdvances in Neural Information Processing Systems, 2018, pp. 10 794–10 803

2018

[10] [10]

Mlic: Multi- reference entropy model for learned image compression,

W. Jiang, J. Yang, Y . Zhai, P. Ning, F. Gao, and R. Wang, “Mlic: Multi- reference entropy model for learned image compression,” inProceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7618–7627

2023

[11] [11]

Elic: Efficient learned image compression with unevenly grouped space- channel contextual adaptive coding,

D. He, Z. Yang, W. Peng, R. Ma, H. Qin, and Y . Wang, “Elic: Efficient learned image compression with unevenly grouped space- channel contextual adaptive coding,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 5718–5727

2022

[12] [12]

Mambavc: Learned visual compression with selective state spaces,

S. Qin, J. Wang, Y. Zhou, B. Chen, T. Luo, B. An, T. Dai, S. Xia, and Y. Wang, “Mambavc: Learned visual compression with selective state spaces,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15413

2024

[13] [13]

Checkerboard context model for efficient learned image compression,

D. He, Y. Zheng, B. Sun, Y. Wang, and H. Qin, “Checkerboard context model for efficient learned image compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021, pp. 14 771–14 780

2021

[14] [14]

Entroformer: A transformer-based entropy model for learned image compression,

Y. Qian, M. Lin, X. Sun, Z. Tan, and R. Jin, “Entroformer: A transformer-based entropy model for learned image compression,” in International Conference on Learning Representations, May 2022

2022

[15] [15]

End-to-end optimized versatile image compression with wavelet-like transform,

H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1247–1263, 2022.

2022

[16] [16]

Image compression with learned lifting-based dwt and learned tree-based entropy models,

U. B. Sahin and F. Kamisli, “Image compression with learned lifting-based dwt and learned tree-based entropy models,” Multimedia Systems, vol. 29, no. 6, pp. 3369–3384, 2023

2023

[17] [17]

aiwave: Volumetric image compression with 3-d trained affine wavelet-like transform,

D. Xue, H. Ma, L. Li, D. Liu, and Z. Xiong, “aiwave: Volumetric image compression with 3-d trained affine wavelet-like transform,” IEEE Transactions on Medical Imaging, vol. 42, no. 3, pp. 606–618, 2023

2023

[18] [18]

Weconvene: Learned image compression with wavelet-domain convolution and entropy model,

H. Fu, J. Liang, Z. Fang, J. Han, F. Liang, and G. Zhang, “Weconvene: Learned image compression with wavelet-domain convolution and entropy model,” in European Conference on Computer Vision (ECCV), 2024, pp. 37–53

2024

[19] [19]

Channel-wise autoregressive entropy models for learned image compression,

D. Minnen and S. Singh, “Channel-wise autoregressive entropy models for learned image compression,” in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3339–3343

2020

[20] [20]

End-to-end optimized image compression,

J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” in International Conference on Learning Representations, 2017

2017

[21] [21]

Asymmetric learned image compression with multi-scale residual block, importance scaling, and post-quantization filtering,

H. Fu, F. Liang, J. Liang, B. Li, G. Zhang, and J. Han, “Asymmetric learned image compression with multi-scale residual block, importance scaling, and post-quantization filtering,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 8, pp. 4309–4321, 2023

2023

[22] [22]

Learned multi-resolution variable-rate image compression with octave-based residual blocks,

M. Akbari, J. Liang, J. Han, and C. Tu, “Learned multi-resolution variable-rate image compression with octave-based residual blocks,” IEEE Transactions on Multimedia, vol. 23, pp. 3013–3021, Mar. 2021

2021

[23] [23]

Learning context-based nonlocal entropy modeling for image compression,

M. Li, K. Zhang, J. Li, W. Zuo, R. Timofte, and D. Zhang, “Learning context-based nonlocal entropy modeling for image compression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 3, pp. 1132–1145, 2023

2023

[24] [24]

End-to-end optimized versatile image compression with wavelet-like transform,

H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1247–1263, 2022

2022

[25] [25]

Cassic: Towards content-adaptive state-space models for learned image compression,

S. Qin, J. Wang, Y. Zhou, B. Chen, T. Luo, B. An, T. Dai, S.-T. Xia, and Y. Wang, “Cassic: Towards content-adaptive state-space models for learned image compression,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2025, pp. 15 727–15 736

2025

[26] [26]

MLIC++: Linear complexity multi-reference entropy modeling for learned image compression,

W. Jiang, J. Yang, Y. Zhai, F. Gao, and R. Wang, “MLIC++: Linear complexity multi-reference entropy modeling for learned image compression,” 2023, accepted to the ICML 2023 Neural Compression Workshop and ACM TOMM 2025. [Online]. Available: https://arxiv.org/abs/2307.15421

2023

[27] [27]

MLICv2: Enhanced multi-reference entropy modeling for learned image compression,

W. Jiang, Y. Zhai, J. Yang, F. Gao, and R. Wang, “MLICv2: Enhanced multi-reference entropy modeling for learned image compression,” [Online]. Available: https://arxiv.org/abs/2504.19119

[29] [29]

What Matters in Practical Learned Image Compression

K. Tatwawadi, P. Rahimzadeh, Z. Sun, Z. Chen, Z. Yang, S. Nair, D. Hasteer, and O. Rippel, “What matters in practical learned image compression,” 2026. [Online]. Available: https://arxiv.org/abs/2605.05148

2026

[30] [30]

Spatial competition for low-complexity learned image compression,

T. Blard, P. Philippe, T. Ladune, X. Jiang, and O. Déforges, “Spatial competition for low-complexity learned image compression,” 2026, accepted to ICIP 2026. [Online]. Available: https://arxiv.org/abs/2605.13243

2026

[31] [31]

Cool-chic 5.0: Faster Encoding and Inter-Feature Entropy Modeling for Overfitted Image Compression

T. Ladune, P. Philippe, P. Jaffuer, T. Blard, S. Kervadec, F. Henry, and G. Clare, “Cool-chic 5.0: Faster encoding and inter-feature entropy modeling for overfitted image compression,” 2026. [Online]. Available: https://arxiv.org/abs/2605.02726

2026

[32] [32]

The jpeg 2000 still image compression standard,

A. Skodras, C. Christopoulos, and T. Ebrahimi, “The jpeg 2000 still image compression standard,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001

2001

[33] [33]

D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, ser. The Springer International Series in Engineering and Computer Science. New York, NY: Springer, 2002, vol. 642

2002

[34] [34]

Frequency disentangled features in neural image compression,

A. Zafari, A. Khoshkhahtinat, P. Mehta, M. S. Ebrahimi Saadabadi, M. Akyash, and N. M. Nasrabadi, “Frequency disentangled features in neural image compression,” in 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 2815–2819

2023

[35] [35]

Channel-wise feature decorrelation for enhanced learned image compression,

F. Pakdaman and M. Gabbouj, “Channel-wise feature decorrelation for enhanced learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2403.10936

2024

[36] [36]

On disentangled training for nonlinear transform in learned image compression,

H. Li, S. Li, W. Dai, M. Cao, N. Kan, C. Li, J. Zou, and H. Xiong, “On disentangled training for nonlinear transform in learned image compression,” in The Thirteenth International Conference on Learning Representations, 2025. [Online]. Available: https://openreview.net/forum?id=U67J0QNtzo

2025

[37] [37]

Bi-level spatial and channel-aware transformer for learned image compression,

H. Soltani and E. Ghasemi, “Bi-level spatial and channel-aware transformer for learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2408.03842

2024

[38] [38]

Window-based channel attention for wavelet-enhanced learned image compression,

H. Xu, B. Hai, Y. Tang, and Z. He, “Window-based channel attention for wavelet-enhanced learned image compression,” 2024. [Online]. Available: https://arxiv.org/abs/2409.14090

2024

[39] [39]

3dm-weconvene: Learned image compression with 3d multi-level wavelet-domain convolution and entropy model,

H. Fu, J. Liang, F. Liang, Z. Fang, G. Zhang, and J. Han, “3dm-weconvene: Learned image compression with 3d multi-level wavelet-domain convolution and entropy model,” 2025. [Online]. Available: https://arxiv.org/abs/2504.04658

2025

[40] [40]

Entropy-based algorithms for best basis selection,

R. Coifman and M. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on Information Theory, vol. 38, no. 2, pp. 713–718, 1992

1992

[41] [41]

The lifting scheme: A custom-design construction of biorthogonal wavelets,

W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186–200, 1996

1996

[42] [42]

The devil is in the details: Window-based attention for image compression,

R. Zou, C. Song, and Z. Zhang, “The devil is in the details: Window-based attention for image compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 17 492–17 501

2022

[43] [43]

Openimages: A public dataset for large-scale multi-label and multi-class image classification

T. D. Ivan Krasin, N. Alldrin, A. Veit, S. Abu-El-Haija, S. Belongie, D. Cai, Z. Feng, V. Ferrari, V. Gomes, A. Gupta, D. Narayanan, C. Sun, G. Chechik, and K. Murphy, “Openimages: A public dataset for large-scale multi-label and multi-class image classification.” https://github.com/openimages, 2016. [43] Kodak PhotoCD dataset, http://r0k.us/graphics/k...

2016

[44] [44]

TESTIMAGES: a Large-scale Archive for Testing Visual Devices and Basic Image Processing Algorithms

N. Asuni and A. Giachetti, “TESTIMAGES: a Large-scale Archive for Testing Visual Devices and Basic Image Processing Algorithms.” The Eurographics Association, 2014

2014

[45] [45]

2021 workshop and challenge on learned image compression (clic)

G. Toderici, R. Timofte, J. Balle, E. Agustsson, N. Johnston, and F. Mentzer, “2021 workshop and challenge on learned image compression (clic).” [Online]. Available: http://www.compression.cc

2021

[46] [46]

Multiscale structural similarity for image quality assessment,

Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, 2003, pp. 1398–1402

2003