A Cross Channel Context Model for Latents in Deep Image Compression

Changyue Ma; Ruling Liao; Yan Ye; Zhao Wang

arXiv: 2103.02884 · v1 · pith:WPAXRRDS · submitted 2021-03-04 · eess.IV · cs.CV

Abstract

This paper presents a cross channel context model for latents in deep image compression. Deep image compression is generally based on an autoencoder framework: the encoder transforms the original image into latents, and the decoder reconstructs the image from the quantized latents. The transform is usually combined with an entropy model, which estimates the probability distribution of the quantized latents for arithmetic coding. Joint autoregressive and hierarchical prior entropy models are currently widely adopted, capturing both global context from the hyper latents and local context from previously coded quantized latent elements. For the local context, however, the widely adopted 2D masked convolution captures only spatial context, even though we observe strong correlations between different channels of the latents. To exploit these cross channel correlations, we propose dividing the latents into several groups according to channel index and coding the groups one by one, so that previously coded groups provide cross channel context for the current group. The proposed cross channel context model is combined with the joint autoregressive and hierarchical prior entropy model. Experimental results show that, with PSNR as the distortion metric, the combined model achieves BD-rate reductions of 6.30% and 6.31% over the baseline entropy model, and of 2.50% and 2.20% over Versatile Video Coding (VVC), the latest video coding standard, on the Kodak and CVPR CLIC2020 professional datasets, respectively. In addition, when optimized for the MS-SSIM metric, our approach produces more visually pleasing reconstructed images.
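The channel-grouping idea can be illustrated with a toy rate estimate. This is a minimal sketch under stated assumptions, not the authors' model: the function names (`split_channel_groups`, `gaussian_bits`, `toy_context_mean`, `estimate_bits`) are hypothetical, the learned cross channel context network is replaced by a simple average of already-decoded channels at the same spatial position, each quantized element is modeled by a Gaussian discretized to unit-width bins with a fixed scale, and the spatial (masked-convolution) and hyperprior contexts of the joint model are omitted. The bit count is the ideal arithmetic-coding cost, -log2 of each symbol's probability.

```python
import math
from typing import List, Sequence


def split_channel_groups(num_channels: int, num_groups: int) -> List[range]:
    """Split channel indices 0..num_channels-1 into contiguous, equal-size groups."""
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    size = num_channels // num_groups
    return [range(g * size, (g + 1) * size) for g in range(num_groups)]


def gaussian_bits(value: float, mu: float, sigma: float) -> float:
    """Ideal code length (bits) of an integer `value` under a Gaussian
    N(mu, sigma^2) integrated over the bin [value - 0.5, value + 0.5]."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    p = cdf(value + 0.5) - cdf(value - 0.5)
    return -math.log2(max(p, 1e-12))  # clamp so a vanishing bin cannot give log(0)


def toy_context_mean(decoded: List[List[float]], pos: int) -> float:
    """HYPOTHETICAL stand-in for a learned context network: predict the mean
    as the average of already-decoded channels at the same position."""
    if not decoded:
        return 0.0
    return sum(ch[pos] for ch in decoded) / len(decoded)


def estimate_bits(latents: Sequence[Sequence[float]], num_groups: int,
                  sigma: float = 1.0, use_cross_channel: bool = True) -> float:
    """Code channel groups one by one; when use_cross_channel is True,
    every group is conditioned on all previously coded groups."""
    quantized = [[float(round(v)) for v in ch] for ch in latents]
    decoded: List[List[float]] = []  # channels the decoder already has
    total = 0.0
    for group in split_channel_groups(len(quantized), num_groups):
        context = decoded if use_cross_channel else []
        for c in group:
            for pos, v in enumerate(quantized[c]):
                mu = toy_context_mean(context, pos)
                total += gaussian_bits(v, mu, sigma)
        # A group becomes context only after it has been fully coded.
        decoded.extend(quantized[c] for c in group)
    return total
```

For example, with four strongly correlated channels, `estimate_bits(latents, num_groups=4)` returns fewer bits than `estimate_bits(latents, num_groups=4, use_cross_channel=False)`, while a single group gives no cross channel context at all. The number of groups sets a trade-off in this sketch: more groups give later channels more context but require more sequential decoding stages.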
