Variational image compression with a scale hyperprior
Abstract
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
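The core of the hyperprior idea in the abstract is that side information (the hyper-latents) lets the model predict a scale for each latent element, and the bit cost of a quantized latent is then the negative log-probability of its quantization bin under a zero-mean Gaussian with that scale. Below is a minimal numpy sketch of just that rate term — not the paper's implementation; the latent values and scales are illustrative stand-ins for what an encoder and hyper-decoder would produce.

```python
import numpy as np
from math import erf

def gaussian_cdf(x, sigma):
    # CDF of a zero-mean Gaussian with standard deviation sigma,
    # written via the error function.
    return 0.5 * (1.0 + erf(x / (sigma * np.sqrt(2.0))))

def rate_bits(y, sigma):
    """Bits needed to code integer-quantized latents y under N(0, sigma^2):
    integrate each element's density over its quantization bin [y-0.5, y+0.5]
    and sum the negative log2 probabilities."""
    bits = 0.0
    for yi, si in zip(y, sigma):
        p = gaussian_cdf(yi + 0.5, si) - gaussian_cdf(yi - 0.5, si)
        bits += -np.log2(max(p, 1e-12))  # clamp avoids log(0)
    return bits

# Illustrative quantized latents and per-element scales; in the model,
# sigma would come from the hyper-decoder applied to the side information.
y = np.array([0.0, 1.0, -2.0, 0.0])
sigma = np.array([1.0, 2.0, 2.0, 0.5])
print(rate_bits(y, sigma))
```

Note how a small predicted scale makes a zero-valued latent cheap to code, while a latent far from zero under a small scale would be expensive — this is the mechanism by which the jointly trained prior adapts the rate to local image structure.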
Forward citations
Cited by 9 Pith papers
- Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications
  GS-SCNet unifies 3D Gaussian Splatting with a disparity-guided semantic codec and direct Gaussian parameter prediction for efficient real-time 3D video communications with strong generalization.
- GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow
  GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS r...
- "Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust Watermarking
  W-IR is the first watermarking framework to combine certified robustness via randomized smoothing in pixel and coordinate spaces with identity leakage mitigation via residual information loss minimization.
- Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
  RDVQ enables joint rate-distortion optimization for vector-quantized generative image compression via differentiable codebook distribution relaxation and an autoregressive entropy model.
- ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality
  ML-CLIPSim aggregates multi-layer patch and global similarities from frozen CLIP to approximate machine utility for images and outperforms standard IQA metrics on machine-preference tasks while staying competitive on ...
- Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction
  A generative compression model using historical priors for Earth observation data achieves up to 10,000x reduction after exascale training on an Armv9 supercomputer.
- SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression
  SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.
- PAT-VCM: Plug-and-Play Auxiliary Tokens for Video Coding for Machines
  PAT-VCM adds lightweight auxiliary tokens to a shared baseline video stream to support multiple downstream machine tasks without task-specific codecs.
- Autoencoder-Based CSI Compression for Beyond Wi-Fi 8 Coordinated Beamforming
  Autoencoder CSI compression reduces channel sounding overhead by more than 50% versus standard IEEE 802.11 methods and improves throughput and latency in coordinated beamforming.