Variational image compression with a scale hyperprior
Abstract
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
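The core of the hyperprior idea in the abstract is that side information (the hyper-latents) lets the model predict a scale for each latent element, and the bit cost of a quantized latent is then the negative log-probability of its quantization bin under a zero-mean Gaussian with that scale. Below is a minimal numpy sketch of just that rate term — not the paper's implementation; the latent values and scales are illustrative stand-ins for what an encoder and hyper-decoder would produce.

```python
import numpy as np
from math import erf

def gaussian_cdf(x, sigma):
    # CDF of a zero-mean Gaussian with standard deviation sigma,
    # written via the error function.
    return 0.5 * (1.0 + erf(x / (sigma * np.sqrt(2.0))))

def rate_bits(y, sigma):
    """Bits needed to code integer-quantized latents y under N(0, sigma^2):
    integrate each element's density over its quantization bin [y-0.5, y+0.5]
    and sum the negative log2 probabilities."""
    bits = 0.0
    for yi, si in zip(y, sigma):
        p = gaussian_cdf(yi + 0.5, si) - gaussian_cdf(yi - 0.5, si)
        bits += -np.log2(max(p, 1e-12))  # clamp avoids log(0)
    return bits

# Illustrative quantized latents and per-element scales; in the model,
# sigma would come from the hyper-decoder applied to the side information.
y = np.array([0.0, 1.0, -2.0, 0.0])
sigma = np.array([1.0, 2.0, 2.0, 0.5])
print(rate_bits(y, sigma))
```

Note how a small predicted scale makes a zero-valued latent cheap to code, while a latent far from zero under a small scale would be expensive — this is the mechanism by which the jointly trained prior adapts the rate to local image structure.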
Forward citations
Cited by 9 Pith papers
- Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications
  GS-SCNet unifies 3D Gaussian Splatting with a disparity-guided semantic codec and direct Gaussian parameter prediction for efficient real-time 3D video communications with strong generalization.
- GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow
  GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS r...
- "Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust Watermarking
  W-IR is the first watermarking framework to combine certified robustness via randomized smoothing in pixel and coordinate spaces with identity leakage mitigation via residual information loss minimization.
- Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
  RDVQ enables joint rate-distortion optimization for vector-quantized generative image compression via differentiable codebook distribution relaxation and an autoregressive entropy model.
- ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality
  ML-CLIPSim aggregates multi-layer patch and global similarities from frozen CLIP to approximate machine utility for images and outperforms standard IQA metrics on machine-preference tasks while staying competitive on ...
- Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction
  A generative compression model using historical priors for Earth observation data achieves up to 10,000x reduction after exascale training on an Armv9 supercomputer.
- SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression
  SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.
- PAT-VCM: Plug-and-Play Auxiliary Tokens for Video Coding for Machines
  PAT-VCM adds lightweight auxiliary tokens to a shared baseline video stream to support multiple downstream machine tasks without task-specific codecs.
- Autoencoder-Based CSI Compression for Beyond Wi-Fi 8 Coordinated Beamforming
  Autoencoder CSI compression reduces channel sounding overhead by more than 50% versus standard IEEE 802.11 methods and improves throughput and latency in coordinated beamforming.