pith. machine review for the scientific record.

arXiv: 1802.01436 · v2 · submitted 2018-02-01 · eess.IV · cs.IT · math.IT

Recognition: unknown

Variational image compression with a scale hyperprior

classification: eess.IV, cs.IT, math.IT
keywords: compression, image, model, hyperprior, autoencoder, methods, variational, when

We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
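To make the role of side information concrete, here is a minimal sketch, not the authors' implementation. It assumes each quantized latent is coded under a zero-mean Gaussian integrated over a unit-width bin, and it compares one shared scale against per-element scales of the kind a hyperprior could signal. The latent values, the scales, and the function names are all hypothetical.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def bits(y, sigma):
    """Bits to code integer y: -log2 of the Gaussian mass on [y - 0.5, y + 0.5]."""
    p = gaussian_cdf(y + 0.5, sigma) - gaussian_cdf(y - 0.5, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0)

# Hypothetical quantized latents: a flat region followed by a busy region.
latents = [0, 0, 1, 0, 12, -9, 15, -11]

# Baseline: a single scale shared by every element (the RMS of the latents).
shared_scale = math.sqrt(sum(y * y for y in latents) / len(latents))
shared_bits = sum(bits(y, shared_scale) for y in latents)

# Hyperprior-style: one scale per element, assumed known to the decoder.
local_scales = [0.5] * 4 + [10.0] * 4
local_bits = sum(bits(y, s) for y, s in zip(latents, local_scales))

print(f"shared scale: {shared_bits:.2f} bits")
print(f"local scales: {local_bits:.2f} bits")
```

In this toy setting the per-element scales spend fewer bits on the latents. The sketch ignores the cost of transmitting the scales themselves, which a real codec must also pay as side information.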

This paper has not been read by Pith yet.


Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications

    eess.IV 2026-04 unverdicted novelty 7.0

    GS-SCNet unifies 3D Gaussian Splatting with a disparity-guided semantic codec and direct Gaussian parameter prediction for efficient real-time 3D video communications with strong generalization.

  2. GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow

    cs.CV 2026-03 unverdicted novelty 7.0

    GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS r...

  3. "Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust Watermarking

    cs.CR 2026-05 unverdicted novelty 6.0

    W-IR is the first watermarking framework to combine certified robustness via randomized smoothing in pixel and coordinate spaces with identity leakage mitigation via residual information loss minimization.

  4. Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression

    cs.CV 2026-04 unverdicted novelty 6.0

    RDVQ enables joint rate-distortion optimization for vector-quantized generative image compression via differentiable codebook distribution relaxation and an autoregressive entropy model.

  5. ML-CLIPSim: Multi-Layer CLIP Similarity for Machine-Oriented Image Quality

    eess.IV 2026-05 unverdicted novelty 5.0

    ML-CLIPSim aggregates multi-layer patch and global similarities from frozen CLIP to approximate machine utility for images and outperforms standard IQA metrics on machine-preference tasks while staying competitive on ...

  6. Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction

    cs.DC 2026-05 unverdicted novelty 5.0

    A generative compression model using historical priors for Earth observation data achieves up to 10,000x reduction after exascale training on an Armv9 supercomputer.

  7. SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression

    cs.CV 2026-05 unverdicted novelty 5.0

    SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.

  8. PAT-VCM: Plug-and-Play Auxiliary Tokens for Video Coding for Machines

    cs.CV 2026-04 unverdicted novelty 5.0

    PAT-VCM adds lightweight auxiliary tokens to a shared baseline video stream to support multiple downstream machine tasks without task-specific codecs.

  9. Autoencoder-Based CSI Compression for Beyond Wi-Fi 8 Coordinated Beamforming

    cs.NI 2026-04 conditional novelty 4.0

    Autoencoder CSI compression reduces channel sounding overhead by more than 50% versus standard IEEE 802.11 methods and improves throughput and latency in coordinated beamforming.