What Matters in Practical Learned Image Compression

2026 · cs.CV · arXiv 2605.05148

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec has yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime, and include several novel techniques in the ablations. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics. We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as 230 ms and decodes them in 150 ms -- faster than most top ML-based codecs run on a V100 GPU.

representative citing papers

ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

eess.IV · 2026-05-27 · unverdicted · novelty 6.0

Channel-wise wavelet-domain transformer attention plus wavelet-packet entropy modeling yields BD-rate reductions of 17.8-22.6% on Kodak, CLIC, and Tecnick relative to prior LIC baselines.

KD-NVC: A Search-and-Distill Framework to Accelerate Neural Video Coding

eess.IV · 2026-06-03 · unverdicted · novelty 5.0

KD-NVC combines acceleration-efficiency neural architecture search with energy-aware feature distillation to produce neural video codecs that reach 69 FPS 1080p decoding on RTX 5060 while matching VTM-LDB rate-distortion performance.

