Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors
read the original abstract
We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the ``temporal'' evolution in generative image compression. Specifically, we define an explicit intermediate state during decoding: a compact anchor frame, which preserves the scene geometry and semantic layout while discarding high-frequency details. We then reinterpret generative decoding as a virtual temporal transition from this anchor to the final reconstructed image. To model this progression, we leverage a pretrained video diffusion model (VDM) as a temporal prior: the anchor frame serves as the initial frame and the original image as the target frame, transforming the decoding process into a next-frame prediction task. In contrast to image diffusion-based ULB-IC models, our decoding proceeds from a visible, semantically faithful anchor, which improves both fidelity and realism for perceptual image compression. Extensive experiments demonstrate that our method achieves superior rate-distortion performance. On the CLIC2020 test set, our method achieves over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC, while also delivering a significant decoding speedup of up to $\times$5. Code will be released at https://github.com/UnoC-727/NeFIC.
This paper has not been read by Pith yet.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.