Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

Lingdong Wang, Guan-Ming Su, Divya Kothandaraman, Tsung-Wei Huang, Mohammad Hajiesmaili, Ramesh K. Sitaraman

Classification: cs.CV, cs.AI

Keywords: video, codecs, semantic, bitrates, compact, compression, diffusion, pixel

Abstract

Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy and human perception. We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. The source video is decomposed into three compact modalities: a textual description, a spatiotemporally degraded video, and optional sketches or poses that respectively capture semantic, appearance, and motion cues. A conditional video diffusion model then reconstructs high-quality, temporally coherent videos from these compact representations. Temporal forward filling, token interleaving, and modality-specific codecs are proposed to improve multimodal generation and modality compactness. Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates.
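
The abstract does not specify how the "spatiotemporally degraded video" is produced or exactly what "temporal forward filling" does. The sketch below is a minimal illustration under two stated assumptions, not DiSCo's implementation: (1) degradation means keeping every `temporal`-th frame and average-pooling each kept frame by a factor `spatial`, applied before any modality-specific codec; (2) forward filling means repeating each transmitted frame until the next one arrives, so the diffusion model receives a condition clip with the same frame count as the output. The function names `degrade` and `forward_fill` and the factor values are hypothetical.

```python
import numpy as np


def degrade(video: np.ndarray, spatial: int = 4, temporal: int = 4) -> np.ndarray:
    """Hypothetical spatiotemporal degradation of a (T, H, W, C) clip.

    Keeps every `temporal`-th frame, then average-pools each kept frame
    over non-overlapping `spatial` x `spatial` blocks.
    """
    _, h, w, c = video.shape
    kept = video[::temporal]  # frames 0, temporal, 2*temporal, ...
    # Crop so height and width divide evenly by the pooling factor.
    hh, ww = (h // spatial) * spatial, (w // spatial) * spatial
    kept = kept[:, :hh, :ww]
    blocks = kept.reshape(kept.shape[0], hh // spatial, spatial, ww // spatial, spatial, c)
    return blocks.mean(axis=(2, 4))


def forward_fill(keyframes: np.ndarray, temporal: int, num_frames: int) -> np.ndarray:
    """Hypothetical temporal forward filling back to `num_frames` frames.

    Output frame i repeats the most recent transmitted keyframe, i.e. keyframe
    i // temporal, clamped to the last keyframe available.
    """
    idx = np.minimum(np.arange(num_frames) // temporal, len(keyframes) - 1)
    return keyframes[idx]


if __name__ == "__main__":
    video = np.random.rand(16, 64, 64, 3).astype(np.float32)
    low = degrade(video, spatial=4, temporal=4)
    cond = forward_fill(low, temporal=4, num_frames=16)
    print(low.shape, cond.shape)  # (4, 16, 16, 3) (16, 16, 16, 3)
    print(video.size / low.size)  # 64.0 (4 x 4 spatial times 4 temporal)
```

Under these assumptions, a 4x spatial and 4x temporal reduction leaves 64 times fewer samples to encode, and forward filling keeps the condition clip aligned frame by frame with the output, so the generative model sees a stale but spatially plausible guess at every timestep instead of gaps.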

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work, sorted by Pith novelty score.

NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning
eess.IV, 2026-04, unverdicted, novelty score 7.0

NeuralLVC achieves better lossless compression than H.264 and H.265 on video sequences by combining masked diffusion with temporal conditioning on frame differences.