pith. machine review for the scientific record.

arxiv: 2512.00408 · v2 · submitted 2025-11-29 · 💻 cs.CV · cs.AI

Recognition: unknown

Low-Bitrate Video Compression through Semantic-Conditioned Diffusion

keywords: video · codecs · semantic · bitrates · compact · compression · diffusion · pixel

Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy and human perception. We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. The source video is decomposed into three compact modalities: a textual description, a spatiotemporally degraded video, and optional sketches or poses that respectively capture semantic, appearance, and motion cues. A conditional video diffusion model then reconstructs high-quality, temporally coherent videos from these compact representations. Temporal forward filling, token interleaving, and modality-specific codecs are proposed to improve multimodal generation and modality compactness. Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates.
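To make the "spatiotemporally degraded video" modality concrete, here is a minimal sketch of one plausible reading: keep every few frames and every few pixels, then repeat each kept frame forward in time so the conditioning signal lines up with the original frame count. The function names `degrade` and `forward_fill`, the stride parameters, and the nearest-previous-frame fill rule are all assumptions made for illustration; the abstract does not specify how DiSCo degrades the video or how its temporal forward filling is defined.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def degrade(video: List[Frame], t_stride: int, s_stride: int) -> List[Frame]:
    """Keep every t_stride-th frame and every s_stride-th pixel per axis.

    Hypothetical helper: plain subsampling stands in for whatever
    degradation the paper actually applies.
    """
    return [
        [row[::s_stride] for row in frame[::s_stride]]
        for frame in video[::t_stride]
    ]


def forward_fill(kept: List[Frame], t_stride: int, n_frames: int) -> List[Frame]:
    """Repeat each kept frame until the next kept one, restoring n_frames.

    Hypothetical helper: an assumed interpretation of forward filling.
    """
    return [kept[t // t_stride] for t in range(n_frames)]


# Toy 8-frame, 4x4 video where every pixel of frame t has value t.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(8)]

low = degrade(video, t_stride=4, s_stride=2)
filled = forward_fill(low, t_stride=4, n_frames=len(video))

print(len(low), len(low[0]), len(low[0][0]))  # 2 2 2
print([f[0][0] for f in filled])  # [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0]
```

In this toy case the degraded stream carries 2 x 2 x 2 = 8 values instead of 8 x 4 x 4 = 128, which illustrates why such a stream is cheap to transmit and why a generative model is needed to restore the missing detail.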

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning

    eess.IV · 2026-04 · unverdicted · novelty 7.0

    NeuralLVC achieves better lossless compression than H.264 and H.265 on video sequences by combining masked diffusion with temporal conditioning on frame differences.