
Symmetric Entropy-Constrained Video Coding for Machines

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

As video transmission increasingly serves machine vision systems (MVS) rather than human vision systems (HVS), video coding for machines (VCM) has become a critical research topic. Existing VCM methods often bind codecs to specific downstream models and require retraining or supervised data, which limits generalization in multi-task scenarios. Recent unified VCM frameworks employ visual backbones (VB) and visual foundation models (VFM) to support multiple video understanding tasks with a single codec. They mainly use VB/VFM to maintain semantic consistency or suppress non-semantic information, but seldom explore how to link video coding directly with understanding under VB/VFM guidance. We therefore propose a Symmetric Entropy-Constrained Video Coding framework for Machines (SEC-VCM). It establishes a symmetric alignment between the video codec and the VB, allowing the codec to leverage the VB's representation capabilities to preserve semantics and discard MVS-irrelevant information. Specifically, a bi-directional entropy-constraint (BiEC) mechanism enforces symmetry between the video decoding process and the VB encoding process by suppressing conditional entropy. This helps the codec explicitly preserve semantic information beneficial to MVS while squeezing out useless information. Furthermore, a semantic-pixel dual-path fusion (SPDF) module injects pixel-level priors into the final reconstruction. Through semantic-pixel fusion, it suppresses artifacts harmful to MVS and improves machine-oriented reconstruction quality. Experimental results on classical video understanding tasks and MLLM-based tasks show state-of-the-art (SOTA) rate-task performance. SEC-VCM achieves significant bitrate savings over the H.266/VVC reference software VTM on video instance segmentation (37.4%), video object segmentation (29.8%), object detection (46.2%), multiple object tracking (44.9%), and MLLM-based video grounding (97.6%).

fields

eess.IV 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

KD-NVC: A Search-and-Distill Framework to Accelerate Neural Video Coding

eess.IV · 2026-06-03 · unverdicted · novelty 5.0

KD-NVC combines acceleration-efficiency neural architecture search with energy-aware feature distillation to produce neural video codecs that decode 1080p video at 69 FPS on an RTX 5060 while matching the rate-distortion performance of VTM in the low-delay B (LDB) configuration.

citing papers explorer


  • KD-NVC: A Search-and-Distill Framework to Accelerate Neural Video Coding · eess.IV · 2026-06-03 · unverdicted · none · ref 23
