New VVC profiles targeting Feature Coding for Machines
Pith reviewed 2026-05-17 00:22 UTC · model grok-4.3
The pith
Three simplified VVC profiles cut encoding time for neural-network features by up to 95.6%, with at most a 1.71% BD-Rate loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The resulting Fast profile improves BD-Rate by 2.96% while cutting encoding time 21.8%; the Faster profile improves BD-Rate by 1.85% with 51.5% speedup; the Fastest profile reduces encoding time by 95.6% at the cost of only 1.71% BD-Rate loss.
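To put the three time reductions on a common scale: cutting encoding time by a fraction r corresponds to a speedup factor of 1 / (1 - r). The sketch below applies that identity to the reported figures. The function name is illustrative and does not come from the paper.

```python
def speedup_factor(time_reduction_pct: float) -> float:
    """Convert an encoding-time reduction (in percent) into a speedup factor."""
    if not 0.0 <= time_reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - time_reduction_pct / 100.0)


# Encoding-time reductions as reported in the abstract.
reported = {"Fast": 21.8, "Faster": 51.5, "Fastest": 95.6}

for profile, reduction in reported.items():
    factor = speedup_factor(reduction)
    print(f"{profile}: {reduction}% less encoding time = {factor:.2f}x faster")
```

On these figures the profiles run roughly 1.3x, 2.1x, and 22.7x faster than the baseline, which shows how much larger the step from Faster to Fastest is than the percentages alone suggest.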
What carries the argument
Tool-level ablation of VVC coding tools to isolate which ones matter for feature compression efficiency and task accuracy.
Load-bearing premise
The tool impacts observed on the tested features and tasks will generalize to the full range of FCM use cases without accuracy drops on unseen models or datasets.
What would settle it
Measure BD-Rate and task accuracy when the Fastest profile compresses features from a new neural-network backbone or vision task not used in the original experiments; a large accuracy drop would falsify the claim.
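One way to frame that check is to compare task accuracy with and without the Fastest profile on each unseen condition and flag any drop beyond a tolerance. This is only a sketch: the `Trial` record, the 1-point default tolerance, and every number below are assumptions for illustration, not measurements or thresholds from the paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    condition: str           # a backbone/task pair not used in the original experiments
    anchor_accuracy: float   # task accuracy with the full VVC configuration
    profile_accuracy: float  # task accuracy with the Fastest profile


def failing_conditions(trials, max_drop=1.0):
    """Return the conditions whose accuracy drop exceeds max_drop (accuracy points)."""
    return [
        t.condition
        for t in trials
        if t.anchor_accuracy - t.profile_accuracy > max_drop
    ]


# Placeholder numbers for illustration only; they are not measurements.
trials = [
    Trial("unseen-backbone-A / detection", 40.0, 39.6),
    Trial("unseen-backbone-B / segmentation", 35.0, 31.0),
]
print(failing_conditions(trials))
```

Any non-empty result would count as evidence against the generalization premise, subject to whatever tolerance the evaluator considers acceptable.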
Original abstract
Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split-inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates using VVC to compress intermediate neural network features in split-inference systems under the MPEG-AI FCM standard. After a tool-level analysis of how VVC coding components affect rate-distortion performance and downstream task accuracy, it defines three lightweight profiles (Fast, Faster, Fastest). It reports BD-Rate gains of 2.96% (Fast) and 1.85% (Faster), a BD-Rate loss of 1.71% (Fastest), and encoding-time reductions of 21.8%, 51.5%, and 95.6% respectively.
Significance. If the reported trade-offs prove robust, the work supplies immediately usable, standards-compatible profiles that shift VVC optimization from perceptual to machine-task objectives. This could accelerate deployment of feature-coding pipelines in edge-cloud vision systems and inform future FCM profile definitions.
major comments (2)
- Experimental Results section: the reported BD-Rate and timing figures are presented without dataset descriptions, number of test sequences, error bars, or the exact vision tasks and feature extractors used; this prevents verification that the 2.96% BD-Rate gain for the Fast profile is not an artifact of post-hoc tool selection or task-specific tuning.
- Profile Definition and Evaluation sections: the claim that the three profiles preserve downstream accuracy across FCM use cases rests on tool-impact observations obtained from a limited set of intermediate features and tasks; no cross-model or cross-dataset validation is shown, so the generalization risk identified in the stress-test note remains unaddressed and load-bearing for the central recommendation.
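The error bars requested in the first major comment could be as simple as a mean and sample standard deviation over per-sequence BD-Rate values. The following is a minimal sketch, assuming one BD-Rate figure per test sequence. The values are placeholders, not results from the paper.

```python
from statistics import mean, stdev


def summarize(bd_rates_pct):
    """Return (mean, sample standard deviation) of per-sequence BD-Rate values."""
    if len(bd_rates_pct) < 2:
        raise ValueError("need at least two sequences for a standard deviation")
    return mean(bd_rates_pct), stdev(bd_rates_pct)


# Placeholder per-sequence BD-Rate values in percent; NOT results from the paper.
per_sequence = [-4.0, -2.0, -3.0, -1.0, -5.0]

avg, sd = summarize(per_sequence)
print(f"BD-Rate: {avg:.2f}% +/- {sd:.2f}% over {len(per_sequence)} sequences")
```

Reporting the number of sequences alongside the spread lets a reader judge whether an average of a few percent is distinguishable from sequence-to-sequence variation.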
minor comments (2)
- Abstract and Introduction: the term 'BD-Rate gain' is used for both positive and negative values; a consistent sign convention or explicit statement that negative values indicate loss would improve clarity.
- Related Work: no reference is made to prior VVC tool-off studies or existing FCM test conditions; adding these would situate the contribution more precisely.
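On the sign convention raised in the first minor comment: under the usual Bjøntegaard definition, a negative BD-Rate means the tested configuration needs fewer bits than the anchor at equal quality. The sketch below is a dependency-free approximation of that calculation, assuming four rate-quality points per curve and a quality axis given by task accuracy rather than PSNR. With exactly four points, the interpolating polynomial coincides with the cubic fit of the classic method; with more points it would not. All function names and data are illustrative.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integrate(f, lo, hi, steps=200):
    """Composite Simpson's rule; steps must be even."""
    h = (hi - lo) / steps
    acc = f(lo) + f(hi)
    for k in range(1, steps):
        acc += (4 if k % 2 else 2) * f(lo + k * h)
    return acc * h / 3.0


def bd_rate(anchor, test):
    """BD-Rate in percent from two lists of (bitrate, quality) points.

    A negative result means the test curve needs fewer bits at equal quality.
    """
    qa, ra = zip(*[(q, math.log10(r)) for r, q in anchor])
    qt, rt = zip(*[(q, math.log10(r)) for r, q in test])
    lo, hi = max(min(qa), min(qt)), min(max(qa), max(qt))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    area_test = _integrate(lambda q: _lagrange(qt, rt, q), lo, hi)
    area_anchor = _integrate(lambda q: _lagrange(qa, ra, q), lo, hi)
    avg_log_diff = (area_test - area_anchor) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Placeholder points (bitrate in kbps, task accuracy); not data from the paper.
anchor = [(1000, 30.0), (2000, 34.0), (4000, 37.0), (8000, 39.0)]
test = [(r * 0.9, q) for r, q in anchor]  # same accuracy at 10% fewer bits

print(f"{bd_rate(anchor, test):.2f}%")  # prints -10.00%
```

Under this convention, the reported 2.96% and 1.85% "gains" would be written as negative BD-Rate values and the 1.71% loss as a positive one.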
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. These observations have helped us identify areas where additional clarity and documentation are needed. We provide point-by-point responses below and have revised the manuscript to strengthen the presentation of our experimental setup and the scope of our claims.
Point-by-point responses
- Referee: Experimental Results section: the reported BD-Rate and timing figures are presented without dataset descriptions, number of test sequences, error bars, or the exact vision tasks and feature extractors used; this prevents verification that the 2.96% BD-Rate gain for the Fast profile is not an artifact of post-hoc tool selection or task-specific tuning.
Authors: We agree that the original Experimental Results section lacked sufficient detail for independent verification. In the revised manuscript we have added: (i) explicit descriptions of the datasets and test sequences employed (standard MPEG-AI FCM sequences together with the number of sequences used for each profile evaluation), (ii) the precise vision tasks (object detection and image classification) and the feature extractors (ResNet and EfficientNet backbones under the FCM split-inference pipeline), and (iii) standard-deviation figures accompanying the reported BD-Rate values to indicate consistency across sequences. These additions demonstrate that the 2.96% gain for the Fast profile is reproducible and not the result of post-hoc tool selection.
Revision: yes
- Referee: Profile Definition and Evaluation sections: the claim that the three profiles preserve downstream accuracy across FCM use cases rests on tool-impact observations obtained from a limited set of intermediate features and tasks; no cross-model or cross-dataset validation is shown, so the generalization risk identified in the stress-test note remains unaddressed and load-bearing for the central recommendation.
Authors: We acknowledge that our tool-impact study was performed on a representative but finite collection of intermediate features and tasks. The profiles themselves are derived from the statistical effects of individual VVC tools on feature tensors rather than from task-specific optimization; this design choice is intended to confer broader applicability within the FCM framework. In the revision we have expanded the discussion in the Profile Definition and Evaluation sections to explicitly reference the stress-test note, clarify the scope of the tested conditions, and state that the profiles constitute practical starting points rather than universally validated solutions. While we have not added new cross-model experiments at this stage, the added text better qualifies our claims and reduces the risk of over-generalization.
Revision: partial
Circularity Check
No circularity: empirical measurements of BD-Rate and speedup are independent of profile definitions
Full rationale
The paper conducts a tool-level analysis of VVC components on intermediate features for FCM tasks, measures compression efficiency and downstream accuracy directly on test data, and then selects which tools to disable or simplify to create the Fast/Faster/Fastest profiles. The reported 2.96% BD-Rate gain, 51.5% speedup, and 95.6% encoding-time reduction are explicit experimental outcomes from those measurements, not quantities that are fitted to the same data and then re-labeled as predictions. No equations, self-citations, or uniqueness theorems are invoked to force the profile choices; the simplifications are justified by observed impact on the evaluated features and tasks. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: the impact of each VVC tool on feature compression can be isolated by sequential enable/disable experiments.
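The enable/disable procedure behind this assumption can be sketched as a one-at-a-time ablation loop. This is a minimal illustration, not the authors' pipeline: `ablate` and `toy_evaluate` are hypothetical names, the tool labels (MTS, SbT, DepQuant) are borrowed from the paper's conclusion purely as labels, and the cost table is invented. The toy evaluator also builds in the very additivity the axiom assumes, so it cannot test that assumption.

```python
def ablate(tools, evaluate):
    """Disable each tool in turn and report its effect relative to all tools on.

    evaluate(enabled_tools) must return (bd_rate_pct, encode_seconds).
    Returns {tool: (bd_rate_change_pct, time_saved_pct)}.
    """
    all_on = frozenset(tools)
    base_rate, base_time = evaluate(all_on)
    report = {}
    for tool in tools:
        rate, secs = evaluate(all_on - {tool})
        time_saved = 100.0 * (base_time - secs) / base_time
        report[tool] = (rate - base_rate, time_saved)
    return report


# Invented per-tool costs: (BD-Rate penalty when disabled, seconds when enabled).
# These are placeholders, NOT measurements from the paper.
TOY_COST = {"MTS": (0.5, 20.0), "SbT": (0.2, 10.0), "DepQuant": (0.8, 15.0)}


def toy_evaluate(enabled):
    """Stand-in for a real encoder run; assumes tool effects are independent."""
    rate = sum(TOY_COST[t][0] for t in TOY_COST if t not in enabled)
    secs = 55.0 + sum(TOY_COST[t][1] for t in enabled)
    return rate, secs


for tool, (rate_change, time_saved) in ablate(list(TOY_COST), toy_evaluate).items():
    print(f"{tool} off: BD-Rate {rate_change:+.2f}%, encoding time -{time_saved:.1f}%")
```

With a real encoder in place of `toy_evaluate`, interactions between tools would show up as pairwise ablations whose combined effect differs from the sum of the individual ones, which is one way to probe the axiom.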
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Paper passage: "We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy... propose three lightweight essential VVC profiles: Fast, Faster, and Fastest."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, absolute_floor_iff_bare_distinguishability (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Paper passage: "Disabling in-loop filters yields a 2.96% average BD-Rate improvement... Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.