Performance Analysis of Hardware-Accelerated 10-Bit 4:2:2 Encoding with Split-Frame Encoding for High-Fidelity V-PCC Streaming
Pith reviewed 2026-06-30 02:39 UTC · model grok-4.3
The pith
4-way split-frame encoding on Blackwell GPUs reaches 122 fps for 8K 10-bit 4:2:2 HEVC encoding of V-PCC video, enabling real-time high-fidelity volumetric streaming on standard hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a Blackwell GPU equipped with four parallel on-die hardware encoders, 4-way split-frame encoding delivers 122 fps throughput for 8K 10-bit 4:2:2 HEVC while incurring a BD-rate penalty of up to 5 percent from the loss of spatial redundancies across slice boundaries; the resulting throughput and power figures establish standard commercial GPUs as a viable baseline for real-time high-density V-PCC.
What carries the argument
Split-Frame Encoding (SFE) that partitions each frame across four parallel on-die hardware encoders on the Blackwell architecture to parallelize 10-bit 4:2:2 HEVC encoding.
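To make the partitioning concrete, the sketch below divides an 8K frame's CTU rows into one horizontal strip per encoder and counts the CTU rows whose upper neighbours land in a different strip, which is where spatial prediction across a boundary is lost. It is a hypothetical model: horizontal strips, even row distribution, and a 32×32 CTU size are all assumptions, because neither the abstract nor this review states how the driver places the split boundaries.

```python
import math


def split_ctu_rows(frame_height, n_encoders, ctu_size=32):
    """Split a frame's CTU rows into n contiguous horizontal strips.

    Hypothetical model: SFE cuts the frame into horizontal strips aligned to
    CTU rows, spreading rows as evenly as possible. Returns strip heights in
    pixels; the last strip is clipped if the height is not a CTU multiple.
    """
    rows = math.ceil(frame_height / ctu_size)
    base, extra = divmod(rows, n_encoders)
    # The first `extra` strips each take one additional CTU row.
    row_counts = [base + 1 if i < extra else base for i in range(n_encoders)]
    heights, remaining = [], frame_height
    for count in row_counts:
        h = min(count * ctu_size, remaining)
        heights.append(h)
        remaining -= h
    return heights


def boundary_row_fraction(frame_height, n_encoders, ctu_size=32):
    """Fraction of CTU rows whose above neighbours lie in a different strip.

    Each internal boundary cuts one CTU row off from its upper neighbours,
    so n strips create n - 1 such rows.
    """
    rows = math.ceil(frame_height / ctu_size)
    return (n_encoders - 1) / rows


for n in (1, 2, 3, 4):
    print(n, split_ctu_rows(4320, n), f"{boundary_row_fraction(4320, n):.2%}")
```

Under these assumptions a 4320-line frame has 135 CTU rows; 4-way SFE yields strips of 1088, 1088, 1088 and 1056 lines, and 3 of 135 rows (about 2.2%) begin a new strip. This counts geometry only and does not predict the BD-Rate cost.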
If this is right
- Real-time 8K V-PCC encoding at 120 fps becomes achievable on off-the-shelf GPUs without custom ASICs.
- Power efficiency measured on the testbed supports deployment in consumer or edge devices for live volumetric capture.
- The 5 percent BD-rate overhead sets a concrete baseline that future encoder or partitioning improvements can target.
- High-density point-cloud pipelines can now target standard GPU hardware rather than specialized chips.
Where Pith is reading between the lines
- The same split-frame approach may transfer to other GPU vendors once they add comparable 10-bit 4:2:2 hardware support.
- Wider availability could accelerate V-PCC use in consumer VR and telepresence where low-latency 3D video is required.
- Extending the test to multi-GPU or higher frame counts would reveal whether the four-encoder limit is the new bottleneck.
Load-bearing premise
The four parallel encoders on the tested Blackwell GPU represent typical production behavior and the observed rate penalty arises only from missing redundancies across slice boundaries.
What would settle it
Running the identical 4-way SFE configuration on the same Blackwell hardware and obtaining sustained throughput below 120 fps or a BD-rate penalty substantially above 5 percent would falsify the viability claim.
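The falsification criterion above comes down to arithmetic on a throughput run. The sketch below assumes 8K means 7680×4320 and, for the per-engine figure only, that work divides evenly across four engines; real per-engine load depends on content and driver scheduling. The function name and the run length used in the note are hypothetical, not measurements from the paper.

```python
WIDTH, HEIGHT = 7680, 4320  # assumed 8K UHD luma resolution


def realtime_check(frames_encoded, elapsed_s, target_fps=120.0, engines=4):
    """Summarise a throughput run against a real-time frame-rate target."""
    fps = frames_encoded / elapsed_s
    luma_rate = fps * WIDTH * HEIGHT  # luma samples per second, all engines
    return {
        "fps": fps,
        "meets_target": fps >= target_fps,
        "headroom_pct": (fps / target_fps - 1.0) * 100.0,
        "frame_budget_ms": 1000.0 / target_fps,  # wall-clock time allowed per frame
        "frame_time_ms": 1000.0 / fps,           # wall-clock time actually spent per frame
        "gpix_per_s_total": luma_rate / 1e9,
        # Assumes an even split of pixels across engines.
        "gpix_per_s_per_engine": luma_rate / engines / 1e9,
    }
```

For instance, a hypothetical run that encodes 1,220 frames in 10 s gives 122 fps: about 1.7% headroom over 120 fps, an 8.2 ms frame time against an 8.3 ms budget, and roughly 4.05 gigapixels of luma per second in total.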
Original abstract
Video-based Point Cloud Compression (V-PCC) encodes volumetric data by projecting 3D geometry and texture onto 2D video frames. To prevent spatial distortion and color bleeding during 3D reconstruction, this process requires 10-bit color depth and 4:2:2 chroma subsampling, rather than the standard 8-bit 4:2:0 format. Additionally, capturing high-density dynamic point clouds requires demanding encoding parameters, such as 8K resolution at framerates up to 120 fps. Historically, the lack of 4:2:2 chroma support in older GPU hardware encoders restricted real-time V-PCC to custom Application-Specific Integrated Circuits (ASICs). However, the recent introduction of NVIDIA's Blackwell GPU architecture, featuring on-chip hardware encoders with 10-bit 4:2:2 support, presents an opportunity to shift this workload to general-purpose hardware. This paper investigates the feasibility of such an approach. Using a commercially available Blackwell GPU equipped with four parallel on-die hardware encoders as a testbed, we evaluate the throughput, rate-distortion (RD) performance, and power consumption of 8K 10-bit 4:2:2 HEVC across various Split-Frame Encoding (SFE) configurations. Our results demonstrate that 4-way SFE achieves an encoding throughput of 122 fps, successfully meeting the strict real-time constraints of high-density V-PCC. Although the inability to exploit spatial redundancies across slice boundaries results in a BD-Rate penalty of up to 5%, the measured throughput and power efficiency establish standard, commercial off-the-shelf GPUs as a highly viable baseline for real-time volumetric video streaming.
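To put the format requirement in perspective, the sketch below computes uncompressed frame sizes and data rates for 8-bit 4:2:0 and 10-bit 4:2:2 at 8K. It assumes samples deeper than 8 bits are stored in 16-bit containers, as in P010-style layouts; that is an assumption about the input format, not something the abstract specifies, and packed 10-bit layouts would be smaller.

```python
# Chroma samples per luma sample for each subsampling scheme.
CHROMA_FACTOR = {"4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}


def raw_frame_bytes(width, height, subsampling, bit_depth):
    """Uncompressed size of one Y'CbCr frame in bytes.

    Assumption: samples deeper than 8 bits occupy a 16-bit container.
    """
    samples = width * height * (1.0 + CHROMA_FACTOR[subsampling])
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return int(samples * bytes_per_sample)


def raw_rate_gbps(width, height, subsampling, bit_depth, fps):
    """Uncompressed data rate in gigabits per second."""
    return raw_frame_bytes(width, height, subsampling, bit_depth) * 8 * fps / 1e9


for fmt, depth in (("4:2:0", 8), ("4:2:2", 10)):
    size = raw_frame_bytes(7680, 4320, fmt, depth)
    print(f"{depth}-bit {fmt}: {size / 1e6:.1f} MB/frame, "
          f"{raw_rate_gbps(7680, 4320, fmt, depth, 120):.1f} Gbit/s at 120 fps")
```

Under that container assumption an 8K frame grows from about 49.8 MB (8-bit 4:2:0) to about 132.7 MB (10-bit 4:2:2), roughly 2.7 times more raw data, or about 127 Gbit/s uncompressed at 120 fps.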
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates hardware-accelerated 10-bit 4:2:2 HEVC encoding for V-PCC on NVIDIA Blackwell GPUs using split-frame encoding (SFE). It reports that a 4-way SFE configuration on a testbed with four parallel on-die encoders achieves 122 fps for 8K content, meeting the real-time requirement of up to 120 fps, at a BD-Rate penalty of up to 5% attributed to lost cross-slice spatial redundancies. It also reports power efficiency that positions COTS GPUs as viable for real-time high-density volumetric streaming.
Significance. If the reported throughput, RD, and power measurements hold under broader conditions, the work supplies concrete empirical benchmarks showing that recent GPU hardware encoders can satisfy the strict real-time and fidelity constraints of V-PCC without custom ASICs. The explicit 122 fps figure and 5% penalty provide actionable reference points for the field.
major comments (3)
- [Experimental Setup] (assumed section describing the testbed) The viability conclusion rests on the four-encoder Blackwell configuration being representative of production COTS GPU performance, yet no additional GPU models, driver versions, or scaling analysis are provided to support generalization beyond this single testbed.
- [Results] (discussion of BD-Rate) The claim that the entire BD-Rate penalty of up to 5% stems solely from the inability to exploit spatial redundancies across slice boundaries is not supported by ablation experiments isolating this factor from per-slice header overhead, memory-bandwidth contention across encoders, or driver scheduling effects.
- [Abstract / Methodology] No dataset descriptions, error bars, verification steps, or preprocessing pipeline details are supplied, preventing full evaluation of the 122 fps and BD-Rate figures.
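Because the BD-Rate objection turns on how the penalty is computed, a minimal sketch of the classic Bjøntegaard delta-rate calculation may help: fit a cubic of log10(bitrate) against PSNR for each configuration, integrate both fits over the shared PSNR range, and convert the mean log-rate gap into a percentage. This is the textbook cubic-fit variant described in the Barman et al. tutorial in the reference list; the paper may use a different interpolation, quality metric, or set of rate points, so treat it as illustrative. Helper names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _fit_cubic(xs, ys, x0):
    """Least-squares cubic y ~ sum(c[k] * (x - x0)**k); centring on x0 aids conditioning."""
    us = [x - x0 for x in xs]
    ata = [[sum(u ** (i + j) for u in us) for j in range(4)] for i in range(4)]
    aty = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(4)]
    return _solve(ata, aty)


def _integral(c, a, b):
    """Definite integral of sum(c[k] * u**k) from a to b."""
    def antiderivative(u):
        return sum(ck * u ** (k + 1) / (k + 1) for k, ck in enumerate(c))
    return antiderivative(b) - antiderivative(a)


def bd_rate(anchor_kbps, anchor_psnr, test_kbps, test_psnr):
    """Bjontegaard delta rate in percent; positive means the test needs more bits."""
    if min(len(anchor_kbps), len(test_kbps)) < 4:
        raise ValueError("a cubic fit needs at least four RD points per curve")
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in quality")
    x0 = (lo + hi) / 2.0
    fit_a = _fit_cubic(anchor_psnr, [math.log10(r) for r in anchor_kbps], x0)
    fit_t = _fit_cubic(test_psnr, [math.log10(r) for r in test_kbps], x0)
    mean_gap = (_integral(fit_t, lo - x0, hi - x0)
                - _integral(fit_a, lo - x0, hi - x0)) / (hi - lo)
    return (10.0 ** mean_gap - 1.0) * 100.0
```

As a sanity check with synthetic RD points (not data from the paper), a test curve whose bitrates are exactly 5% above the anchor at identical PSNR returns a BD-Rate of about +5.0%. Note that the metric reports only the aggregate gap; it cannot by itself separate boundary-redundancy loss from header overhead or scheduling effects, which is the referee's point.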
minor comments (2)
- Notation for SFE configurations (e.g., 2-way vs. 4-way) should be defined explicitly on first use with a table or diagram for clarity.
- Power consumption figures would benefit from comparison against a baseline single-encoder run or CPU reference to strengthen the efficiency claim.
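The baseline comparison suggested in this minor comment reduces to energy per encoded frame. A minimal sketch, assuming average board power is logged over the same interval as the frame count (for example by polling nvidia-smi); the function names are illustrative, and no measured values from the paper are implied.

```python
def joules_per_frame(avg_power_w, fps):
    """Energy per encoded frame: average power (J/s) divided by frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return avg_power_w / fps


def relative_efficiency(test_power_w, test_fps, base_power_w, base_fps):
    """Frames per joule of the test run relative to the baseline (>1 means more efficient)."""
    return joules_per_frame(base_power_w, base_fps) / joules_per_frame(test_power_w, test_fps)
```

Reporting the ratio against a single-encoder run on the same GPU isolates what SFE itself contributes, since both runs share the board's idle power draw.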
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with specific revisions where appropriate.
Point-by-point responses
- Referee: [Experimental Setup] The viability conclusion rests on the four-encoder Blackwell configuration being representative of production COTS GPU performance, yet no additional GPU models, driver versions, or scaling analysis are provided to support generalization beyond this single testbed.
Authors: We agree that the results are specific to the Blackwell testbed and do not include data from other GPU models or drivers. The paper presents this configuration as the first COTS hardware with native 10-bit 4:2:2 support, establishing an empirical baseline rather than claiming broad generalization. We will revise the discussion and conclusion sections to explicitly state the single-testbed limitation and note the absence of cross-model scaling analysis. Revision: partial.
- Referee: [Results] The claim that the entire BD-Rate penalty of up to 5% stems solely from the inability to exploit spatial redundancies across slice boundaries is not supported by ablation experiments isolating this factor from per-slice header overhead, memory-bandwidth contention across encoders, or driver scheduling effects.
Authors: The manuscript attributes the penalty primarily to lost cross-slice redundancies based on HEVC slice boundary behavior. We acknowledge that without isolating ablations, contributions from header overhead, bandwidth contention, or scheduling cannot be excluded. We will revise the results discussion to present the observed BD-Rate penalty without exclusive attribution and add a limitations paragraph noting these potential confounding factors. Revision: yes.
- Referee: [Abstract / Methodology] No dataset descriptions, error bars, verification steps, or preprocessing pipeline details are supplied, preventing full evaluation of the 122 fps and BD-Rate figures.
Authors: We will expand the methodology section to provide complete dataset descriptions (including the specific point cloud sequences and V-PCC projection parameters), details of the preprocessing pipeline, verification steps for throughput and RD measurements, and error bars or standard deviations for the reported figures based on repeated runs. Revision: yes.
- Not addressed in revision: empirical results on additional GPU models, driver versions, or scaling analysis beyond the single Blackwell testbed, as no such hardware was available for the study.
Circularity Check
No circularity; purely empirical hardware measurements with no derivations or fitted predictions.
Full rationale
The paper reports direct measurements of throughput (122 fps), BD-Rate penalty (~5%), and power on a specific Blackwell GPU testbed under various SFE configurations. No equations, parameter fits, predictions, or self-citations appear in the provided text; claims follow immediately from the experimental data without reduction to inputs by construction. The representativeness concern raised by the skeptic is a validity issue, not circularity.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION. Volumetric media formats, such as point clouds, require significant data compression for storage and transmission [1]. V-PCC addresses this by packing 3D spatial geometry and surface attributes into 2D frames, which are then compressed using conventional 2D video codecs such as HEVC, AV1, or VVC. In this paradigm, the structural accuracy of...
-
[2]
EXPERIMENT SETUP. 2.1. Encoding Throughput Benchmark. To evaluate the upper bounds of the hardware video encoder on modern COTS GPUs, we utilized the NVIDIA RTX PRO 6000 Blackwell GPU as our testbed. This GPU was selected because it integrates four NVENC engines, the maximum count in the architecture, permitting a 4-way SFE configuration. While the tests ...
-
[3]
ITE Ultra-High Definition Standard Test Sequences (Series A) [18]: As described in the official documentation, this dataset includes 10 sequences at 4K resolution and 11 sequences at 8K resolution. All sequences have a frame rate of 59.94 fps and are 15 seconds in length.
-
[4]
Netflix Sol Levante [19]: A 4K (3840×2160) animation sequence at 24 fps with a running time of 4:32 minutes.
-
[5]
Netflix Meridian [19]: A 4K (3840×2160) live-action sequence at 59.94 fps with a running time of 11:58 minutes.
-
[6]
Netflix Nocturne [19]: A 4K (3840×2160) High-Frame-Rate (HFR) sequence at 120 fps with a running time of 11:04 minutes.
-
[7]
Netflix Chimera [19]: A DCI 4K (4096×2160) sequence at 59.94 fps consisting of 23 distinct scenes with a total runtime of 30:49 minutes. 2.1.2. Mezzanine File Preparation. The source content was originally provided in uncompressed RAW formats (TIFF, EXR, DPX). To facilitate efficient large-scale testing and eliminate disk I/O bottlenecks during the encodin...
-
[8]
RESULTS AND ANALYSIS. 3.1. Encoding Throughput. Table 3 presents the detailed encoding throughput results. As mentioned earlier, for this analysis, we focus primarily on the High Quality (HQ) tuning, as it represents the optimal option for professional broadcasting environments. From the results, it was found that by enabling 4-way SFE, a commercial GPGPU c...
-
[9]
CONCLUSION AND FUTURE WORK. This paper presented a comprehensive evaluation of hardware-accelerated 10-bit 4:2:2 video encoding on commercial off-the-shelf (COTS) GPUs, targeting the strict computational demands of real-time V-PCC and next-generation volumetric video streaming. Historically restricted to specialized ASICs, this high-fidelity encoding pro...
-
[10]
M. Rudolph, S. Schneegass, and A. Rizk, “Transcoding V-PCC point cloud streams in real-time,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 21, no. 9, Sep. 2025. [Online]. Available: https://doi.org/10.1145/3682062
-
[11]
T.-L. Lin, B.-W. Su, P.-C. Shen, D.-Y. Chen, C.-F. Liang, Y.-C. Chen, Y. Wen, and M. Shahid, “Upsampling algorithm for V-PCC-coded 3D point clouds,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 12, Nov. 2024. [Online]. Available: https://doi.org/10.1145/3690641
-
[12]
E. Nakajima, F. Lin, K. Arunruangsirilert, and J. Katto, “Evaluation of 2D video interpolation and extrapolation methods for real-time V-PCC error concealment,” in 2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5.
-
[13]
J. Zhang, J. Zhang, D. Ding, and Z. Ma, “Learning to restore compressed point cloud attribute: A fully data-driven approach and a rules-unrolling-based optimization,” IEEE Trans. Vis. Comput. Graph., vol. 31, no. 4, pp. 1985–1998, 2025.
-
[14]
J. Wang, R. Xue, J. Li, D. Ding, Y. Lin, and Z. Ma, “A versatile point cloud compressor using universal multi-scale conditional coding – Part II: Attribute,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 1, pp. 252–268, 2025.
-
[15]
A. Ak, E. Zerman, M. Quach, A. Chetouani, A. Smolic, G. Valenzise, and P. Le Callet, “BASICS: Broad quality assessment of static point clouds in a compression scenario,” IEEE Transactions on Multimedia, vol. 26, pp. 6730–6742, 2024.
-
[16]
T. Onishi, T. Sano, Y. Nishida, K. Yokohari, K. Nakamura, K. Nitta, K. Kawashima, J. Okamoto, N. Ono, A. Sagata, H. Iwasaki, M. Ikeda, and A. Shimizu, “A single-chip 4K 60-fps 4:2:2 HEVC video encoder LSI employing efficient motion estimation and mode decision framework with scalability to 8K,” IEEE Transactions on Very Large Scale Integration (VLSI) ...
-
[17]
D. Kobayashi, K. Nakamura, T. Osawa, Y. Omori, T. Onishi, and H. Iwasaki, “A real-time 4K HEVC multi-channel encoding system with content-aware bitrate control,” in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.
-
[18]
W. Li, L. Huang, C. He, M. Jing, W. Hu, and Y. Fan, “An 8K@120fps advanced entropy coding hardware design for AVS3,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 8, pp. 8372–8376, 2025.
-
[19]
NVIDIA, NVIDIA RTX BLACKWELL GPU ARCHITECTURE: Built for Neural Rendering, Mar 2024. [Online]. Available: https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf
-
[20]
NVIDIA, NVIDIA RTX PRO BLACKWELL GPU ARCHITECTURE: Built for Neural Rendering, Mar 2024. [Online]. Available: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/NVIDIA-RTX-Blackwell-PRO-GPU-Architecture-v1.0.pdf
-
[21]
S. Iwasaki, X. Lei, K. Chida, Y. Sugito, K. Iguchi, K. Kanda, H. Miyoshi, and Y. Uehara, “The required video bitrate for 8K120-Hz real-time temporal scalable coding,” in 2020 IEEE International Conference on Consumer Electronics (ICCE), 2020, pp. 1–5.
-
[22]
K. Arunruangsirilert and J. Katto, “Evaluation of NVENC split-frame encoding (SFE) for UHD video transcoding,” in 2025 Picture Coding Symposium (PCS), 2025, pp. 1–5.
-
[23]
NVIDIA, “Video encoding at 8K60 with split-frame encoding and NVIDIA Ada Lovelace architecture,” Jan 2024. [Online]. Available: https://developer.nvidia.com/blog/video-encoding-at-8k60-with-split-frame-encoding-and-nvidia-ada-lovelace-architecture/
-
[24]
K. Arunruangsirilert and J. Katto, “Evaluation of hardware-based video encoders on modern GPUs for UHD live-streaming,” in 2024 33rd International Conference on Computer Communications and Networks (ICCCN), 2024, pp. 1–9.
-
[25]
K. Arunruangsirilert and J. Katto, “Evaluation of GPU video encoder for low-latency real-time 4K UHD encoding,” in 2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5.
-
[26]
K. Arunruangsirilert and J. Katto, “Evolution of NVENC efficiency: A longitudinal analysis of HQ and UHQ tuning efficiency, latency and energy trade-offs,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01187
-
[27]
The Institute of Image Information and Television Engineers, “Ultra-high definition/wide-color-gamut standard test sequences – Series A,” Jan 2016. [Online]. Available: https://www.ite.or.jp/content/test-materials/uhdtv_a/
-
[28]
Netflix, “Netflix open content.” [Online]. Available: https://opencontent.netflix.com/
-
[29]
FFmpeg, “avcodec/nvenc: Add 4-way multi nvenc split frame encoding (sdk 13.0) in hevc and av1 for rtx pro 6000 blackwell #21371,” Jan 2026. [Online]. Available: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21371
-
[30]
Twitch, “Twitch help portal.” [Online]. Available: https://help.twitch.tv/s/article/broadcasting-guidelines?language=en_US
-
[31]
Google, “Choose live encoder settings, bitrates, and resolutions - YouTube help,” 2019. [Online]. Available: https://support.google.com/youtube/answer/2853702
-
[32]
N. Barman, M. G. Martini, and Y. Reznik, “Bjøntegaard delta (BD): A tutorial overview of the metric, evolution, challenges, and recommendations,” 2024. [Online]. Available: https://arxiv.org/abs/2401.04039
-
[33]
waveletbeam, “Corner case is giving wrong VMAF and PSNR values!” Oct 2019. [Online]. Available: https://github.com/Netflix/vmaf/issues/371
-
[34]
Z. Li, K. Swanson, C. Bampis, L. Krasula, and A. Aaron, “Toward a better quality metric for the video community,” Dec 2020. [Online]. Available: https://netflixtechblog.com/toward-a-better-quality-metric-for-the-video-community-7ed94e752a30
-
[35]
K. Arunruangsirilert and J. Katto, “Sustainable real-time 8K60 HEVC encoding for V2X: Repurposing legacy NVENC hardware at the vehicular edge,” 2026. [Online]. Available: https://arxiv.org/abs/2605.16738
A. ENCODING THROUGHPUT BENCHMARK SCRIPT. The batch scripts included below are for benchmarking encoding throughput at 8K UHD resolution. For 4K UHD, use...