Performance Analysis of Hardware-Accelerated 10-Bit 4:2:2 Encoding with Split-Frame Encoding for High-Fidelity V-PCC Streaming

Jiro Katto; Kasidis Arunruangsirilert

arxiv: 2606.29179 · v1 · pith:W5X7ICWHnew · submitted 2026-06-28 · 📡 eess.IV · cs.AR · cs.MM

Pith reviewed 2026-06-30 02:39 UTC · model grok-4.3

classification 📡 eess.IV · cs.AR · cs.MM

keywords: V-PCC, HEVC, hardware encoding, Split-Frame Encoding, Blackwell GPU, volumetric video, real-time encoding, 10-bit 4:2:2

The pith

4-way split-frame encoding on Blackwell GPUs reaches 122 fps for 8K 10-bit 4:2:2 V-PCC, enabling real-time high-fidelity volumetric streaming on standard hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether commercial Blackwell GPUs, with their new on-die support for 10-bit 4:2:2 HEVC, can replace custom ASICs for the strict requirements of high-density V-PCC. By testing split-frame encoding across the four parallel hardware encoders, it measures throughput, rate-distortion cost, and power use at 8K resolution against a 120 fps target. A sympathetic reader would care because this removes a long-standing hardware barrier and opens commercial off-the-shelf GPUs to live volumetric capture and streaming. If the measured 122 fps holds, real-time V-PCC becomes practical without specialized silicon.

Core claim

Using a Blackwell GPU equipped with four parallel on-die hardware encoders, 4-way split-frame encoding delivers 122 fps throughput for 8K 10-bit 4:2:2 HEVC while incurring a BD-rate penalty of up to 5 percent from the loss of spatial redundancies across slice boundaries; the resulting throughput and power figures establish standard commercial GPUs as a viable baseline for real-time high-density V-PCC.
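
For readers unfamiliar with the metric, BD-rate summarizes the average bitrate difference between two rate-distortion curves at equal quality. The sketch below is a simplified, pure-Python variant: it interpolates log-bitrate piecewise-linearly over quality rather than using the cubic fit of the usual Bjøntegaard procedure, and it is not the paper's measurement code. The function names and the sample (bitrate, quality) points are hypothetical, chosen only so the result is easy to check by hand.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over ascending, distinct xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled quality range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average of log(bitrate) over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])      # sort by quality
    quality = [q for _, q in pts]
    log_rate = [math.log(r) for r, _ in pts]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(hi, lo + i * h)                   # clamp against rounding drift
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(x, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate_percent(anchor, test):
    """Simplified BD-rate of `test` relative to `anchor`.

    Both arguments are lists of (bitrate, quality) pairs. A positive result
    means `test` needs more bitrate than `anchor` for the same quality.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical sample points (Mbps, dB); not taken from the paper.
ANCHOR = [(20.0, 34.0), (40.0, 37.0), (80.0, 40.0), (160.0, 43.0)]
# Synthetic test curve: same quality, uniformly 5% more bitrate.
TEST = [(rate * 1.05, quality) for rate, quality in ANCHOR]

if __name__ == "__main__":
    print(f"BD-rate: {bd_rate_percent(ANCHOR, TEST):.2f}%")
```

Because the synthetic test curve costs exactly 5% more bitrate at every quality level, the computed BD-rate is 5%, which is the scale of overhead the core claim describes as the upper bound.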

What carries the argument

Split-Frame Encoding (SFE) that partitions each frame across four parallel on-die hardware encoders on the Blackwell architecture to parallelize 10-bit 4:2:2 HEVC encoding.
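
To make the partitioning idea concrete, the following sketch divides a frame's rows into contiguous strips, one per encoder. It is an illustration under stated assumptions, not a description of how the hardware or driver actually partitions frames: the horizontal-strip orientation, the 64-row alignment (the largest HEVC coding tree unit size), and the function name `split_rows` are all hypothetical choices made for this example.

```python
def split_rows(height, ways, align=64):
    """Split `height` rows into `ways` contiguous strips.

    Every strip except possibly the last is a multiple of `align` rows.
    Returns a list of (start_row, end_row) pairs with end_row exclusive.
    The alignment value is an assumption made for this sketch.
    """
    if ways < 1:
        raise ValueError("ways must be at least 1")
    if height < ways * align:
        raise ValueError("frame too small for this many aligned strips")

    blocks = -(-height // align)           # ceiling division: aligned row blocks
    base, extra = divmod(blocks, ways)     # spread blocks as evenly as possible

    strips = []
    start = 0
    for i in range(ways):
        n_blocks = base + (1 if i < extra else 0)
        end = min(height, start + n_blocks * align)
        strips.append((start, end))
        start = end
    return strips


if __name__ == "__main__":
    # 8K UHD has 4320 rows; four strips model a 4-way configuration.
    for index, (top, bottom) in enumerate(split_rows(4320, 4)):
        print(f"encoder {index}: rows {top}..{bottom - 1}")
```

Each strip is encoded without reference to its neighbours within the same frame, which is why prediction across the strip boundaries is lost and a rate penalty appears.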

If this is right

Real-time 8K V-PCC encoding at 120 fps becomes achievable on off-the-shelf GPUs without custom ASICs.
Power efficiency measured on the testbed supports deployment in consumer or edge devices for live volumetric capture.
The 5 percent BD-rate overhead sets a concrete baseline that future encoder or partitioning improvements can target.
High-density point-cloud pipelines can now target standard GPU hardware rather than specialized chips.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same split-frame approach may transfer to other GPU vendors once they add comparable 10-bit 4:2:2 hardware support.
Wider availability could accelerate V-PCC use in consumer VR and telepresence where low-latency 3D video is required.
Extending the test to multi-GPU or higher frame counts would reveal whether the four-encoder limit is the new bottleneck.

Load-bearing premise

The four parallel encoders on the tested Blackwell GPU represent typical production behavior and the observed rate penalty arises only from missing redundancies across slice boundaries.

What would settle it

Running the identical 4-way SFE configuration on the same Blackwell hardware and obtaining sustained throughput below 120 fps or a BD-rate penalty substantially above 5 percent would falsify the viability claim.
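
The test described above comes down to simple arithmetic, sketched here under a few assumptions: 8K is taken as 8K UHD (7680×4320), the 5 percent BD-rate limit is treated as a hard threshold even though the text says "substantially above", and all function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def pixel_rate(width, height, fps):
    """Luma samples that must be processed per second."""
    return width * height * fps


def headroom_percent(measured_fps, target_fps):
    """Throughput margin above the real-time target, in percent."""
    return (measured_fps - target_fps) / target_fps * 100.0


def claim_survives(sustained_fps, bd_rate_pct, target_fps=120.0, bd_limit_pct=5.0):
    """True if a replication is consistent with the viability claim.

    `bd_limit_pct` is an assumed hard cut-off chosen for this sketch.
    """
    return sustained_fps >= target_fps and bd_rate_pct <= bd_limit_pct


if __name__ == "__main__":
    print(f"frame budget at 120 fps: {frame_budget_ms(120):.2f} ms")
    print(f"pixel rate at 8K120: {pixel_rate(7680, 4320, 120):,} samples/s")
    print(f"headroom at 122 fps: {headroom_percent(122, 120):.2f}%")
```

The reported 122 fps leaves under 2 percent of headroom over the 120 fps target, so even a small sustained slowdown in a replication would push the result below real time.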

Original abstract

Video-based Point Cloud Compression (V-PCC) encodes volumetric data by projecting 3D geometry and texture onto 2D video frames. To prevent spatial distortion and color bleeding during 3D reconstruction, this process requires 10-bit color depth and 4:2:2 chroma subsampling, rather than the standard 8-bit 4:2:0 format. Additionally, capturing high-density dynamic point clouds requires demanding encoding parameters, such as 8K resolution at framerates up to 120 fps. Historically, the lack of 4:2:2 chroma support in older GPU hardware encoders restricted real-time V-PCC to custom Application-Specific Integrated Circuits (ASICs). However, the recent introduction of NVIDIA's Blackwell GPU architecture, featuring on-chip hardware encoders with 10-bit 4:2:2 support, presents an opportunity to shift this workload to general-purpose hardware. This paper investigates the feasibility of such an approach. Using a commercially available Blackwell GPU equipped with four parallel on-die hardware encoders as a testbed, we evaluate the throughput, rate-distortion (RD) performance, and power consumption of 8K 10-bit 4:2:2 HEVC across various Split-Frame Encoding (SFE) configurations. Our results demonstrate that 4-way SFE achieves an encoding throughput of 122 fps, successfully meeting the strict real-time constraints of high-density V-PCC. Although the inability to exploit spatial redundancies across slice boundaries results in a BD-Rate penalty of up to 5%, the measured throughput and power efficiency establish standard, commercial off-the-shelf GPUs as a highly viable baseline for real-time volumetric video streaming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Blackwell testbed shows 122 fps V-PCC encoding with 4-way SFE but BD-rate penalty needs better isolation from other factors.

Two things to know: this paper measures 122 fps with 4-way split-frame encoding on a Blackwell GPU for 8K 10-bit 4:2:2 HEVC in V-PCC, and that throughput comes with a BD-rate penalty of up to 5 percent.

What is actually new is the use of the recently introduced 10-bit 4:2:2 support in Blackwell's on-die encoders for V-PCC workloads, along with the specific performance numbers for split-frame configurations. The paper does well by testing multiple SFE setups and reporting not just speed but also rate-distortion performance and power consumption on real hardware. This gives a concrete picture of what commodity GPUs can do now that the format is supported.

The soft spots are in the analysis of the BD-rate penalty and the generalizability of the results. The paper links the penalty to the inability to exploit spatial redundancies across slice boundaries, but it does not provide ablations to separate this from possible contributions like increased header overhead or memory bandwidth issues when running four encoders in parallel. The results are from one commercially available Blackwell GPU, so it is not clear if the 122 fps figure would hold on other cards or in full production pipelines that include additional V-PCC preprocessing steps. The abstract mentions no error bars, dataset details, or verification methods, which makes the central claim harder to evaluate fully.

This paper is for people in the volumetric video and hardware acceleration community who are interested in moving away from custom ASICs for real-time high-density V-PCC. A reader looking for baseline numbers on new GPU encoders will get value from the throughput and efficiency data. It has enough empirical grounding to deserve a serious referee, even if the viability conclusion could use more supporting controls.

I recommend sending it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates hardware-accelerated 10-bit 4:2:2 HEVC encoding for V-PCC on NVIDIA Blackwell GPUs using split-frame encoding (SFE). It reports that a 4-way SFE configuration on a testbed with four parallel on-die encoders achieves 122 fps throughput for 8K content, exceeding the 120 fps real-time target, while incurring a BD-Rate penalty of up to 5% attributed to lost cross-slice spatial redundancies. It also reports power efficiency that positions COTS GPUs as viable for real-time high-density volumetric streaming.

Significance. If the reported throughput, RD, and power measurements hold under broader conditions, the work supplies concrete empirical benchmarks showing that recent GPU hardware encoders can satisfy the strict real-time and fidelity constraints of V-PCC without custom ASICs. The explicit 122 fps figure and 5% penalty provide actionable reference points for the field.

major comments (3)

[Experimental Setup] Experimental Setup (assumed section describing testbed): The viability conclusion rests on the four-encoder Blackwell configuration being representative of production COTS GPU performance, yet no additional GPU models, driver versions, or scaling analysis are provided to support generalization beyond this single testbed.
[Results] Results section (discussion of BD-Rate): The claim that the entire up to 5% BD-Rate penalty stems solely from inability to exploit spatial redundancies across slice boundaries is not supported by ablation experiments isolating this factor from per-slice header overhead, memory-bandwidth contention across encoders, or driver scheduling effects.
[Abstract / Methodology] Abstract and methodology description: No dataset descriptions, error bars, verification steps, or preprocessing pipeline details are supplied, preventing full evaluation of the 122 fps and BD-Rate figures.

minor comments (2)

Notation for SFE configurations (e.g., 2-way vs. 4-way) should be defined explicitly on first use with a table or diagram for clarity.
Power consumption figures would benefit from comparison against a baseline single-encoder run or CPU reference to strengthen the efficiency claim.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with specific revisions where appropriate.

Referee: [Experimental Setup] The viability conclusion rests on the four-encoder Blackwell configuration being representative of production COTS GPU performance, yet no additional GPU models, driver versions, or scaling analysis are provided to support generalization beyond this single testbed.

Authors: We agree that the results are specific to the Blackwell testbed and do not include data from other GPU models or drivers. The paper presents this configuration as the first COTS hardware with native 10-bit 4:2:2 support, establishing an empirical baseline rather than claiming broad generalization. We will revise the discussion and conclusion sections to explicitly state the single-testbed limitation and note the absence of cross-model scaling analysis.

Revision: partial

Referee: [Results] The claim that the entire up to 5% BD-Rate penalty stems solely from inability to exploit spatial redundancies across slice boundaries is not supported by ablation experiments isolating this factor from per-slice header overhead, memory-bandwidth contention across encoders, or driver scheduling effects.

Authors: The manuscript attributes the penalty primarily to lost cross-slice redundancies based on HEVC slice boundary behavior. We acknowledge that without isolating ablations, contributions from header overhead, bandwidth contention, or scheduling cannot be excluded. We will revise the results discussion to present the observed BD-Rate penalty without exclusive attribution and add a limitations paragraph noting these potential confounding factors.

Revision: yes

Referee: [Abstract / Methodology] No dataset descriptions, error bars, verification steps, or preprocessing pipeline details are supplied, preventing full evaluation of the 122 fps and BD-Rate figures.

Authors: We will expand the methodology section to provide complete dataset descriptions (including the specific point cloud sequences and V-PCC projection parameters), details of the preprocessing pipeline, verification steps for throughput and RD measurements, and error bars or standard deviations for the reported figures based on repeated runs.

Revision: yes
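
As an illustration of the error bars mentioned in this response, the sketch below summarizes repeated throughput runs with a mean, a sample standard deviation, and an approximate 95% confidence interval. It uses a normal approximation (z = 1.96), which is an assumption; with only a handful of runs a Student's t interval would be wider. The function name and the sample values are placeholders, not measurements from the paper.

```python
import math
import statistics


def summarize_runs(fps_samples, z=1.96):
    """Mean, sample standard deviation, and approximate confidence interval.

    `z=1.96` corresponds to a 95% interval under a normal approximation,
    which is an assumption made for this sketch.
    """
    if len(fps_samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(fps_samples)
    stdev = statistics.stdev(fps_samples)
    half_width = z * stdev / math.sqrt(len(fps_samples))
    return {"mean": mean, "stdev": stdev, "ci": (mean - half_width, mean + half_width)}


# Placeholder values for illustration only; not data from the paper.
PLACEHOLDER_RUNS = [100.0, 102.0, 98.0, 101.0, 99.0]

if __name__ == "__main__":
    summary = summarize_runs(PLACEHOLDER_RUNS)
    low, high = summary["ci"]
    print(f"{summary['mean']:.1f} fps, stdev {summary['stdev']:.2f}, "
          f"95% CI [{low:.1f}, {high:.1f}]")
```

Reporting the interval alongside the headline number shows whether the margin over the 120 fps target is larger than the run-to-run variation.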

standing simulated objections not resolved

Empirical results on additional GPU models, driver versions, or scaling behavior beyond the single Blackwell testbed remain absent, as no such hardware was available for the study.

Circularity Check

0 steps flagged

No circularity; purely empirical hardware measurements with no derivations or fitted predictions.

The paper reports direct measurements of throughput (122 fps), BD-Rate penalty (up to 5%), and power on a specific Blackwell GPU testbed under various SFE configurations. No equations, parameter fits, predictions, or self-citations appear in the provided text; claims follow immediately from the experimental data without reduction to inputs by construction. The representativeness concern raised by the referee is a validity issue, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical performance study. No mathematical derivations, free parameters, axioms, or new entities are introduced; all claims rest on hardware measurements.

Reference graph

Works this paper leans on

35 extracted references · 6 canonical work pages · 3 internal anchors

[1]

Performance Analysis of Hardware-Accelerated 10-Bit 4:2:2 Encoding with Split-Frame Encoding for High-Fidelity V-PCC Streaming

INTRODUCTION Volumetric media formats, such as point clouds, require significant data compression for storage and transmission [1]. V-PCC addresses this by packing 3D spatial geometry and surface attributes into 2D frames, which are then compressed using conventional 2D video codecs such as HEVC, AV1, or VVC. In this paradigm, the structural accuracy of...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Encoding Throughput Benchmark To evaluate the upper bounds of the hardware video encoder on modern COTS GPUs, we utilized the NVIDIA RTX PRO 6000 Blackwell GPU as our testbed

EXPERIMENT SETUP 2.1. Encoding Throughput Benchmark To evaluate the upper bounds of the hardware video encoder on modern COTS GPUs, we utilized the NVIDIA RTX PRO 6000 Blackwell GPU as our testbed. This GPU was selected because it integrates four NVENC engines, the maximum count in the architecture, permitting a 4-Way SFE configuration. While the tests ...

2025
[3]

All sequences have a framerate of 59.94 fps and are 15 seconds in length

ITE Ultra-High Definition Standard Test Sequences (Series A) [18]: As described in the official documentation, this dataset includes 10 sequences at 4K resolution and 11 sequences at 8K resolution. All sequences have a framerate of 59.94 fps and are 15 seconds in length
[4]

Netflix Sol Levante [19]: A 4K (3840×2160) animation sequence at 24 fps with a running time of 4:32 minutes
[5]

Netflix Meridian [19]: A 4K (3840×2160) live-action sequence at 59.94 fps with a running time of 11:58 minutes
[6]

Netflix Nocturne [19]: A 4K (3840×2160) High-Frame-Rate (HFR) sequence at 120 fps with a running time of 11:04 minutes
[7]

Netflix Chimera [19]: A DCI 4K (4096×2160) sequence at 59.94 fps consisting of 23 distinct scenes with a total runtime of 30:49 minutes. 2.1.2. Mezzanine File Preparation The source content was originally provided in uncompressed RAW formats (TIFF, EXR, DPX). To facilitate efficient large-scale testing and eliminate disk I/O bottlenecks during the encodin...
[8]

Encoding Throughput Table 3 presents the detailed encoding throughput results

RESULTS AND ANALYSIS 3.1. Encoding Throughput Table 3 presents the detailed encoding throughput results. As mentioned earlier, for this analysis, we focus primarily on the High Quality (HQ) tuning, as it represents the optimal option for professional broadcasting environments. From the results, it was found that by enabling 4-way SFE, a commercial GPGPU c...
[9]

CONCLUSION AND FUTURE WORK This paper presented a comprehensive evaluation of hardware-accelerated 10-bit 4:2:2 video encoding on commercial off-the-shelf (COTS) GPUs, targeting the strict computational demands of real-time V-PCC and next-generation volumetric video streaming. Historically restricted to specialized ASICs, this high-fidelity encoding pro...
[10]

Transcoding v-pcc point cloud streams in real-time,

M. Rudolph, S. Schneegass, and A. Rizk, “Transcoding v-pcc point cloud streams in real-time,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 21, no. 9, Sep. 2025. [Online]. Available: https://doi.org/10.1145/3682062

2025
[11]

Upsampling algorithm for v-pcc-coded 3d point clouds,

T.-L. Lin, B.-W. Su, P.-C. Shen, D.-Y. Chen, C.-F. Liang, Y.-C. Chen, Y. Wen, and M. Shahid, “Upsampling algorithm for v-pcc-coded 3d point clouds,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 12, Nov. 2024. [Online]. Available: https://doi.org/10.1145/3690641

work page doi:10.1145/3690641 2024
[12]

Evaluation of 2d video interpolation and extrapolation methods for real-time v-pcc error concealment,

E. Nakajima, F. Lin, K. Arunruangsirilert, and J. Katto, “Evaluation of 2d video interpolation and extrapolation methods for real-time v-pcc error concealment,” in 2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5.

2025
[13]

Learning to restore compressed point cloud attribute: A fully data-driven approach and a rules-unrolling-based optimization,

J. Zhang, J. Zhang, D. Ding, and Z. Ma, “Learning to restore compressed point cloud attribute: A fully data-driven approach and a rules-unrolling-based optimization,” IEEE Trans. Vis. Comput. Graph., vol. 31, no. 4, pp. 1985–1998, 2025.

2025
[14]

A versatile point cloud compressor using universal multi-scale conditional coding – part ii: Attribute,

J. Wang, R. Xue, J. Li, D. Ding, Y. Lin, and Z. Ma, “A versatile point cloud compressor using universal multi-scale conditional coding – part ii: Attribute,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 1, pp. 252–268, 2025.

2025
[15]

Basics: Broad quality assessment of static point clouds in a compression scenario,

A. Ak, E. Zerman, M. Quach, A. Chetouani, A. Smolic, G. Valenzise, and P. Le Callet, “Basics: Broad quality assessment of static point clouds in a compression scenario,” IEEE Transactions on Multimedia, vol. 26, pp. 6730–6742, 2024.

2024
[16]

A single-chip 4k 60-fps 4:2:2 hevc video encoder lsi employing efficient motion estimation and mode decision framework with scalability to 8k,

T. Onishi, T. Sano, Y. Nishida, K. Yokohari, K. Nakamura, K. Nitta, K. Kawashima, J. Okamoto, N. Ono, A. Sagata, H. Iwasaki, M. Ikeda, and A. Shimizu, “A single-chip 4k 60-fps 4:2:2 hevc video encoder lsi employing efficient motion estimation and mode decision framework with scalability to 8k,” IEEE Transactions on Very Large Scale Integration (VLSI) ...
[17]

A real-time 4k hevc multi-channel encoding system with content-aware bitrate control,

D. Kobayashi, K. Nakamura, T. Osawa, Y. Omori, T. Onishi, and H. Iwasaki, “A real-time 4k hevc multi-channel encoding system with content-aware bitrate control,” in 2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6.

2019
[18]

An 8k@120fps advanced entropy coding hardware design for avs3,

W. Li, L. Huang, C. He, M. Jing, W. Hu, and Y. Fan, “An 8k@120fps advanced entropy coding hardware design for avs3,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 8, pp. 8372–8376, 2025.

2025
[19]

NVIDIA RTX Blackwell GPU Architecture

NVIDIA, NVIDIA RTX Blackwell GPU Architecture: Built for Neural Rendering, Mar 2024. [Online]. Available: https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf

2024
[20]

NVIDIA RTX PRO Blackwell GPU Architecture

NVIDIA, NVIDIA RTX PRO Blackwell GPU Architecture: Built for Neural Rendering, Mar 2024. [Online]. Available: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/quadro-product-literature/NVIDIA-RTX-Blackwell-PRO-GPU-Architecture-v1.0.pdf

2024
[21]

The required video bitrate for 8k120-hz real-time temporal scalable coding,

S. Iwasaki, X. Lei, K. Chida, Y. Sugito, K. Iguchi, K. Kanda, H. Miyoshi, and Y. Uehara, “The required video bitrate for 8k120-hz real-time temporal scalable coding,” in 2020 IEEE International Conference on Consumer Electronics (ICCE), 2020, pp. 1–5.

2020
[22]

Evaluation of nvenc split-frame encoding (sfe) for uhd video transcoding,

K. Arunruangsirilert and J. Katto, “Evaluation of nvenc split-frame encoding (sfe) for uhd video transcoding,” in 2025 Picture Coding Symposium (PCS), 2025, pp. 1–5.

2025
[23]

Video encoding at 8k60 with split-frame encoding and nvidia ada lovelace architecture,

NVIDIA, “Video encoding at 8k60 with split-frame encoding and nvidia ada lovelace architecture,” Jan 2024. [Online]. Available: https://developer.nvidia.com/blog/video-encoding-at-8k60-with-split-frame-encoding-and-nvidia-ada-lovelace-architecture/

2024
[24]

Evaluation of hardware-based video encoders on modern gpus for uhd live-streaming,

K. Arunruangsirilert and J. Katto, “Evaluation of hardware-based video encoders on modern gpus for uhd live-streaming,” in 2024 33rd International Conference on Computer Communications and Networks (ICCCN), 2024, pp. 1–9.

2024
[25]

Evaluation of gpu video encoder for low-latency real-time 4k uhd encoding,

K. Arunruangsirilert and J. Katto, “Evaluation of gpu video encoder for low-latency real-time 4k uhd encoding,” in 2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5.

2025
[26]

Evolution of NVENC Efficiency: A Longitudinal Analysis of HQ and UHQ Tuning Efficiency, Latency and Energy Trade-offs

K. Arunruangsirilert and J. Katto, “Evolution of nvenc efficiency: A longitudinal analysis of hq and uhq tuning efficiency, latency and energy trade-offs,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01187

work page internal anchor Pith review Pith/arXiv arXiv 2026
[27]

Ultra-high definition/wide-color-gamut standard test sequences – series a,

The Institute of Image Information and Television Engineers, “Ultra-high definition/wide-color-gamut standard test sequences – series a,” Jan 2016. [Online]. Available: https://www.ite.or.jp/content/test-materials/uhdtv_a/

2016
[28]

Netflix open content

Netflix, “Netflix open content.” [Online]. Available: https://opencontent.netflix.com/
[29]

avcodec/nvenc: Add 4-way multi nvenc split frame encoding (sdk 13.0) in hevc and av1 for rtx pro 6000 blackwell #21371,

FFmpeg, “avcodec/nvenc: Add 4-way multi nvenc split frame encoding (sdk 13.0) in hevc and av1 for rtx pro 6000 blackwell #21371,” Jan 2026. [Online]. Available: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21371

2026
[30]

Twitch help portal

Twitch, “Twitch help portal.” [Online]. Available: https://help.twitch.tv/s/article/broadcasting-guidelines?language=en_US
[31]

Choose live encoder settings, bitrates, and resolutions - youtube help,

Google, “Choose live encoder settings, bitrates, and resolutions - youtube help,” 2019. [Online]. Available: https://support.google.com/youtube/answer/2853702

work page arXiv 2019
[32]

Bjøntegaard delta (bd): A tutorial overview of the metric, evolution, challenges, and recommendations,

N. Barman, M. G. Martini, and Y. Reznik, “Bjøntegaard delta (bd): A tutorial overview of the metric, evolution, challenges, and recommendations,” 2024. [Online]. Available: https://arxiv.org/abs/2401.04039

work page arXiv 2024
[33]

Corner case is giving wrong vmaf and psnr values!

waveletbeam, “Corner case is giving wrong vmaf and psnr values!” Oct 2019. [Online]. Available: https://github.com/Netflix/vmaf/issues/371

2019
[34]

Toward a better quality metric for the video community,

Z. Li, K. Swanson, C. Bampis, L. Krasula, and A. Aaron, “Toward a better quality metric for the video community,” Dec 2020. [Online]. Available: https://netflixtechblog.com/toward-a-better-quality-metric-for-the-video-community-7ed94e752a30

2020
[35]

Sustainable Real-Time 8K60 HEVC Encoding for V2X: Repurposing Legacy NVENC Hardware at the Vehicular Edge

K. Arunruangsirilert and J. Katto, “Sustainable real-time 8k60 hevc encoding for v2x: Repurposing legacy nvenc hardware at the vehicular edge,” 2026. [Online]. Available: https://arxiv.org/abs/2605.16738

A. ENCODING THROUGHPUT BENCHMARK SCRIPT The batch scripts included below are for benchmarking encoding throughput at 8K UHD resolution. For 4K UHD, use...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[1] [1]

Performance Analysis of Hardware-Accelerated 10-Bit 4:2:2 Encoding with Split-Frame Encoding for High-Fidelity V-PCC Streaming

INTRODUCTION V olumetric media formats, such as point clouds, require significant data compression for storage and transmission [1]. V-PCC addresses this by packing 3D spatial geometry and surface attributes into 2D frames, which are then compressed using conventional 2D video codecs such as HEVC, A V1, or VVC. In this paradigm, the structural accuracy of...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[2] [2]

Encoding Throughput Benchmark To evaluate the upper bounds of the hardware video encoder on modern COTS GPUs, we utilized the NVIDIA RTX PRO 6000 Blackwell GPU as our testbed

EXPERIMENT SETUP 2.1. Encoding Throughput Benchmark To evaluate the upper bounds of the hardware video encoder on modern COTS GPUs, we utilized the NVIDIA RTX PRO 6000 Blackwell GPU as our testbed. This GPU was selected because it integrates four NVENC engines, the maximum count in the architecture, permitting a 4-Way SFE configura- tion. While the tests ...

2025

[3] [3]

All sequences have a framerate of 59.94 fps and are 15 seconds in length

ITE Ultra-High Definition Standard Test Sequences (Series A) [18]:As described in the official documenta- tion, this dataset includes 10 sequences at 4K resolution and 11 sequences at 8K resolution. All sequences have a framerate of 59.94 fps and are 15 seconds in length

[4] [4]

Netflix Sol Levante [19]:A 4K (3840×2160) animation sequence at 24 fps with a running time of 4:32 minutes

[5] [5]

Netflix Meridian [19]:A 4K (3840×2160) live-action se- quence at 59.94 fps with a running time of 11:58 minutes

[6] [6]

Netflix Nocturne [19]:A 4K (3840×2160) High-Frame- Rate (HFR) sequence at 120 fps with a running time of 11:04 minutes

[7] [7]

Netflix Chimera [19]:A DCI 4K (4096×2160) sequence at 59.94 fps consisting of 23 distinct scenes with a total runtime of 30:49 minutes. 2.1.2. Mezzanine File Preparation The source content was originally provided in uncompressed RAW formats (TIFF, EXR, DPX). To facilitate efficient large- scale testing and eliminate disk I/O bottlenecks during the encodin...

2084

[8] [8]

Encoding Throughput Table 3 presents the detailed encoding throughput results

RESULTS AND ANALYSIS 3.1. Encoding Throughput Table 3 presents the detailed encoding throughput results. As mentioned earlier, for this analysis, we focus primarily on theHigh Quality (HQ)tuning, as it represents the optimal option for professional broadcasting environments. From the results, it was found that by enabling 4-way SFE, a com- mercial GPGPU c...

[9] [9]

CONCLUSION AND FUTURE WORK This paper presented a comprehensive evaluation of hardware- accelerated 10-bit 4:2:2 video encoding on commercial off- the-shelf (COTS) GPUs, targeting the strict computational demands of real-time V-PCC and next-generation volumetric video streaming. Historically restricted to specialized ASICs, this high-fidelity encoding pro...

[10] [10]

Transcoding v-pcc point cloud streams in real-time,

M. Rudolph, S. Schneegass, and A. Rizk, “Transcoding v-pcc point cloud streams in real-time,”ACM Trans. Multimedia Comput. Commun. Appl., vol. 21, no. 9, Sep. 2025. [Online]. Available: https://doi.org/10.1145/ 3682062 1

2025

[11] [11]

Upsampling algorithm for v-pcc-coded 3d point clouds,

T.-L. Lin, B.-W. Su, P.-C. Shen, D.-Y . Chen, C.- F. Liang, Y .-C. Chen, Y . Wen, and M. Shahid, “Upsampling algorithm for v-pcc-coded 3d point clouds,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 12, Nov. 2024. [Online]. Available: https://doi.org/10.1145/3690641 1

work page doi:10.1145/3690641 2024

[12] [12]

Evaluation of 2d video interpolation and extrapolation methods for real-time v-pcc error concealment,

E. Nakajima, F. Lin, K. Arunruangsirilert, and J. Katto, “Evaluation of 2d video interpolation and extrapolation methods for real-time v-pcc error concealment,” in2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5. 1

2025

[13] [13]

Learning to restore compressed point cloud attribute: A fully data- driven approach and a rules-unrolling-based optimiza- tion,

J. Zhang, J. Zhang, D. Ding, and Z. Ma, “Learning to restore compressed point cloud attribute: A fully data- driven approach and a rules-unrolling-based optimiza- tion,”IEEE Trans. Vis. Comput. Graph., vol. 31, no. 4, pp. 1985–1998, 2025. 1

1985

[14] [14]

A versatile point cloud compressor using universal multi- scale conditional coding – part ii: Attribute,

J. Wang, R. Xue, J. Li, D. Ding, Y . Lin, and Z. Ma, “A versatile point cloud compressor using universal multi- scale conditional coding – part ii: Attribute,”IEEE Trans- actions on Pattern Analysis and Machine Intelligence, vol. 47, no. 1, pp. 252–268, 2025. 1

2025

[15] [15]

Basics: Broad quality assessment of static point clouds in a compression sce- nario,

A. Ak, E. Zerman, M. Quach, A. Chetouani, A. Smolic, G. Valenzise, and P. Le Callet, “Basics: Broad quality assessment of static point clouds in a compression sce- nario,”IEEE Transactions on Multimedia, vol. 26, pp. 6730–6742, 2024. 1

2024

[16] [16]

A single-chip 4k 60-fps 4:2:2 hevc video encoder lsi em- ploying efficient motion estimation and mode decision framework with scalability to 8k,

T. Onishi, T. Sano, Y . Nishida, K. Yokohari, K. Naka- mura, K. Nitta, K. Kawashima, J. Okamoto, N. Ono, A. Sagata, H. Iwasaki, M. Ikeda, and A. Shimizu, “A single-chip 4k 60-fps 4:2:2 hevc video encoder lsi em- ploying efficient motion estimation and mode decision framework with scalability to 8k,”IEEE Transactions on Very Large Scale Integration (VLSI) ...

1930

[17] [17]

A real-time 4k hevc multi- channel encoding system with content-aware bitrate con- trol,

D. Kobayashi, K. Nakamura, T. Osawa, Y . Omori, T. Onishi, and H. Iwasaki, “A real-time 4k hevc multi- channel encoding system with content-aware bitrate con- trol,” in2019 IEEE Global Communications Conference (GLOBECOM), 2019, pp. 1–6. 2

2019

[18] [18]

An 8k@120fps advanced entropy coding hardware design for avs3,

W. Li, L. Huang, C. He, M. Jing, W. Hu, and Y . Fan, “An 8k@120fps advanced entropy coding hardware design for avs3,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 8, pp. 8372–8376, 2025. 2

2025

[19] [19]

[Online]

NVIDIA,NVIDIA RTX BLACKWELL GPU ARCHI- TECTURE Built for Neural Rendering ii NVIDIA RTX Blackwell GPU Architecture, Mar 2024. [Online]. Available: https://images.nvidia.com/aem- dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell- gpu-architecture.pdf 2

2024

[20] [20]

[Online]

——,NVIDIA RTX PRO BLACKWELL GPU ARCHITECTURE Built for Neural Rendering ii NVIDIA RTX B lackwell GP U Architec- ture, Mar 2024. [Online]. Available: https: //www.nvidia.com/content/dam/en-zz/Solutions/design- visualization/quadro-product-literature/NVIDIA-RTX- Blackwell-PRO-GPU-Architecture-v1.0.pdf 2

2024

[21] [21]

The required video bitrate for 8k120-hz real-time temporal scalable coding,

S. Iwasaki, X. Lei, K. Chida, Y . Sugito, K. Iguchi, K. Kanda, H. Miyoshi, and Y . Uehara, “The required video bitrate for 8k120-hz real-time temporal scalable coding,” in2020 IEEE International Conference on Con- sumer Electronics (ICCE), 2020, pp. 1–5. 2

2020

[22]

Evaluation of nvenc split-frame encoding (sfe) for uhd video transcoding,

K. Arunruangsirilert and J. Katto, “Evaluation of nvenc split-frame encoding (sfe) for uhd video transcoding,” in 2025 Picture Coding Symposium (PCS), 2025, pp. 1–5. 2, 3, 5

2025

[23]

Video encoding at 8k60 with split-frame encoding and nvidia ada lovelace architecture,

NVIDIA, “Video encoding at 8k60 with split-frame encoding and nvidia ada lovelace architecture,” Jan 2024. [Online]. Available: https://developer.nvidia.com/blog/video-encoding-at-8k60-with-split-frame-encoding-and-nvidia-ada-lovelace-architecture/ 2

2024

[24]

Evaluation of hardware-based video encoders on modern gpus for uhd live-streaming,

K. Arunruangsirilert and J. Katto, “Evaluation of hardware-based video encoders on modern gpus for uhd live-streaming,” in 2024 33rd International Conference on Computer Communications and Networks (ICCCN), 2024, pp. 1–9. 2

2024

[25]

Evaluation of gpu video encoder for low-latency real-time 4k uhd encoding,

K. Arunruangsirilert and J. Katto, “Evaluation of gpu video encoder for low-latency real-time 4k uhd encoding,” in 2025 International Conference on Visual Communications and Image Processing (VCIP), 2025, pp. 1–5. 2

2025

[26]

Evolution of NVENC Efficiency: A Longitudinal Analysis of HQ and UHQ Tuning Efficiency, Latency and Energy Trade-offs

K. Arunruangsirilert and J. Katto, “Evolution of nvenc efficiency: A longitudinal analysis of hq and uhq tuning efficiency, latency and energy trade-offs,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01187 2

2026

[27]

Ultra-high definition/wide-color-gamut standard test sequences – series a,

The Institute of Image Information and Television Engineers, “Ultra-high definition/wide-color-gamut standard test sequences – series a,” Jan 2016. [Online]. Available: https://www.ite.or.jp/content/test-materials/uhdtv_a/ 3

2016

[28]

Netflix open content

Netflix, Inc., “Netflix open content.” [Online]. Available: https://opencontent.netflix.com/ 3

[29]

avcodec/nvenc: Add 4-way multi nvenc split frame encoding (sdk 13.0) in hevc and av1 for rtx pro 6000 blackwell #21371,

FFmpeg, “avcodec/nvenc: Add 4-way multi nvenc split frame encoding (sdk 13.0) in hevc and av1 for rtx pro 6000 blackwell #21371,” Jan 2026. [Online]. Available: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21371 3

2026

[30]

Twitch help portal

Twitch, “Twitch help portal.” [Online]. Available: https://help.twitch.tv/s/article/broadcasting-guidelines?language=en_US 3

[31]

Choose live encoder settings, bitrates, and resolutions - youtube help,

Google, “Choose live encoder settings, bitrates, and resolutions - youtube help,” 2019. [Online]. Available: https://support.google.com/youtube/answer/2853702 3

2019

[32]

Bjøntegaard delta (bd): A tutorial overview of the metric, evolution, challenges, and recommendations,

N. Barman, M. G. Martini, and Y. Reznik, “Bjøntegaard delta (bd): A tutorial overview of the metric, evolution, challenges, and recommendations,” 2024. [Online]. Available: https://arxiv.org/abs/2401.04039 3

2024

[33]

Corner case is giving wrong vmaf and psnr values!

waveletbeam, “Corner case is giving wrong vmaf and psnr values!” Oct 2019. [Online]. Available: https://github.com/Netflix/vmaf/issues/371 3

2019

[34]

Toward a better quality metric for the video community,

Z. Li, K. Swanson, C. Bampis, L. Krasula, and A. Aaron, “Toward a better quality metric for the video community,” Dec 2020. [Online]. Available: https://netflixtechblog.com/toward-a-better-quality-metric-for-the-video-community-7ed94e752a30 3

2020

[35]

Sustainable Real-Time 8K60 HEVC Encoding for V2X: Repurposing Legacy NVENC Hardware at the Vehicular Edge

K. Arunruangsirilert and J. Katto, “Sustainable real-time 8k60 hevc encoding for v2x: Repurposing legacy nvenc hardware at the vehicular edge,” 2026. [Online]. Available: https://arxiv.org/abs/2605.16738 4

2026

A. ENCODING THROUGHPUT BENCHMARK SCRIPT

The batch scripts included below are for benchmarking encoding throughput at 8K UHD resolution. For 4K UHD, use...