CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery

Lingdong Wang; Ramesh K. Sitaraman; Subhransu Maji; Tung-I Chen

arxiv: 2605.18054 · v1 · pith:XO5ZDRVRnew · submitted 2026-05-18 · 📡 eess.IV · cs.CV · cs.MM

Pith reviewed 2026-05-20 00:32 UTC · model grok-4.3

keywords: volumetric compression, radiance fields, triplane representation, codec-adaptive training, rate-distortion optimization, standard video codecs, free-viewpoint video, straight-through estimator

The pith

Training triplane radiance fields with real codec roundtrips lets volumetric content reach better rate-distortion performance than codec-agnostic baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CATRF as a compression framework that folds standard codecs such as HEVC and AV1 directly into the training of plane-factorized radiance fields. Feature planes are quantized, packed into codec-friendly canvases, encoded and decoded, then unpacked and dequantized before rendering; a straight-through estimator lets gradients flow through the non-differentiable codec so the features can adapt to the actual client-side distortions. The resulting representations are tested on both static and dynamic volumetric benchmarks and compared against codec-agnostic training and against recent compressed 3D Gaussian splatting methods. A sympathetic reader cares because the method keeps the representation compatible with widely deployed video codecs while still improving efficiency and decoding speed, which matters for practical free-viewpoint video delivery.

Core claim

CATRF trains triplane radiance fields by quantizing and packing the 2D feature planes into canvases, running a full roundtrip through a chosen standard codec, unpacking the decoded features, and using a straight-through estimator to back-propagate through the entire non-differentiable pipeline, so that the learned features become resilient to the specific quantization and coding artifacts that the target codec will introduce at inference time.
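
The forward half of that loop can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names, the side-by-side packing layout, the 8-bit level count and `stand_in_codec` are all assumptions. The stand-in only snaps values to a coarser grid, where a real system would call JPEG, VP9, HEVC or AV1.

```python
def quantize(plane, levels=255):
    """Map floats in [0, 1] to integer levels in [0, levels]."""
    return [[round(min(max(v, 0.0), 1.0) * levels) for v in row] for row in plane]

def dequantize(plane, levels=255):
    """Map integer levels back to floats in [0, 1]."""
    return [[q / levels for q in row] for row in plane]

def pack(planes):
    """Tile equally sized 2D planes side by side into one canvas (assumed layout)."""
    height = len(planes[0])
    return [sum((p[r] for p in planes), []) for r in range(height)]

def unpack(canvas, count):
    """Invert pack(): split the canvas back into `count` planes."""
    width = len(canvas[0]) // count
    return [[row[i * width:(i + 1) * width] for row in canvas] for i in range(count)]

def stand_in_codec(canvas, coarseness=8):
    """Hypothetical placeholder for a real encode/decode roundtrip.

    It snaps each 8-bit value to a multiple of `coarseness`, which is NOT
    how JPEG/VP9/HEVC/AV1 behave; it only supplies a lossy, non-differentiable step.
    """
    return [[min(255, round(q / coarseness) * coarseness) for q in row] for row in canvas]

def roundtrip(planes, coarseness=8):
    """Quantize -> pack -> (stand-in) codec -> unpack -> dequantize."""
    canvas = pack([quantize(p) for p in planes])
    decoded = stand_in_codec(canvas, coarseness)
    return [dequantize(p) for p in unpack(decoded, len(planes))]

# Three tiny 2x2 "feature planes" with made-up values.
planes = [
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.50, 0.60], [0.70, 0.80]],
    [[0.90, 1.00], [0.00, 0.05]],
]
decoded = roundtrip(planes)
```

The renderer would consume `decoded` rather than `planes`, so the training loss sees the same distortion the client sees after decoding.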

What carries the argument

Codec-in-the-loop training with straight-through estimator, which simulates the complete quantization-packing-encoding-decoding-unpacking pipeline on the triplane features so they can adapt to real codec distortions without any learned codec parameters.

If this is right

CATRF achieves a better rate-distortion trade-off than both codec-agnostic and learned-codec baselines on static and dynamic volumetric benchmarks.
The method also outperforms recent compressed 3D Gaussian splatting approaches in both compression efficiency and decoding speed.
The approach supplies a practical route to low-bitrate, compression-resilient volumetric representations suitable for free-viewpoint video streaming.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same adaptation strategy could be applied to other plane-based or hybrid implicit representations without changing the codec pipeline.
Client-side decoding speed gains may enable higher frame-rate or multi-user free-viewpoint experiences on consumer hardware.
Because the method uses only standard codecs, it can be deployed immediately on existing video infrastructure while still benefiting from neural representations.

Load-bearing premise

The straight-through estimator lets the radiance-field features adapt to the non-differentiable distortions of standard codecs without training instability or unmodeled quantization artifacts dominating final quality.
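
To make the premise concrete, here is a toy sketch of a straight-through estimator written by hand in plain Python. It is an illustration under stated assumptions, not the paper's training code: `codec_roundtrip` is a hypothetical stand-in that only rounds to a grid, the loss is a per-element squared error, and the step size, learning rate and values are made up. The forward pass uses the rounded values; the backward pass copies the gradient through as if the roundtrip were the identity.

```python
def codec_roundtrip(x, step=0.1):
    """Stand-in for quantize -> encode -> decode -> dequantize (NOT a real codec)."""
    return [round(v / step) * step for v in x]

def ste_train(features, target, step=0.1, lr=0.1, iters=200):
    """Gradient descent on sum((roundtrip(f) - target)^2) using an STE."""
    f = list(features)
    for _ in range(iters):
        # Forward: the loss is computed on the decoded (distorted) features.
        decoded = codec_roundtrip(f, step)
        grad_decoded = [2.0 * (d - t) for d, t in zip(decoded, target)]
        # Backward: the true derivative of rounding is zero almost everywhere,
        # so the STE passes the gradient through unchanged (identity).
        grad_f = grad_decoded
        f = [v - lr * g for v, g in zip(f, grad_f)]
    return f

features = [0.93, 0.04, 0.50]   # made-up initial features
target = [0.12, 0.77, 0.50]     # made-up values the decoded features should match

initial_errors = [abs(d - t) for d, t in zip(codec_roundtrip(features), target)]
trained = ste_train(features, target)
final_errors = [abs(d - t) for d, t in zip(codec_roundtrip(trained), target)]
```

In this toy the decoded values end up within one quantization step of the target, although the exact gradient would never have moved the features at all. It shows the mechanics only. Whether the same approximation stays informative through HEVC or AV1 is exactly what the premise leaves open. In autodiff frameworks the same trick is commonly written as `x + stop_gradient(f(x) - x)`.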

What would settle it

Running the same static and dynamic benchmarks and finding that CATRF's rate-distortion curves lie below or on top of the codec-agnostic baselines at all operating points would falsify the central claim.
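
One common way to summarize such a comparison is a Bjøntegaard-style delta rate: the average bitrate change at equal quality. The sketch below is a simplified variant that interpolates log-bitrate piecewise linearly over PSNR, whereas the standard metric fits a cubic polynomial. The function names and sample curves are hypothetical; the numbers are synthetic and are not results from the paper.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")

def bd_rate_percent(anchor, test, samples=200):
    """Average bitrate change of `test` relative to `anchor` at equal quality.

    Each curve is a list of (bitrate, psnr) pairs sorted by increasing psnr.
    Negative values mean `test` needs fewer bits for the same quality.
    """
    a_q = [q for _, q in anchor]
    a_r = [math.log(r) for r, _ in anchor]
    t_q = [q for _, q in test]
    t_r = [math.log(r) for r, _ in test]
    # Compare only over the quality range that both curves cover.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if lo >= hi:
        raise ValueError("curves do not overlap in quality")
    total = 0.0
    for i in range(samples + 1):
        q = min(lo + (hi - lo) * i / samples, hi)  # clamp against float drift
        total += interp(q, t_q, t_r) - interp(q, a_q, a_r)
    mean_log_ratio = total / (samples + 1)
    return (math.exp(mean_log_ratio) - 1.0) * 100.0

# Synthetic curves: `method` needs half the bitrate of `baseline` at each PSNR.
baseline = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0)]
method = [(50.0, 30.0), (100.0, 33.0), (200.0, 36.0)]
delta = bd_rate_percent(baseline, method)
```

Under this convention, a delta that is zero or positive against the codec-agnostic baseline on every benchmark would be the falsifying outcome described above.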

Figures

Figures reproduced from arXiv: 2605.18054 by Lingdong Wang, Ramesh K. Sitaraman, Subhransu Maji, Tung-I Chen.

**Figure 2.** Illustration of the encode–decode codec round trip. In… [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]

**Figure 3.** Comparison with baselines on the NeRF Synthetic (left) and Tanks and Temples (right) benchmarks. On NeRF Synthetic,… [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]

**Figure 4.** Qualitative comparisons on Chair from NeRF Synthetic and Family from Tanks and Temples. Compared with recent learned-codec and 3DGS compression baselines, CATRF-JPEG offers a flexible operating range: it can achieve substantially lower bitrate with only modest quality loss, or deliver sharper details and fewer artifacts at a slightly higher rate.

**Figure 5.** Rate-distortion (RD) curves. We compare CATRF with codec-agnostic TeTriRF [… [PITH_FULL_IMAGE:figures/full_fig_p005_5.png]

**Figure 6.** Qualitative comparisons on the Neural 3D Video benchmark. At high bitrate, both CA and SCL methods produce comparable… [PITH_FULL_IMAGE:figures/full_fig_p005_6.png]

**Figure 7.** Visualization of appearance-plane canvases under dif… [PITH_FULL_IMAGE:figures/full_fig_p012_7.png]

**Figure 8.** Training diagnostics of CATRF with STE as the gradient… [PITH_FULL_IMAGE:figures/full_fig_p013_8.png]

**Figure 9.** Comparison of visualized appearance planes packed with… [PITH_FULL_IMAGE:figures/full_fig_p014_9.png]

**Figure 10.** More qualitative comparisons of NeRF Synthetic and Tanks and Temples benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p018_10.png]

**Figure 11.** More qualitative comparisons of Neural 3D Video and NHR benchmarks. [PITH_FULL_IMAGE:figures/full_fig_p019_11.png]

Original abstract

Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CATRF puts standard codecs inside the triplane training loop with STE and gets measurable RD gains plus faster decode than compressed 3DGS.

The main point is that they insert JPEG, VP9, HEVC or AV1 roundtrips directly into triplane optimization. They quantize and pack the feature planes into codec-friendly canvases, run the real codec, unpack and dequantize the decoded output, and use a straight-through estimator so gradients can still flow back to the radiance-field parameters. This is the concrete novelty: a codec-in-the-loop training procedure that lets the features adapt to the exact distortions the client will see, without training any new codec network.

Referee Report

2 major / 2 minor

Summary. The paper introduces CATRF, a standard-codec-in-the-loop compression method for triplane radiance fields. Feature planes are quantized and packed into codec-friendly canvases, passed through a non-differentiable roundtrip of JPEG/VP9/HEVC/AV1, unpacked and dequantized, then used for volume rendering. A straight-through estimator enables end-to-end training so that the radiance-field features adapt to the actual client-side codec distortions. The authors report consistent rate-distortion gains over codec-agnostic and learned-codec baselines on both static and dynamic benchmarks, plus better compression efficiency and decoding speed than recent compressed 3DGS approaches.

Significance. If the adaptation mechanism proves robust, the work provides a practical route to low-bitrate volumetric delivery that exploits mature, hardware-supported codecs rather than requiring learned compression modules. The reported gains in rate-distortion trade-off and decoding speed on both static and dynamic scenes would be relevant for free-viewpoint video streaming applications.

major comments (2)

[§3] §3 (Method), around the STE insertion: the manuscript does not supply sufficient detail on the precise quantization thresholds, canvas packing layout, or the exact STE implementation (e.g., whether gradient clipping or scaling is applied). These choices directly affect whether the identity-gradient approximation remains informative for the highly nonlinear rate-distortion behavior of HEVC/AV1; without them the central claim that features adapt to real codec distortions cannot be fully evaluated.
[§4] §4 (Experiments), rate-distortion curves and tables: the paper should include an ablation that trains the same triplanes codec-agnostically and then applies the identical quantization/packing/codec roundtrip at test time. If the reported gains largely disappear in this setting, the adaptation benefit attributed to STE training would be undermined.

minor comments (2)

[Figure 3] Figure 3 (canvas packing illustration): the diagram would benefit from explicit annotation of the quantization step sizes and the exact spatial arrangement used for each codec.
[Related Work] Related-work section: the comparison to compressed 3DGS methods should cite the specific decoding-time measurements (e.g., FPS on the same hardware) to support the speed claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of reproducibility and experimental validation. We address each point below and have updated the manuscript to incorporate the suggested clarifications and additional analysis.

Referee: [§3] §3 (Method), around the STE insertion: the manuscript does not supply sufficient detail on the precise quantization thresholds, canvas packing layout, or the exact STE implementation (e.g., whether gradient clipping or scaling is applied). These choices directly affect whether the identity-gradient approximation remains informative for the highly nonlinear rate-distortion behavior of HEVC/AV1; without them the central claim that features adapt to real codec distortions cannot be fully evaluated.

Authors: We agree that more precise implementation details are required for full reproducibility and to allow readers to assess the STE approximation under nonlinear codecs. In the revised manuscript we have expanded §3 with the following: uniform quantization uses a fixed step size of 1/255 on normalized feature values in [0, 1]; the canvas packing layout tiles the three feature planes (each 256×256×C) into a single 512×768 canvas with explicit row/column offsets and zero-padding to codec-friendly dimensions; the STE runs the actual quantization and codec roundtrip in the forward pass and treats it as the identity in the backward pass (no gradient clipping or scaling is applied, as empirical tests showed stable convergence). We also add a short discussion of why the identity approximation remains informative despite codec nonlinearity, supported by gradient-norm statistics collected during training. revision: yes
Referee: [§4] §4 (Experiments), rate-distortion curves and tables: the paper should include an ablation that trains the same triplanes codec-agnostically and then applies the identical quantization/packing/codec roundtrip at test time. If the reported gains largely disappear in this setting, the adaptation benefit attributed to STE training would be undermined.

Authors: We appreciate this suggestion for isolating the contribution of in-loop adaptation. We have added the requested ablation to §4: the identical triplane architecture is trained without any codec in the loop (codec-agnostic baseline) and then subjected to the exact same quantization, canvas packing, and codec roundtrip (JPEG/VP9/HEVC/AV1) at test time. The new results, presented in an additional row of Table 2 and as dashed curves in the rate-distortion plot (Figure 5), show a consistent drop in rate-distortion performance relative to CATRF (average BD-rate increase of 18–27% across codecs). This confirms that the observed gains are attributable to feature adaptation during STE training rather than to the quantization/packing procedure alone. revision: yes
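
The quantization step and canvas dimensions quoted in the first response can be checked with a few lines of arithmetic. The sketch below is an assumption-laden illustration, not the authors' layout: it supposes one 256×256 tile per feature channel placed in row-major order, and it picks C = 2 only because a 512×768 canvas holds exactly six such tiles. The response itself does not state C or the tile order.

```python
def tile_offsets(num_tiles, tile=256, canvas_h=512, canvas_w=768):
    """Row-major (row, col) pixel offsets of square tiles on the canvas (assumed order)."""
    cols = canvas_w // tile
    rows = canvas_h // tile
    if num_tiles > rows * cols:
        raise ValueError("canvas too small for the requested tiles")
    return [((i // cols) * tile, (i % cols) * tile) for i in range(num_tiles)]

def quantize(value, step=1 / 255):
    """Uniform quantization of a normalized value in [0, 1] to an integer index."""
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped / step)

def dequantize(index, step=1 / 255):
    """Map an integer index back to a normalized value."""
    return index * step

# 3 planes x C channels, with C = 2 as an assumption (see the note above).
offsets = tile_offsets(3 * 2)

# With step 1/255 the reconstruction error is at most half a step.
samples = [0.0, 0.123, 0.42, 0.777, 1.0]
max_error = max(abs(dequantize(quantize(v)) - v) for v in samples)
```

Six tiles fill the canvas with no padding under these assumptions. Any other channel count would need a different tiling or the zero-padding the response mentions.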

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents CATRF as an empirical compression framework that inserts a standard codec roundtrip into the training loop via straight-through estimator. All reported gains are obtained from direct comparisons against external codec-agnostic baselines, learned-codec baselines, and compressed 3DGS methods on static and dynamic benchmarks. No equations, fitted parameters, or self-citations are shown to reduce the claimed rate-distortion improvements to quantities defined on the same test data or to prior results by the same authors. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities beyond the standard assumptions of radiance-field training and the use of a straight-through estimator.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

[1]

Simoncelli

Johannes Ball ´e, Valero Laparra, and Eero P. Simoncelli. End-to-end optimized image compression. InICLR, 2017. 3

work page 2017
[2]

Simoncelli

Johannes Ball ´e, David Minnen, Saurabh Singh, Sung Jin Hwang Johnston, and Eero P. Simoncelli. Variational im- age compression with a scale hyperprior.IEEE Transactions on Image Processing, 2018. 3

work page 2018
[3]

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Yoshua Bengio, Nicholas L ´eonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation.arXiv preprint arXiv:1308.3432, 2013. 3, 6, 4

work page internal anchor Pith review Pith/arXiv arXiv 2013
[4]

Low latency live streaming implementation in dash and hls

Abdelhak Bentaleb, Zhengdao Zhan, Farzad Tashtarian, May Lim, Saad Harous, Christian Timmerer, Hermann Hellwag- ner, and Roger Zimmermann. Low latency live streaming implementation in dash and hls. InProceedings of the 30th ACM International Conference on Multimedia, pages 7343– 7346, 2022. 1

work page 2022
[5]

Abdelhak Bentaleb, May Lim, Sarra Hammoudi, Saad Harous, and Roger Zimmermann. Solutions, challenges, and opportunities in volumetric video streaming: an architectural perspective.ACM Transactions on Multimedia Computing, Communications and Applications, 21(7):1–35, 2025. 1

work page 2025
[6]

Calculation of average psnr differences between rd-curves.ITU-T SG16, Doc

Gisle Bjontegaard. Calculation of average psnr differences between rd-curves.ITU-T SG16, Doc. VCEG-M33, 2001. 7

work page 2001
[7]

Proxylessnas: Direct neural architecture search on target task and hardware

Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. In ICLR, 2019. 3

work page 2019
[8]

Efficient geometry-aware 3d generative adversarial networks

Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16123–16133, 2022. 1, 3

work page 2022
[9]

Tensorf: Tensorial radiance fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean con- ference on computer vision, pages 333–350. Springer, 2022. 1, 3, 6, 7, 8, 5

work page 2022
[10]

How far can we compress instant-ngp-based nerf? In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20321–20330, 2024

Yihang Chen, Qianyi Wu, Mehrtash Harandi, and Jianfei Cai. How far can we compress instant-ngp-based nerf? In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20321–20330, 2024. 2, 3, 4, 6, 7, 8, 1, 5

work page 2024
[11]

Hac: Hash-grid assisted context for 3d gaussian splatting compression

Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac: Hash-grid assisted context for 3d gaussian splatting compression. InEuropean Conference on Computer Vision, pages 422–438. Springer, 2024. 1, 3

work page 2024
[12]

Hac++: Towards 100x compression of 3d gaussian splatting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac++: Towards 100x compression of 3d gaussian splatting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 2, 3, 4, 7, 8, 5

work page 2025
[13]

High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,

Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Den- nis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,

work page
[14]

Binaryconnect: Training deep neural networks with binary weights during propagations

Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. InNeurIPS, 2015. 3

work page 2015
[15]

Plenoxels: Radiance fields without neural networks

Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022. 3, 6

work page 2022
[16]

K-planes: Explicit radiance fields in space, time, and appearance

Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 12479–12488, 2023. 3, 6, 7

work page 2023
[17]

Queen: Quantized efficient encoding of dynamic gaussians for streaming free-viewpoint videos.Advances in Neural Information Processing Systems, 37:43435–43467, 2024

Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shri- vastava, Shalini De Mello, et al. Queen: Quantized efficient encoding of dynamic gaussians for streaming free-viewpoint videos.Advances in Neural Information Processing Systems, 37:43435–43467, 2024. 3, 7

work page 2024
[18]

Danillo Graziosi, Ohji Nakagami, Satoru Kuma, Alexandre Zaghetto, Teruhiko Suzuki, and Ali Tabatabai. An overview of ongoing point cloud compression standardization activi- ties: Video-based (v-pcc) and geometry-based (g-pcc).AP- SIPA Transactions on Signal and Information Processing, 9: e13, 2020. 2

work page 2020
[19]

3dgen: Triplane la- tent diffusion for textured mesh generation.arXiv preprint arXiv:2303.05371,

Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, and Bar- las O˘guz. 3dgen: Triplane latent diffusion for textured mesh generation.arXiv preprint arXiv:2303.05371, 2023. 3

work page arXiv 2023
[20]

Vrvvc: Variable-rate nerf-based volumetric video compression

Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, and Yanfeng Wang. Vrvvc: Variable-rate nerf-based volumetric video compression. InProceedings of the AAAI Conference on Ar- tificial Intelligence, pages 3563–3571, 2025. 2, 3, 6

work page 2025
[21]

Sandwiched video compression: Efficiently extending the reach of standard codecs with neu- ral wrappers

Berivan Isik, Onur G Guleryuz, Danhang Tang, Jonathan Taylor, and Philip A Chou. Sandwiched video compression: Efficiently extending the reach of standard codecs with neu- ral wrappers. In2023 IEEE International Conference on Im- age Processing (ICIP), pages 2055–2059. IEEE, 2023. 2, 3

work page 2055
[22]

Towards practical real-time neural video compression

Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu. Towards practical real-time neural video compression. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12543–12552,

work page
[23]

From capture to display: A survey on volumetric video

Yili Jin, Kaiyuan Hu, Junhua Liu, Fangxin Wang, and Xue Liu. From capture to display: A survey on volumetric video. arXiv preprint arXiv:2309.05658, 2023. 1

work page arXiv 2023
[24]

Codecnerf: Toward fast encoding and decoding, compact, and high-quality novel-view synthesis

Gyeongjin Kang, Younggeun Lee, Seungjun Oh, and Eun- byung Park. Codecnerf: Toward fast encoding and decoding, compact, and high-quality novel-view synthesis. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 4203–4211, 2025. 2

work page 2025
[25]

Plenoptic png: Real-time neural radiance fields in 150 kb

Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, and Shenlong Wang. Plenoptic png: Real-time neural radiance fields in 150 kb. In2025 International Conference on 3D Vision (3DV), pages 502–511. IEEE, 2025. 7

work page 2025
[26]

Ecrf: Entropy-constrained neural ra- diance fields compression with frequency domain optimiza- tion

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, and Cornelius Hellge. Ecrf: Entropy-constrained neural ra- diance fields compression with frequency domain optimiza- tion. In2024 IEEE 26th International Workshop on Multi- media Signal Processing (MMSP), pages 1–6. IEEE, 2024. 2, 6, 5

work page 2024
[27]

Compression of 3d gaussian splatting with optimized feature planes and standard video codecs,

Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, and Cornelius Hellge. Compression of 3d gaussian splatting with optimized feature planes and standard video codecs. arXiv preprint arXiv:2501.03399, 2025. 2, 3

work page arXiv 2025
[28]

Gifstream: 4d gaussian-based immersive video with feature stream

Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, and Yiyi Liao. Gifstream: 4d gaussian-based immersive video with feature stream. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21761– 21770, 2025. 2, 3, 5, 7, 8

work page 2025
[29]

Neural video compression with feature modulation

Jiahao Li, Bin Li, and Yan Lu. Neural video compression with feature modulation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26099–26108, 2024. 2, 3

work page 2024
[30]

Streaming radiance fields for 3d video synthe- sis.Advances in Neural Information Processing Systems, 35: 13485–13498, 2022

Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Ping Tan. Streaming radiance fields for 3d video synthe- sis.Advances in Neural Information Processing Systems, 35: 13485–13498, 2022. 7

work page 2022
[31]

Instant3d: Instant text- to-3d generation.International Journal of Computer Vision, 132(10):4456–4472, 2024

Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, and Xiangyu Xu. Instant3d: Instant text- to-3d generation.International Journal of Computer Vision, 132(10):4456–4472, 2024. 3

work page 2024
[32]

Nerfcodec: Neural feature compression meets neural radiance fields for memory-efficient scene representation

Sicheng Li, Hao Li, Yiyi Liao, and Lu Yu. Nerfcodec: Neural feature compression meets neural radiance fields for memory-efficient scene representation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21274–21283, 2024. 2, 3, 6, 7, 8, 1, 5

work page 2024
[33]

Neural 3d video synthesis from multi-view video

Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 5521–5531, 2022. 6

work page 2022
[34]

Spacetime gaus- sian feature splatting for real-time dynamic view synthesis

Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaus- sian feature splatting for real-time dynamic view synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8508–8520, 2024. 3, 7

work page 2024
[35]

Neural sparse voxel fields.Advances in Neural Information Processing Systems, 33:15651–15663,

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields.Advances in Neural Information Processing Systems, 33:15651–15663,

work page
[36]

Efficient evaluation of quantization-effects in neural codecs.arXiv preprint arXiv:2502.04770, 2025

Wolfgang Mack, Ahmed Mustafa, Rafał Łaganowski, and Samer Hijazy. Efficient evaluation of quantization-effects in neural codecs.arXiv preprint arXiv:2502.04770, 2025. 6, 3, 4

work page arXiv 2025
[37]

Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 1, 6, 5

work page 2021
[38]

Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022. 3, 6

work page 2022
[39]

Compressed 3d gaussian splatting for accelerated novel view synthesis

Simon Niedermayr, Josef Stumpfegger, and R ¨udiger West- ermann. Compressed 3d gaussian splatting for accelerated novel view synthesis. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024. 1

work page 2024
[40]

Holoportation: Virtual 3d teleportation in real-time

Sergio Orts-Escolano, Christoph Rhemann, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L Davidson, Sameh Khamis, Mingsong Dou, et al. Holoportation: Virtual 3d teleportation in real-time. InPro- ceedings of the 29th annual symposium on user interface software and technology, pages 741–754, 2016. 1

work page 2016
[41]

Differentiable signal processing with black-box audio effects

Marco A Mart ´ınez Ram´ırez, Oliver Wang, Paris Smaragdis, and Nicholas J Bryan. Differentiable signal processing with black-box audio effects. InICASSP 2021-2021 IEEE Inter- national Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP), pages 66–70. IEEE, 2021. 3

work page 2021
[42]

Adaptive bitrate selection: A survey.IEEE Communications Surveys & Tutorials, 19(4):2985–3014, 2017

Yusuf Sani, Andreas Mauthe, and Christopher Edwards. Adaptive bitrate selection: A survey.IEEE Communications Surveys & Tutorials, 19(4):2985–3014, 2017. 2

work page 2017
[43]

Swings: sliding windows for dynamic 3d gaussian splatting

Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P´erez-Pellitero. Swings: sliding windows for dynamic 3d gaussian splatting. InEuropean Conference on Computer Vision, pages 37–54. Springer, 2024. 1

work page 2024
[44]

Binary radiance fields.Ad- vances in neural information processing systems, 36:55919– 55931, 2023

Seungjoo Shin and Jaesik Park. Binary radiance fields.Ad- vances in neural information processing systems, 36:55919– 55931, 2023. 3, 6

work page 2023
[45]

3d neural field generation using triplane diffusion

J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3d neural field generation using triplane diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023. 3

work page 2023
[46]

The mpeg-dash standard for multimedia streaming over the internet.IEEE multimedia, 18(4):62–67,

Iraj Sodagar. The mpeg-dash standard for multimedia streaming over the internet.IEEE multimedia, 18(4):62–67,

work page
[47]

Nerf- player: A streamable dynamic scene representation with de- composed neural radiance fields.IEEE Transactions on Visu- alization and Computer Graphics, 29(5):2732–2742, 2023

Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. Nerf- player: A streamable dynamic scene representation with de- composed neural radiance fields.IEEE Transactions on Visu- alization and Computer Graphics, 29(5):2732–2742, 2023. 6, 7

work page 2023
[48]

An overview of the simultaneous perturba- tion method for efficient optimization.Johns Hopkins apl technical digest, 19(4):482–492, 1998

James C Spall. An overview of the simultaneous perturba- tion method for efficient optimization.Johns Hopkins apl technical digest, 19(4):482–492, 1998. 6, 3, 4

[49] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5459–5469, 2022.

[50] Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, and Wei Xing. 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20675–20685, 2024.

[51] Liao Wang, Kaixin Yao, Chengcheng Guo, Zhirui Zhang, Qiang Hu, Jingyi Yu, Lan Xu, and Minye Wu. Videorf: Rendering dynamic radiance fields as 2d feature video streams. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 470–481, 2024.

[52] Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, and Lan Xu. V^3: Viewing volumetric videos on mobiles via streamable 2d dynamic gaussians. ACM Transactions on Graphics (TOG), 43(6):1–13, 2024.

[53] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20310–20320, 2024.

[54] Minye Wu, Yuehao Wang, Qiang Hu, and Jingyi Yu. Multi-view neural human rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1682–1691, 2020.

[55] Minye Wu, Zehao Wang, Georgios Kouros, and Tinne Tuytelaars. Tetrirf: Temporal tri-plane radiance fields for efficient free-viewpoint video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6487–6496, 2024.

[56] Ningfeng Yang and Tor M Aamodt. Improving the straight-through estimator with zeroth-order information. arXiv preprint arXiv:2510.23926, 2025.

[57] Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 645–661, 2018.

[58] Hyunho Yeo, Chan Ju Chong, Youngmok Jung, Juncheol Ye, and Dongsu Han. Nemo: enabling neural-enhanced video streaming on commodity mobile devices. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, pages 1–14, 2020.

[59] Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, and Li Song. Rate-aware compression for nerf-based volumetric video. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 3974–3983,

[60] Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, and Yanfeng Wang. Jointrf: end-to-end joint optimization for dynamic neural radiance field representation and compression. In 2024 IEEE International Conference on Image Processing (ICIP), pages 3292–3298. IEEE, 2024.

1. Sample a frame index $t$, rays $\{(o, d)\}$, and camera pose $\pi$.
2. If the cache is empty or $g - g_{\mathrm{cache}} \ge M$, mark refresh as true.
3. If the relative change between the current $(P^{ax}_t, D_t)$ and their cached snapshots exceeds a threshold $\epsilon$, also mark refresh as true.
4. If refresh is true, run the encode–decode codec round trip (as illustrated in Fig. 2) for the video segment.
5. STE substitution using cached reconstructions:
   (a) For each axis $ax \in \{xy, xz, yz\}$: $\hat{P}^{ax}_t \leftarrow$ cached decoded plane, $\tilde{P}^{ax}_t \leftarrow \hat{P}^{ax}_t + P^{ax}_t - \mathrm{detach}(P^{ax}_t)$.
   (b) For density: $\hat{D}_t \leftarrow$ cached decoded density, $\tilde{D}_t \leftarrow \hat{D}_t + D_t - \mathrm{detach}(D_t)$.
6. Render and compute losses: $I \leftarrow R(\tilde{P}_t, \tilde{D}_t, \pi; \phi)$.

Tab. 4 shows that increasing the refresh interval $M$ yields a favorable trade-off between accuracy and training efficiency. It suggests that relatively infrequent cache updates (e.g., $M = 128$) already capture most of the benefit of SCL training, while keeping the overhead of expensive codec round trips manageable...
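The cached refresh schedule in the steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the class `CodecCache`, the helper `relative_change` (an L2 ratio, which the source does not specify), and `fake_codec_round_trip` (uniform quantization standing in for a real HEVC or AV1 encode–decode) are all hypothetical names and assumptions. The STE substitution is shown only at the level of forward values, since $\hat{P} + P - \mathrm{detach}(P)$ equals $\hat{P}$ numerically while its gradient with respect to $P$ is the identity.

```python
from typing import List, Optional

Plane = List[float]  # a flattened feature plane, for illustration only


def fake_codec_round_trip(plane: Plane, step: float = 0.25) -> Plane:
    """Stand-in for encode-decode: uniform quantization, NOT a real codec."""
    return [round(v / step) * step for v in plane]


def relative_change(current: Plane, snapshot: Plane) -> float:
    """Assumed metric: L2 norm of the difference over L2 norm of the snapshot."""
    num = sum((c - s) ** 2 for c, s in zip(current, snapshot)) ** 0.5
    den = sum(s ** 2 for s in snapshot) ** 0.5
    return num / max(den, 1e-12)


class CodecCache:
    """Caches decoded features and refreshes them only when needed."""

    def __init__(self, refresh_interval: int, threshold: float):
        self.refresh_interval = refresh_interval  # M in the text
        self.threshold = threshold                # epsilon in the text
        self.decoded: Optional[Plane] = None      # cached decoded plane
        self.snapshot: Optional[Plane] = None     # features at last refresh
        self.last_refresh_step = 0                # g_cache in the text
        self.num_round_trips = 0

    def needs_refresh(self, plane: Plane, step: int) -> bool:
        if self.decoded is None:
            return True  # cache is empty
        if step - self.last_refresh_step >= self.refresh_interval:
            return True  # g - g_cache >= M
        return relative_change(plane, self.snapshot) > self.threshold

    def forward_value(self, plane: Plane, step: int) -> Plane:
        if self.needs_refresh(plane, step):
            self.decoded = fake_codec_round_trip(plane)
            self.snapshot = list(plane)
            self.last_refresh_step = step
            self.num_round_trips += 1
        # Forward value of the STE substitution: numerically the cached
        # reconstruction. In an autograd framework the gradient would pass
        # straight through to `plane`.
        return list(self.decoded)


cache = CodecCache(refresh_interval=128, threshold=0.05)
plane = [0.10, 0.62, -0.33]
for g in range(300):
    plane = [v + 0.0001 for v in plane]  # slow drift standing in for updates
    rendered_input = cache.forward_value(plane, g)
print(cache.num_round_trips)
```

With a slow drift like this one, the interval test dominates and the codec runs only a handful of times over 300 steps; a sudden large change in the features would instead trip the threshold test and force an early refresh.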
