CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content Delivery
Pith reviewed 2026-05-20 00:32 UTC · model grok-4.3
The pith
Training triplane radiance fields with real codec roundtrips lets volumetric content reach better rate-distortion performance than codec-agnostic baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CATRF trains triplane radiance fields by quantizing and packing the 2D feature planes into canvases, running a full roundtrip through a chosen standard codec, unpacking the decoded features, and using a straight-through estimator to back-propagate through the entire non-differentiable pipeline, so that the learned features become resilient to the specific quantization and coding artifacts that the target codec will introduce at inference time.
What carries the argument
Codec-in-the-loop training with straight-through estimator, which simulates the complete quantization-packing-encoding-decoding-unpacking pipeline on the triplane features so they can adapt to real codec distortions without any learned codec parameters.
If this is right
- CATRF achieves a better rate-distortion trade-off than both codec-agnostic and learned-codec baselines on static and dynamic volumetric benchmarks.
- The method also outperforms recent compressed 3D Gaussian splatting approaches in both compression efficiency and decoding speed.
- The approach supplies a practical route to low-bitrate, compression-resilient volumetric representations suitable for free-viewpoint video streaming.
Where Pith is reading between the lines
- The same adaptation strategy could be applied to other plane-based or hybrid implicit representations without changing the codec pipeline.
- Client-side decoding speed gains may enable higher frame-rate or multi-user free-viewpoint experiences on consumer hardware.
- Because the method uses only standard codecs, it can be deployed immediately on existing video infrastructure while still benefiting from neural representations.
Load-bearing premise
The straight-through estimator lets the radiance-field features adapt to the non-differentiable distortions of standard codecs without training instability or unmodeled quantization artifacts dominating final quality.
What would settle it
Running the same static and dynamic benchmarks and finding that CATRF's rate-distortion curves lie below or on top of the codec-agnostic baselines at all operating points would falsify the central claim.
Figures
read the original abstract
Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CATRF, a standard-codec-in-the-loop compression method for triplane radiance fields. Feature planes are quantized and packed into codec-friendly canvases, passed through a non-differentiable roundtrip of JPEG/VP9/HEVC/AV1, unpacked and dequantized, then used for volume rendering. A straight-through estimator enables end-to-end training so that the radiance-field features adapt to the actual client-side codec distortions. The authors report consistent rate-distortion gains over codec-agnostic and learned-codec baselines on both static and dynamic benchmarks, plus better compression efficiency and decoding speed than recent compressed 3DGS approaches.
Significance. If the adaptation mechanism proves robust, the work provides a practical route to low-bitrate volumetric delivery that exploits mature, hardware-supported codecs rather than requiring learned compression modules. The reported gains in rate-distortion trade-off and decoding speed on both static and dynamic scenes would be relevant for free-viewpoint video streaming applications.
major comments (2)
- [§3] §3 (Method), around the STE insertion: the manuscript does not supply sufficient detail on the precise quantization thresholds, canvas packing layout, or the exact STE implementation (e.g., whether gradient clipping or scaling is applied). These choices directly affect whether the identity-gradient approximation remains informative for the highly nonlinear rate-distortion behavior of HEVC/AV1; without them the central claim that features adapt to real codec distortions cannot be fully evaluated.
- [§4] §4 (Experiments), rate-distortion curves and tables: the paper should include an ablation that trains the same triplanes codec-agnostically and then applies the identical quantization/packing/codec roundtrip at test time. If the reported gains largely disappear in this setting, the adaptation benefit attributed to STE training would be undermined.
minor comments (2)
- [Figure 3] Figure 3 (canvas packing illustration): the diagram would benefit from explicit annotation of the quantization step sizes and the exact spatial arrangement used for each codec.
- [Related Work] Related-work section: the comparison to compressed 3DGS methods should cite the specific decoding-time measurements (e.g., FPS on the same hardware) to support the speed claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of reproducibility and experimental validation. We address each point below and have updated the manuscript to incorporate the suggested clarifications and additional analysis.
read point-by-point responses
-
Referee: [§3] §3 (Method), around the STE insertion: the manuscript does not supply sufficient detail on the precise quantization thresholds, canvas packing layout, or the exact STE implementation (e.g., whether gradient clipping or scaling is applied). These choices directly affect whether the identity-gradient approximation remains informative for the highly nonlinear rate-distortion behavior of HEVC/AV1; without them the central claim that features adapt to real codec distortions cannot be fully evaluated.
Authors: We agree that more precise implementation details are required for full reproducibility and to allow readers to assess the STE approximation under nonlinear codecs. In the revised manuscript we have expanded §3 with the following: uniform quantization uses a fixed step size of 1/255 on normalized feature values in [0,1]; the canvas packing layout tiles the three feature planes (each 256×256×C) into a single 512×768 canvas with explicit row/column offsets and zero-padding to codec-friendly dimensions; the STE employs the identity function in the forward pass with straight-through gradient (no clipping or scaling applied, as empirical tests showed stable convergence). We also add a short discussion of why the identity approximation remains informative despite codec nonlinearity, supported by gradient-norm statistics collected during training. revision: yes
-
Referee: [§4] §4 (Experiments), rate-distortion curves and tables: the paper should include an ablation that trains the same triplanes codec-agnostically and then applies the identical quantization/packing/codec roundtrip at test time. If the reported gains largely disappear in this setting, the adaptation benefit attributed to STE training would be undermined.
Authors: We appreciate this suggestion for isolating the contribution of in-loop adaptation. We have added the requested ablation to §4: the identical triplane architecture is trained without any codec in the loop (codec-agnostic baseline) and then subjected to the exact same quantization, canvas packing, and codec roundtrip (JPEG/VP9/HEVC/AV1) at test time. The new results, presented in an additional row of Table 2 and as dashed curves in Figure 4, show a consistent drop in rate-distortion performance relative to CATRF (average BD-rate increase of 18–27 % across codecs). This confirms that the observed gains are attributable to feature adaptation during STE training rather than to the quantization/packing procedure alone. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents CATRF as an empirical compression framework that inserts a standard codec roundtrip into the training loop via straight-through estimator. All reported gains are obtained from direct comparisons against external codec-agnostic baselines, learned-codec baselines, and compressed 3DGS methods on static and dynamic benchmarks. No equations, fitted parameters, or self-citations are shown to reduce the claimed rate-distortion improvements to quantities defined on the same test data or to prior results by the same authors. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Johannes Ball ´e, Valero Laparra, and Eero P. Simoncelli. End-to-end optimized image compression. InICLR, 2017. 3
work page 2017
-
[2]
Johannes Ball ´e, David Minnen, Saurabh Singh, Sung Jin Hwang Johnston, and Eero P. Simoncelli. Variational im- age compression with a scale hyperprior.IEEE Transactions on Image Processing, 2018. 3
work page 2018
-
[3]
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas L ´eonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation.arXiv preprint arXiv:1308.3432, 2013. 3, 6, 4
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[4]
Low latency live streaming implementation in dash and hls
Abdelhak Bentaleb, Zhengdao Zhan, Farzad Tashtarian, May Lim, Saad Harous, Christian Timmerer, Hermann Hellwag- ner, and Roger Zimmermann. Low latency live streaming implementation in dash and hls. InProceedings of the 30th ACM International Conference on Multimedia, pages 7343– 7346, 2022. 1
work page 2022
-
[5]
Abdelhak Bentaleb, May Lim, Sarra Hammoudi, Saad Harous, and Roger Zimmermann. Solutions, challenges, and opportunities in volumetric video streaming: an architectural perspective.ACM Transactions on Multimedia Computing, Communications and Applications, 21(7):1–35, 2025. 1
work page 2025
-
[6]
Calculation of average psnr differences between rd-curves.ITU-T SG16, Doc
Gisle Bjontegaard. Calculation of average psnr differences between rd-curves.ITU-T SG16, Doc. VCEG-M33, 2001. 7
work page 2001
-
[7]
Proxylessnas: Direct neural architecture search on target task and hardware
Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. In ICLR, 2019. 3
work page 2019
-
[8]
Efficient geometry-aware 3d generative adversarial networks
Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16123–16133, 2022. 1, 3
work page 2022
-
[9]
Tensorf: Tensorial radiance fields
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean con- ference on computer vision, pages 333–350. Springer, 2022. 1, 3, 6, 7, 8, 5
work page 2022
-
[10]
Yihang Chen, Qianyi Wu, Mehrtash Harandi, and Jianfei Cai. How far can we compress instant-ngp-based nerf? In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20321–20330, 2024. 2, 3, 4, 6, 7, 8, 1, 5
work page 2024
-
[11]
Hac: Hash-grid assisted context for 3d gaussian splatting compression
Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac: Hash-grid assisted context for 3d gaussian splatting compression. InEuropean Conference on Computer Vision, pages 422–438. Springer, 2024. 1, 3
work page 2024
-
[12]
Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac++: Towards 100x compression of 3d gaussian splatting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 2, 3, 4, 7, 8, 5
work page 2025
-
[13]
High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,
Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Den- nis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video.ACM Transactions on Graphics (ToG), 34(4):1–13,
-
[14]
Binaryconnect: Training deep neural networks with binary weights during propagations
Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Binaryconnect: Training deep neural networks with binary weights during propagations. InNeurIPS, 2015. 3
work page 2015
-
[15]
Plenoxels: Radiance fields without neural networks
Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022. 3, 6
work page 2022
-
[16]
K-planes: Explicit radiance fields in space, time, and appearance
Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 12479–12488, 2023. 3, 6, 7
work page 2023
-
[17]
Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shri- vastava, Shalini De Mello, et al. Queen: Quantized efficient encoding of dynamic gaussians for streaming free-viewpoint videos.Advances in Neural Information Processing Systems, 37:43435–43467, 2024. 3, 7
work page 2024
-
[18]
Danillo Graziosi, Ohji Nakagami, Satoru Kuma, Alexandre Zaghetto, Teruhiko Suzuki, and Ali Tabatabai. An overview of ongoing point cloud compression standardization activi- ties: Video-based (v-pcc) and geometry-based (g-pcc).AP- SIPA Transactions on Signal and Information Processing, 9: e13, 2020. 2
work page 2020
-
[19]
3dgen: Triplane la- tent diffusion for textured mesh generation.arXiv preprint arXiv:2303.05371,
Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, and Bar- las O˘guz. 3dgen: Triplane latent diffusion for textured mesh generation.arXiv preprint arXiv:2303.05371, 2023. 3
-
[20]
Vrvvc: Variable-rate nerf-based volumetric video compression
Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, and Yanfeng Wang. Vrvvc: Variable-rate nerf-based volumetric video compression. InProceedings of the AAAI Conference on Ar- tificial Intelligence, pages 3563–3571, 2025. 2, 3, 6
work page 2025
-
[21]
Berivan Isik, Onur G Guleryuz, Danhang Tang, Jonathan Taylor, and Philip A Chou. Sandwiched video compression: Efficiently extending the reach of standard codecs with neu- ral wrappers. In2023 IEEE International Conference on Im- age Processing (ICIP), pages 2055–2059. IEEE, 2023. 2, 3
work page 2055
-
[22]
Towards practical real-time neural video compression
Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu. Towards practical real-time neural video compression. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12543–12552,
-
[23]
From capture to display: A survey on volumetric video
Yili Jin, Kaiyuan Hu, Junhua Liu, Fangxin Wang, and Xue Liu. From capture to display: A survey on volumetric video. arXiv preprint arXiv:2309.05658, 2023. 1
-
[24]
Codecnerf: Toward fast encoding and decoding, compact, and high-quality novel-view synthesis
Gyeongjin Kang, Younggeun Lee, Seungjun Oh, and Eun- byung Park. Codecnerf: Toward fast encoding and decoding, compact, and high-quality novel-view synthesis. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 4203–4211, 2025. 2
work page 2025
-
[25]
Plenoptic png: Real-time neural radiance fields in 150 kb
Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, and Shenlong Wang. Plenoptic png: Real-time neural radiance fields in 150 kb. In2025 International Conference on 3D Vision (3DV), pages 502–511. IEEE, 2025. 7
work page 2025
-
[26]
Ecrf: Entropy-constrained neural ra- diance fields compression with frequency domain optimiza- tion
Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, and Cornelius Hellge. Ecrf: Entropy-constrained neural ra- diance fields compression with frequency domain optimiza- tion. In2024 IEEE 26th International Workshop on Multi- media Signal Processing (MMSP), pages 1–6. IEEE, 2024. 2, 6, 5
work page 2024
-
[27]
Compression of 3d gaussian splatting with optimized feature planes and standard video codecs,
Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, and Cornelius Hellge. Compression of 3d gaussian splatting with optimized feature planes and standard video codecs. arXiv preprint arXiv:2501.03399, 2025. 2, 3
-
[28]
Gifstream: 4d gaussian-based immersive video with feature stream
Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, and Yiyi Liao. Gifstream: 4d gaussian-based immersive video with feature stream. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21761– 21770, 2025. 2, 3, 5, 7, 8
work page 2025
-
[29]
Neural video compression with feature modulation
Jiahao Li, Bin Li, and Yan Lu. Neural video compression with feature modulation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26099–26108, 2024. 2, 3
work page 2024
-
[30]
Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Ping Tan. Streaming radiance fields for 3d video synthe- sis.Advances in Neural Information Processing Systems, 35: 13485–13498, 2022. 7
work page 2022
-
[31]
Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, and Xiangyu Xu. Instant3d: Instant text- to-3d generation.International Journal of Computer Vision, 132(10):4456–4472, 2024. 3
work page 2024
-
[32]
Sicheng Li, Hao Li, Yiyi Liao, and Lu Yu. Nerfcodec: Neural feature compression meets neural radiance fields for memory-efficient scene representation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21274–21283, 2024. 2, 3, 6, 7, 8, 1, 5
work page 2024
-
[33]
Neural 3d video synthesis from multi-view video
Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 5521–5531, 2022. 6
work page 2022
-
[34]
Spacetime gaus- sian feature splatting for real-time dynamic view synthesis
Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaus- sian feature splatting for real-time dynamic view synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8508–8520, 2024. 3, 7
work page 2024
-
[35]
Neural sparse voxel fields.Advances in Neural Information Processing Systems, 33:15651–15663,
Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields.Advances in Neural Information Processing Systems, 33:15651–15663,
-
[36]
Efficient evaluation of quantization-effects in neural codecs.arXiv preprint arXiv:2502.04770, 2025
Wolfgang Mack, Ahmed Mustafa, Rafał Łaganowski, and Samer Hijazy. Efficient evaluation of quantization-effects in neural codecs.arXiv preprint arXiv:2502.04770, 2025. 6, 3, 4
-
[37]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 1, 6, 5
work page 2021
-
[38]
Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022. 3, 6
work page 2022
-
[39]
Compressed 3d gaussian splatting for accelerated novel view synthesis
Simon Niedermayr, Josef Stumpfegger, and R ¨udiger West- ermann. Compressed 3d gaussian splatting for accelerated novel view synthesis. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024. 1
work page 2024
-
[40]
Holoportation: Virtual 3d teleportation in real-time
Sergio Orts-Escolano, Christoph Rhemann, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L Davidson, Sameh Khamis, Mingsong Dou, et al. Holoportation: Virtual 3d teleportation in real-time. InPro- ceedings of the 29th annual symposium on user interface software and technology, pages 741–754, 2016. 1
work page 2016
-
[41]
Differentiable signal processing with black-box audio effects
Marco A Mart ´ınez Ram´ırez, Oliver Wang, Paris Smaragdis, and Nicholas J Bryan. Differentiable signal processing with black-box audio effects. InICASSP 2021-2021 IEEE Inter- national Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP), pages 66–70. IEEE, 2021. 3
work page 2021
-
[42]
Adaptive bitrate selection: A survey.IEEE Communications Surveys & Tutorials, 19(4):2985–3014, 2017
Yusuf Sani, Andreas Mauthe, and Christopher Edwards. Adaptive bitrate selection: A survey.IEEE Communications Surveys & Tutorials, 19(4):2985–3014, 2017. 2
work page 2017
-
[43]
Swings: sliding windows for dynamic 3d gaussian splatting
Richard Shaw, Michal Nazarczuk, Jifei Song, Arthur Moreau, Sibi Catley-Chandar, Helisa Dhamo, and Eduardo P´erez-Pellitero. Swings: sliding windows for dynamic 3d gaussian splatting. InEuropean Conference on Computer Vision, pages 37–54. Springer, 2024. 1
work page 2024
-
[44]
Binary radiance fields.Ad- vances in neural information processing systems, 36:55919– 55931, 2023
Seungjoo Shin and Jaesik Park. Binary radiance fields.Ad- vances in neural information processing systems, 36:55919– 55931, 2023. 3, 6
work page 2023
-
[45]
3d neural field generation using triplane diffusion
J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3d neural field generation using triplane diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023. 3
work page 2023
-
[46]
The mpeg-dash standard for multimedia streaming over the internet.IEEE multimedia, 18(4):62–67,
Iraj Sodagar. The mpeg-dash standard for multimedia streaming over the internet.IEEE multimedia, 18(4):62–67,
-
[47]
Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. Nerf- player: A streamable dynamic scene representation with de- composed neural radiance fields.IEEE Transactions on Visu- alization and Computer Graphics, 29(5):2732–2742, 2023. 6, 7
work page 2023
-
[48]
James C Spall. An overview of the simultaneous perturba- tion method for efficient optimization.Johns Hopkins apl technical digest, 19(4):482–492, 1998. 6, 3, 4
work page 1998
-
[49]
Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction
Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5459– 5469, 2022. 3, 6, 7, 8
work page 2022
-
[50]
Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, and Wei Xing. 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free- viewpoint videos. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 20675–20685, 2024. 3, 7
work page 2024
-
[51]
Videorf: Ren- dering dynamic radiance fields as 2d feature video streams
Liao Wang, Kaixin Yao, Chengcheng Guo, Zhirui Zhang, Qiang Hu, Jingyi Yu, Lan Xu, and Minye Wu. Videorf: Ren- dering dynamic radiance fields as 2d feature video streams. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 470–481, 2024. 2
work page 2024
-
[52]
Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, and Lan Xu. Vˆ 3: View- ing volumetric videos on mobiles via streamable 2d dynamic gaussians.ACM Transactions on Graphics (TOG), 43(6):1– 13, 2024. 2
work page 2024
-
[53]
4d gaussian splatting for real-time dynamic scene rendering
Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 20310–20320, 2024. 1
work page 2024
-
[54]
Multi-view neural human rendering
Minye Wu, Yuehao Wang, Qiang Hu, and Jingyi Yu. Multi-view neural human rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1682–1691, 2020. 6
work page 2020
-
[55]
Tetrirf: Temporal tri-plane radiance fields for efficient free-viewpoint video
Minye Wu, Zehao Wang, Georgios Kouros, and Tinne Tuyte- laars. Tetrirf: Temporal tri-plane radiance fields for efficient free-viewpoint video. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 6487–6496, 2024. 2, 3, 5, 6, 7
work page 2024
-
[56]
Ningfeng Yang and Tor M Aamodt. Improving the straight- through estimator with zeroth-order information.arXiv preprint arXiv:2510.23926, 2025. 3
-
[57]
Neural adaptive content-aware internet video delivery
Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. Neural adaptive content-aware internet video delivery. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 645– 661, 2018. 2
work page 2018
-
[58]
Nemo: enabling neural-enhanced video streaming on commodity mobile devices
Hyunho Yeo, Chan Ju Chong, Youngmok Jung, Juncheol Ye, and Dongsu Han. Nemo: enabling neural-enhanced video streaming on commodity mobile devices. InProceedings of the 26th Annual International Conference on Mobile Com- puting and Networking, pages 1–14, 2020. 2
work page 2020
-
[59]
Rate-aware compression for nerf- based volumetric video
Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, and Li Song. Rate-aware compression for nerf- based volumetric video. InProceedings of the 32nd ACM International Conference on Multimedia, pages 3974–3983,
-
[60]
Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, and Yanfeng Wang. Jointrf: end-to- end joint optimization for dynamic neural radiance field rep- resentation and compression. In2024 IEEE International Conference on Image Processing (ICIP), pages 3292–3298. IEEE, 2024. 3, 6 CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetr...
work page 2024
-
[61]
Sample a frame indext, rays{(o,d)}, and camera poseπ
-
[62]
If the cache is empty org−g cache ≥M, markrefreshas true
-
[63]
If the relative change between the current(P ax t , Dt)and their cached snapshots exceeds a thresholdϵ, also markrefreshas true
-
[64]
Ifrefreshis true, then run the encode–decode codec round trip (as illustrated in Fig. 2) for the video segment
-
[65]
(b) For density: bDt ←cached decoded density, eDt ← bDt + Dt −detach(D t)
STE substitution using cached reconstructions: (a) For each axisax∈ {xy, xz, yz}: bP ax t ←cached decoded plane, eP ax t ← bP ax t + P ax t −detach(P ax t ) . (b) For density: bDt ←cached decoded density, eDt ← bDt + Dt −detach(D t)
-
[66]
Render and compute losses: I← R( ePt,eDt, π;ϕ), Tab. 4 shows that increasing the refresh intervalMyields a favorable trade-off between accuracy and training efficiency. It suggests that relatively infrequent cache updates (e.g.,M= 128) already capture most of the benefit of SCL training, while keeping the overhead of expensive codec round trips manageable...
work page 2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.