4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Guassian Splatting

Chun-Tin Wu; Jun-Cheng Chen

arXiv: 2511.00560 · v2 · submitted 2025-11-01 · 💻 cs.CV

Pith reviewed 2026-05-18 01:47 UTC · model grok-4.3

classification 💻 cs.CV

keywords: dynamic scene rendering, 4D neural voxels, Gaussian splatting, deformation fields, novel view synthesis, memory-efficient rendering, real-time rendering, view refinement

The pith

A compact neural voxel grid with learned deformation fields models dynamic scenes without replicating Gaussians per timestamp.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to extend efficient 3D Gaussian Splatting to moving scenes by replacing the usual practice of storing separate Gaussian sets for each time step. It does this through one compact collection of neural voxels whose positions and properties change according to learned deformation fields that track scene motion. A reader would care because the standard approach quickly exhausts memory and slows training as sequence length grows, while this design aims to keep memory and compute costs low. The method adds a selective refinement stage that targets difficult viewpoints for extra optimization passes. If the central idea holds, high-quality novel-view rendering of dynamic content becomes practical for longer sequences and more modest hardware.

Core claim

Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles.
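Figure 2 describes the refinement stage as adaptive densification applied to "underperforming viewpoints", but this page does not say how those viewpoints are chosen. A minimal sketch, assuming they are the training cameras with the lowest per-view PSNR after the main optimization; `select_refinement_views`, `fraction`, and the camera ids are hypothetical:

```python
import math
from typing import Dict, List


def select_refinement_views(per_view_psnr: Dict[str, float], fraction: float = 0.2,
                            min_views: int = 1) -> List[str]:
    """Return ids of the lowest-PSNR training views, worst first.

    Hypothetical rule: keep the bottom `fraction` of views (at least `min_views`).
    The criterion 4D-NVS actually uses is not stated on this page.
    """
    if not per_view_psnr:
        return []
    k = max(min_views, math.ceil(fraction * len(per_view_psnr)))
    ranked = sorted(per_view_psnr, key=per_view_psnr.get)  # ascending PSNR
    return ranked[:k]


# Toy scores (not results from the paper).
scores = {"cam00": 31.2, "cam01": 27.5, "cam02": 30.1, "cam03": 24.9, "cam04": 29.8}
print(select_refinement_views(scores, fraction=0.5))  # ['cam03', 'cam01', 'cam04']
```

Only the selected views would then receive extra optimization passes, which is what keeps the refinement cheap compared with re-optimizing the whole model.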

What carries the argument

compact neural voxel grid paired with learned deformation fields that warp voxel properties across time to capture scene motion

If this is right

Memory usage stays roughly constant with sequence length instead of growing linearly with the number of frames.
Training completes faster than methods that optimize independent Gaussians at every timestamp.
Real-time rendering remains possible at high visual quality for dynamic content.
Targeted refinement improves quality at hard viewpoints without re-optimizing the entire model.
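The memory consequence follows from a simple parameter count. The sketch below compares storing a full 3DGS parameter set for every frame against one shared voxel set plus HexPlane-style feature planes (the decomposition named in Figure 2). All sizes, including the per-voxel feature width and plane resolutions, are illustrative assumptions, not values reported for 4D-NVS:

```python
# Illustrative sizes only; none of these numbers come from the paper.
FLOATS_PER_GAUSSIAN = 59  # 3DGS: 3 position + 3 scale + 4 rotation + 1 opacity + 48 SH coeffs (degree 3)
BYTES_PER_FLOAT = 4       # float32


def replicated_bytes(num_gaussians: int, num_frames: int) -> int:
    """An independent Gaussian set per frame: cost grows as N * T."""
    return BYTES_PER_FLOAT * FLOATS_PER_GAUSSIAN * num_gaussians * num_frames


def shared_bytes(num_voxels: int, floats_per_voxel: int,
                 plane_res: int, time_res: int, channels: int) -> int:
    """One shared voxel set plus HexPlane-style feature planes.

    Three spatial planes (xy, xz, yz) are plane_res x plane_res; three
    space-time planes (xt, yt, zt) are plane_res x time_res. Only the
    space-time planes depend on the temporal resolution.
    """
    spatial = 3 * plane_res * plane_res * channels
    spacetime = 3 * plane_res * time_res * channels
    return BYTES_PER_FLOAT * (num_voxels * floats_per_voxel + spatial + spacetime)


print(replicated_bytes(100_000, 300) / 1e6)          # 7080.0 (MB)
print(shared_bytes(100_000, 50, 64, 150, 32) / 1e6)  # about 25.3 (MB)
```

With these toy sizes the replicated layout needs about 7 GB against roughly 25 MB for the shared one. If the time resolution of the space-time planes is tied to the frame count, the shared cost still grows with sequence length, but only through the thin R × T planes rather than through whole Gaussian sets; "roughly constant" should be read in that sense.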

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same compact voxel-plus-deformation structure could support forward prediction of future frames by extrapolating the learned fields.
Similar compression might apply to related problems such as 4D scene editing or light-field video synthesis.
If deformation fields prove reusable across similar scenes, the approach could reduce the need for per-scene training from scratch.

Load-bearing premise

A single compact neural voxel grid plus learned deformation fields can faithfully capture all scene motion without requiring per-frame Gaussian replication or suffering noticeable quality loss on complex dynamics.

What would settle it

Rendering a test sequence with rapid non-rigid motion and measuring whether 4D-NVS produces noticeably lower PSNR or visible artifacts relative to per-frame Gaussian baselines would directly test the claim.
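This test hinges on per-frame PSNR, defined as 10·log10(MAX² / MSE). A minimal sketch of the comparison, assuming rendered and ground-truth frames are available as flat pixel sequences scaled to [0, 1]; the helper names and the 1 dB margin are hypothetical choices, not part of the paper's protocol:

```python
import math
from typing import List, Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB between two same-sized images given as flat pixel sequences."""
    if not pred or len(pred) != len(target):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def frames_with_quality_drop(ours, baseline, ground_truth, threshold_db: float = 1.0) -> List[int]:
    """Indices of frames where the baseline beats our render by more than threshold_db."""
    return [
        i
        for i, (o, b, g) in enumerate(zip(ours, baseline, ground_truth))
        if psnr(b, g) - psnr(o, g) > threshold_db
    ]
```

Flagged frames would then be the ones to inspect visually for motion artifacts.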

Figures

Figures reproduced from arXiv: 2511.00560 by Chun-Tin Wu, Jun-Cheng Chen.

**Figure 1.** Our approach demonstrates remarkable memory efficiency and training speed while achieving superior image quality.

**Figure 2.** Pipeline overview: (1) initialize with voxel-based Gaussian splatting, (2) generate neural Gaussians with temporal information, (3) apply HexPlane temporal corrections, (4) optimize with color loss, total variation loss, and scaling regularization, (5) run a view refinement stage for underperforming viewpoints through adaptive densification.

**Figure 3.** Visual comparisons of the proposed method with other methods on the HyperNeRF dataset. The proposed method achieves better …

**Figure 4.** Visualization of the Neu3D dataset compared with other methods. From the visual illustration shown in the top and bottom left, …

**Figure 5.** Continuous frames on the HyperNeRF dataset compared with 4DGS.

**Figure 6.** Left: with L_vol. Right: without L_vol.

**Figure 7.** Left: without view refinement. Right: with view refinement.
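Step (3) of Figure 2 applies HexPlane temporal corrections. HexPlane (Cao and Johnson, CVPR 2023) factorizes a 4D (x, y, z, t) field into six 2D feature planes: three spatial (xy, xz, yz) and three space-time (xt, yt, zt). A minimal lookup sketch, assuming the K-Planes-style elementwise product to fuse the six samples; the original HexPlane instead multiplies complementary spatial/space-time plane pairs and concatenates the results, and this page does not say which fusion 4D-NVS uses. The MLP that would decode the fused feature into position, rotation, or scale corrections is omitted:

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # H x W x C nested lists

# (u, v) coordinate indices into a normalized (x, y, z, t) point for each plane:
# three spatial planes (xy, xz, yz) and three space-time planes (xt, yt, zt).
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a plane at normalized coordinates u (width) and v (height) in [0, 1]."""
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][k] + fx * (1 - fy) * plane[y0][x1][k]
        + (1 - fx) * fy * plane[y1][x0][k] + fx * fy * plane[y1][x1][k]
        for k in range(len(plane[0][0]))
    ]


def hexplane_feature(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Fuse the six plane samples for one (x, y, z, t) point by elementwise product."""
    feature = [1.0] * len(planes[0][0][0])
    for plane, (a, b) in zip(planes, PLANE_AXES):
        sample = bilinear(plane, point[a], point[b])
        feature = [f * s for f, s in zip(feature, sample)]
    return feature
```

Because every query reads the same six shared planes regardless of timestamp, time enters only through the t coordinate, not through extra copies of the scene.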

Original abstract

Although 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gaussians across frames. To address this challenge, we propose 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles. Experiments demonstrate that our method outperforms state-of-the-art approaches with significant memory reduction and faster training, enabling real-time rendering with superior visual fidelity.
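The abstract combines voxel-based representations with neural Gaussian splatting, and the reference list leans on Scaffold-GS (Lu et al., CVPR 2024). In Scaffold-GS, input points are snapped to voxel anchors and each anchor spawns k neural Gaussians at its center plus learnable offsets scaled per anchor. A minimal sketch of that placement step, assuming 4D-NVS follows the same pattern (this page does not confirm it); the time-dependent deformation and the attribute MLPs are left out, and all names are hypothetical:

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def voxelize(points: Sequence[Vec3], voxel_size: float) -> List[Vec3]:
    """Snap points (e.g. an SfM cloud) to a grid; return sorted unique voxel centers."""
    cells = {tuple(math.floor(c / voxel_size) for c in p) for p in points}
    return [tuple((i + 0.5) * voxel_size for i in cell) for cell in sorted(cells)]


@dataclass
class VoxelAnchor:
    center: Vec3
    scale: Vec3                                        # learnable per-anchor scaling
    offsets: List[Vec3] = field(default_factory=list)  # k learnable offsets


def spawn_gaussian_centers(anchor: VoxelAnchor, opacities: Sequence[float],
                           min_opacity: float = 0.0) -> List[Vec3]:
    """Place one neural Gaussian per offset at center + offset * scale.

    Gaussians whose predicted opacity is <= min_opacity are skipped. In Scaffold-GS
    the opacities come from an MLP on the anchor feature and view direction; here
    they are passed in directly.
    """
    centers = []
    for offset, alpha in zip(anchor.offsets, opacities):
        if alpha <= min_opacity:
            continue
        centers.append(tuple(c + o * s for c, o, s in zip(anchor.center, offset, anchor.scale)))
    return centers
```

Only the anchors and their small per-anchor parameters are stored; the Gaussians are regenerated on the fly, which is where the memory saving over per-frame Gaussian sets would come from.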

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes neural voxels plus deformation fields to avoid per-frame Gaussian copies in dynamic scenes, but the abstract gives no numbers to back the efficiency and quality claims.

The main point is that 4D Neural Voxel Splatting replaces per-timestamp duplicated Gaussians with a compact neural voxel grid and learned deformation fields, then adds a selective view refinement stage for difficult angles. If the results check out, this could trim memory and training time in dynamic novel-view synthesis without a full redesign of the pipeline. Using voxels for structure and deformations for time is a straightforward engineering step that builds on existing 3D Gaussian Splatting and dynamic-scene work. The paper correctly identifies memory overhead as the core pain point and addresses it with a single shared representation rather than replication. The refinement idea also shows attention to keeping global costs low while fixing local quality issues, and that practical focus earns some credit.

The soft spots lie in the missing evidence. The abstract asserts outperformance, lower memory, faster training, and preserved fidelity, yet supplies no PSNR numbers, no ablations on deformation parameters or voxel resolution, and no discussion of failure modes. The assumption that a fixed voxel grid with per-voxel deformations can capture complex non-rigid motion, large displacements, or topology changes without artifacts remains untested in the provided text. The concern that the model may underfit high-frequency dynamics looks reasonable given the lack of regularization details or example failure cases.

This work is for graphics researchers already using Gaussian splatting who want lighter dynamic extensions for real-time pipelines. A reader familiar with the prior literature could use the architectural choices as a reference even if the quantitative claims need checking. I would send it for peer review so that the experiments and the motion-handling analysis can be evaluated properly.

Referee Report

2 major / 2 minor

Summary. The paper proposes 4D Neural Voxel Splatting (4D-NVS) for dynamic scene rendering. It replaces per-timestamp replication of Gaussians with a compact neural voxel grid and learned deformation fields to model temporal dynamics, claiming substantial memory reduction and faster training while preserving image quality. A view refinement stage is added to selectively optimize challenging viewpoints. Experiments are stated to show outperformance over prior methods with reduced memory footprint and accelerated training, supporting real-time rendering.

Significance. If the central claims hold under rigorous validation, the work would meaningfully advance efficient 4D extensions of Gaussian Splatting by offering a compact alternative to per-frame replication. The neural voxel plus deformation-field design directly targets the memory bottleneck, and the view refinement mechanism provides a practical efficiency-quality trade-off. Credit is due for focusing on reproducible efficiency metrics and for attempting a parameter-light temporal model.

major comments (2)

[§3.2] Deformation Field: the formulation of the learned deformation field on a fixed neural voxel lattice lacks explicit analysis or regularization for large displacements, topology changes, or high-frequency non-rigid motion. Without failure-case experiments or resolution scaling studies, it remains unclear whether the single-grid representation can faithfully capture all dynamics without quality loss or temporal artifacts, which bears directly on the memory-reduction claim.
[Table 4] Ablation on deformation parameters: the reported PSNR gains are shown only for moderate-motion sequences; no quantitative breakdown is given for scenes with rapid non-rigid motion or topology variation, leaving the central assumption that the compact voxel grid suffices untested at the load-bearing boundary cases.
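The first comment asks for explicit regularization of the deformation field. Figure 2 lists a total variation loss among the training objectives; on HexPlane- or K-Planes-style feature planes such a term usually penalizes differences between neighboring cells, which discourages noisy, high-frequency deformations but does not by itself address large displacements or topology changes. A sketch, assuming a squared-difference form and a placeholder weight (the paper's exact formulation and weight are not given here):

```python
from typing import List

Plane = List[List[List[float]]]  # H x W x C nested lists


def plane_tv(plane: Plane) -> float:
    """Mean squared difference between horizontally and vertically adjacent cells."""
    h, w, c = len(plane), len(plane[0]), len(plane[0][0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for k in range(c):
                if x + 1 < w:
                    total += (plane[y][x + 1][k] - plane[y][x][k]) ** 2
                    count += 1
                if y + 1 < h:
                    total += (plane[y + 1][x][k] - plane[y][x][k]) ** 2
                    count += 1
    return total / count if count else 0.0


def tv_loss(planes: List[Plane], weight: float = 1e-4) -> float:
    """Weighted sum of per-plane TV terms; the default weight is a placeholder."""
    return weight * sum(plane_tv(p) for p in planes)
```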

minor comments (2)

[Abstract] The abstract asserts quantitative superiority and memory reduction but supplies no concrete numbers; adding key metrics (e.g., PSNR, memory in MB, training time) would strengthen the summary.
[§3] Notation for the neural voxel grid resolution and deformation MLP depth is introduced without a consolidated table; a single reference table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of the deformation field formulation and the scope of the ablation studies. We respond to each point below and indicate the revisions we will make to strengthen the manuscript.

Referee: [§3.2] Deformation Field: the formulation of the learned deformation field on a fixed neural voxel lattice lacks explicit analysis or regularization for large displacements, topology changes, or high-frequency non-rigid motion. Without failure-case experiments or resolution scaling studies, it remains unclear whether the single-grid representation can faithfully capture all dynamics without quality loss or temporal artifacts, which bears directly on the memory-reduction claim.

Authors: We appreciate the referee's point on the need for more explicit analysis. The fixed neural voxel lattice is chosen to enforce compactness while the deformation field is learned end-to-end; the voxel structure itself provides a form of spatial regularization. Our experiments on standard dynamic benchmarks show stable temporal coherence without prominent artifacts. In the revised manuscript we will expand §3.2 with a dedicated limitations paragraph discussing behavior under large displacements and topology changes, include qualitative failure-case examples drawn from the existing test set, and add a short resolution-scaling study that reports PSNR and training time across grid resolutions. These additions will better substantiate the memory-reduction claim. revision: yes
Referee: [Table 4] Ablation on deformation parameters: the reported PSNR gains are shown only for moderate-motion sequences; no quantitative breakdown is given for scenes with rapid non-rigid motion or topology variation, leaving the central assumption that the compact voxel grid suffices untested at the load-bearing boundary cases.

Authors: Table 4 reports results on the primary evaluation sequences, which already contain a range of motion speeds. We agree that an explicit stratification by motion type would increase transparency. In the revision we will augment the table (or add a supplementary breakdown) with PSNR values separated into moderate-motion and higher-motion subsets using the sequences already present in our evaluation. Because our current test sets do not contain extreme topology-changing examples, we will note this as a boundary condition rather than claiming universal coverage. revision: partial
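The stratified breakdown promised in this response is straightforward to compute once each evaluation sequence has a motion statistic. A sketch, assuming a single scalar motion score per sequence and a hand-picked threshold (both hypothetical; the rebuttal does not say how motion levels will be defined):

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def stratify_psnr(results: Iterable[Tuple[str, float, float]],
                  motion_threshold: float) -> Dict[str, Optional[float]]:
    """Average PSNR separately over moderate- and higher-motion sequences.

    Each result is (sequence_name, mean_motion, psnr). mean_motion is a
    hypothetical per-sequence statistic, e.g. average optical-flow magnitude in pixels.
    """
    subsets = {"moderate": [], "higher": []}
    for _name, motion, score in results:
        subsets["higher" if motion > motion_threshold else "moderate"].append(score)
    return {key: (mean(vals) if vals else None) for key, vals in subsets.items()}


# Toy numbers, not results from the paper.
toy = [("seq_a", 2.0, 30.0), ("seq_b", 8.0, 26.0), ("seq_c", 3.0, 32.0), ("seq_d", 10.0, 24.0)]
print(stratify_psnr(toy, motion_threshold=5.0))  # {'moderate': 31.0, 'higher': 25.0}
```

An empty subset returns None rather than a misleading average, which matters here because the authors note their test sets lack extreme topology-changing examples.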

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper introduces 4D-NVS as a new architectural combination of neural voxel grids and learned deformation fields to model dynamics without per-timestamp Gaussian replication. The abstract and method description present this as an independent design choice that reduces memory while preserving quality, with no equations, fitted parameters, or predictions shown to reduce by construction to inputs or prior self-citations. No load-bearing uniqueness theorems, ansatzes, or renamings from the authors' own prior work are invoked in the provided text. The central claim remains an empirical modeling assumption open to external validation rather than a definitional tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The approach rests on the unstated premise that deformation fields learned on a voxel grid can substitute for explicit per-frame Gaussians without introducing systematic artifacts on real-world motion.

free parameters (1)

deformation field parameters
Learned weights that control how voxels move over time; their count and initialization are not specified in the abstract.

axioms (1)

Domain assumption: voxel grid plus deformation fields suffice to represent arbitrary scene dynamics.
Invoked when the paper states that the compact neural voxel set models temporal dynamics without per-timestamp replication.

invented entities (1)

neural voxels (no independent evidence)
purpose: Compact shared representation that replaces duplicated Gaussians across time
New entity introduced to achieve memory reduction; no independent falsifiable prediction given in abstract.

pith-pipeline@v0.9.0 · 5684 in / 1244 out tokens · 25378 ms · 2026-05-18T01:47:20.038986+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "compact set of neural voxels with learned deformation fields to model temporal dynamics... O(f V+F) memory complexity"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "HexPlane decomposition... six 2D planes: three spatial... three space-time"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1] Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhöfer, Johannes Kopf, Matthew O'Toole, and Changil Kim. HyperReel: High-fidelity 6-DoF video with ray-conditioned sampling. arXiv preprint arXiv:2301.02238,
[2] Ang Cao and Justin Johnson. HexPlane: A fast representation for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 130–141, 2023.
[3] Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, and Qi Tian. Fast dynamic radiance fields with time-aware neural voxels. In SIGGRAPH Asia 2022 Conference Papers, 2022.
[4] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering. ACM Transactions on Graphics (ToG), 42(4):1–14, 2023.
[5] Junoh Lee, ChangYeon Won, Hyunjun Jung, Inhwan Bae, and Hae-Gon Jeon. Fully explicit dynamic Gaussian splatting. In Proceedings of the Neural Information Processing Systems, 2024.
[6] Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3D video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5521–5531,
[7] Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[8] Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime Gaussian feature splatting for real-time dynamic view synthesis. arXiv preprint arXiv:2312.16812, 2023.
[9] Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao, and Xiaowei Zhou. High-fidelity and real-time novel view synthesis for dynamic scenes. In SIGGRAPH Asia Conference Proceedings, 2023.
[10] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-GS: Structured 3D Gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20654–20664, 2024.
[11] Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Ming Yang, Xiao Tang, Feng Zhu, and Yuchao Dai. 3D geometry-aware deformable Gaussian splatting for dynamic view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
[12] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[13] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
[14] Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5865–5874, 2021.
[15] Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M. Seitz. HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph., 40(6), 2021.
[16] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
[17] Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-Planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023.
[18] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4104–4113, 2016.
[19] Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. NeRFPlayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.
[20] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. arXiv preprint arXiv:2306.01496, 2023.
[21] Cheng Sun, Jaesung Choe, Charles Loop, Wei-Chiu Ma, and Yu-Chiang Frank Wang. Sparse voxels rasterization: Real-time high-fidelity radiance field rendering. ArXiv, abs/2412.04459, 2024.
[22] Feng Wang, Zilong Chen, Guokang Wang, Yafei Song, and Huaping Liu. Masked space-time hash encoding for efficient dynamic scene reconstruction. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[23] Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, and Xiaowei Zhou. FreeTimeGS: Free Gaussian primitives at anytime anywhere for dynamic scene reconstruction. In CVPR, 2025.
[24] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4D Gaussian Splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, 2024.
[25] Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Representing long volumetric video with temporal Gaussian hierarchy. ACM Transactions on Graphics, 43(6), 2024.
[26] Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting. In International Conference on Learning Representations (ICLR), 2024.
[27] Supplementary Material, 7.1 Introduction. In the supplementary material, we provide additional details and videos on our hyperparameter settings in Section 8. More qualitative results are presented in Section 9, further ablation study results are discussed in Section 9.1, and additional discussions are included in Section 10.
[28] 8. Hyperparameters, 8.1 Gaussian Generation. The following learning rates are configured for the Gaussian generation process. Offset: the learning rate for the offset vector starts at 1×10^−2 and decays to 1×10^−5. Opacity: the learning rate for the opacity MLP starts at 2×10^−3 and decreases to 2×10^−6. Covariance: this includes rotation and scaling. The learning...
[29] Results. Since we are unable to render videos in the main paper, this section includes several videos comparing our method to 4D-GS as well as additional videos demonstrating our performance in photorealistic rendering. Appendix 1: comparison of our method with 4D-GS on the HyperNeRF-Interp dataset. Appendix 2: rendered scenes in HyperNeRF showcasing our ...
[30] Discussions, 10.1 Limitations of the Current Approach. Although our 4D Neural Voxel Splatting method achieves significant improvements in memory efficiency, training speed, and rendering quality, there are still limitations. For example, dynamic scenes with large motions or significant occlusions present challenges for Gaussian generation and deformatio...

[1] [1]

Hyperreel: High-fidelity 6-dof video with ray- conditioned sampling.arXiv preprint arXiv:2301.02238,

Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollh¨ofer, Johannes Kopf, Matthew O’Toole, and Changil Kim. Hyperreel: High-fidelity 6-dof video with ray- conditioned sampling.arXiv preprint arXiv:2301.02238,

work page arXiv

[2] [2]

Hexplane: A fast representa- tion for dynamic scenes

Ang Cao and Justin Johnson. Hexplane: A fast representa- tion for dynamic scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 130–141, 2023. 2, 6

work page 2023

[3] [3]

Fast dynamic radiance fields with time-aware neural voxels

Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xi- aopeng Zhang, Wenyu Liu, Matthias Nießner, and Qi Tian. Fast dynamic radiance fields with time-aware neural voxels. InSIGGRAPH Asia 2022 Conference Papers, 2022. 6

work page 2022

[4] [4]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (ToG), 42(4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics (ToG), 42(4):1–14, 2023. 1, 2, 3, 6

work page 2023

[5] [5]

Fully explicit dynamic guassian splat- ting

Junoh Lee, ChangYeon Won, Hyunjun Jung, Inhwan Bae, and Hae-Gon Jeon. Fully explicit dynamic guassian splat- ting. InProceedings of the Neural Information Processing Systems, 2024. 1, 2, 6

work page 2024

[6] [6]

Neural 3d video synthesis from multi-view video

Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5521–5531,

work page

[7] [7]

Neural scene flow fields for space-time view synthesis of dy- namic scenes

Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dy- namic scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 2

work page 2021

[8] [8]

Spacetime gaus- sian feature splatting for real-time dynamic view synthesis

Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaus- sian feature splatting for real-time dynamic view synthesis. arXiv preprint arXiv:2312.16812, 2023. 2

work page arXiv 2023

[9] [9]

High-fidelity and real-time novel view synthesis for dynamic scenes

Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hu- jun Bao, and Xiaowei Zhou. High-fidelity and real-time novel view synthesis for dynamic scenes. InSIGGRAPH Asia Conference Proceedings, 2023. 6

work page 2023

[10] [10]

Scaffold-gs: Structured 3d gaussians for view-adaptive rendering

Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20654–20664, 2024. 2, 3, 4, 6

work page 2024

[11] [11]

3d geometry-aware deformable gaussian splatting for dynamic view synthesis

Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Ming Yang, Xiao Tang, Feng Zhu, and Yuchao Dai. 3d geometry-aware deformable gaussian splatting for dynamic view synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 2, 6

work page 2024

[12] [12]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 1, 2

work page 2020

[13] [13]

Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans. Graph., 41(4):102:1– 102:15, 2022. 1, 2

work page 2022

[14] [14]

Barron, Sofien Bouaziz, Dan B

Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5865–5874, 2021. 2, 6

work page 2021

[15] [15]

Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin- Brualla, and Steven M

Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin- Brualla, and Steven M. Seitz. Hypernerf: A higher- dimensional representation for topologically varying neural radiance fields.ACM Trans. Graph., 40(6), 2021. 5, 6

[16] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.

[17] Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-Planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023.

[18] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4104–4113, 2016.

[19] Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. NeRFPlayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.

[20] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. arXiv preprint arXiv:2306.01496, 2023.

[21] Cheng Sun, Jaesung Choe, Charles Loop, Wei-Chiu Ma, and Yu-Chiang Frank Wang. Sparse voxels rasterization: Real-time high-fidelity radiance field rendering. arXiv preprint arXiv:2412.04459, 2024.

[22] Feng Wang, Zilong Chen, Guokang Wang, Yafei Song, and Huaping Liu. Masked space-time hash encoding for efficient dynamic scene reconstruction. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[23] Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, and Xiaowei Zhou. FreeTimeGS: Free Gaussian primitives at anytime anywhere for dynamic scene reconstruction. In CVPR, 2025.

[24] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4D Gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, 2024.

[25] Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, Hujun Bao, and Xiaowei Zhou. Representing long volumetric video with temporal Gaussian hierarchy. ACM Transactions on Graphics, 43(6), 2024.

[26] Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting. In International Conference on Learning Representations (ICLR), 2024.

4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Guassian Splatting

Supplementary Material

7.1. Introduction

This supplementary material provides additional details on our hyperparameter settings in Section 8. More qualitative results and videos appear in Section 9, further ablation study results are discussed in Section 9.1, and additional discussions are included in Section 10.

8. Hyperparameters

8.1. Gaussian Generation

The following learning rates are configured for the Gaussian generation process.

Offset. The learning rate for the offset vector starts at 1×10⁻² and decays to 1×10⁻⁵.

Opacity. The learning rate for the opacity MLP starts at 2×10⁻³ and decays to 2×10⁻⁶.

Covariance. This includes rotation and scaling. The learning...
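The excerpt gives only the start and end learning rates. It does not give the schedule between them. A common choice in 3D Gaussian Splatting codebases is to interpolate log-linearly, which is equivalent to exponential decay, and this sketch assumes that. The helper `log_linear_lr` and the iteration count `MAX_STEPS` are hypothetical placeholders. This is not the authors' code. It only shows how the quoted endpoints would map to per-step rates under that assumption.

```python
import math


def log_linear_lr(step: int, max_steps: int, lr_init: float, lr_final: float) -> float:
    """Return the learning rate at `step`, interpolated log-linearly
    (exponential decay) from lr_init at step 0 to lr_final at max_steps.

    Assumption: the decay is exponential; Section 8.1 only states endpoints.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Clamp progress to [0, 1] so steps past max_steps keep lr_final.
    t = min(max(step / max_steps, 0.0), 1.0)
    # Linear interpolation in log space is geometric interpolation of the rates.
    log_lr = (1.0 - t) * math.log(lr_init) + t * math.log(lr_final)
    return math.exp(log_lr)


# Endpoints quoted in Section 8.1.
SCHEDULES = {
    "offset": (1e-2, 1e-5),
    "opacity_mlp": (2e-3, 2e-6),
}
MAX_STEPS = 30_000  # hypothetical: the excerpt does not state the iteration count

if __name__ == "__main__":
    for name, (lr0, lr1) in SCHEDULES.items():
        samples = [log_linear_lr(s, MAX_STEPS, lr0, lr1)
                   for s in (0, MAX_STEPS // 2, MAX_STEPS)]
        print(name, ["%.2e" % lr for lr in samples])
```

The interpolation is linear in log space, so the midpoint rate is the geometric mean of the endpoints. For the offset schedule that is about 3.16×10⁻⁴. A linear schedule would instead sit near 5×10⁻³ at the midpoint, so the assumed schedule shape matters when reproducing these settings.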

9. Results

Videos cannot be embedded in the main paper. This section therefore includes several videos comparing our method with 4D-GS, as well as additional videos demonstrating our photorealistic rendering performance.

Appendix 1: Comparison of our method with 4D-GS on the HyperNeRF-Interp dataset.

Appendix 2: Rendered scenes in HyperNeRF showcasing our ...

10. Discussions

10.1. Limitations of the Current Approach

Although our 4D Neural Voxel Splatting method achieves significant improvements in memory efficiency, training speed, and rendering quality, it still has limitations. For example, dynamic scenes with large motions or significant occlusions present challenges for Gaussian generation and deformatio...
