pith. sign in

arxiv: 2606.02096 · v1 · pith:Q23OD2SKnew · submitted 2026-06-01 · 💻 cs.CV

WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos

Pith reviewed 2026-06-28 15:20 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian splattingdynamic scene reconstructionmonocular videocubic Hermite splinestructural proxy graphreal-time renderingtemporal rigidity
0
0 comments X

The pith

WebSpline models dynamic Gaussian trajectories with learnable cubic Hermite splines organized by a Structural Proxy Graph to enable high-fidelity monocular video reconstruction and fast rendering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to reconstruct dynamic 3D scenes from monocular videos by maintaining global structural coherence while preserving local details despite limited viewpoints. It does this through a Structure-Informed Spline representation in which each Gaussian's path follows a cubic Hermite spline whose motion is constrained by an auxiliary Structural Proxy Graph. The framework runs in two stages: the graph is first built from 2D point tracks and refined with temporal rigidity regularization, after which the splines are initialized from the graph and optimized under spatial and structural constraints. Because motion at inference requires only spline evaluation rather than full optimization, rendering becomes substantially faster while quality remains competitive. A sympathetic reader would care because this parametric approach could make real-time dynamic reconstruction feasible from ordinary single-camera footage.

Core claim

The central claim is that representing each dynamic Gaussian trajectory as a learnable cubic Hermite spline whose motion parameters are structurally organized by an auxiliary Structural Proxy Graph allows the entire system to be optimized in two stages from monocular input: the graph is initialized from 2D tracks and refined via temporal rigidity regularization to enforce coherence across the sequence, the splines are then initialized from the refined graph and further optimized under spatial and structural neighborhood constraints, and at inference Gaussian motion is obtained solely by evaluating the learned splines, producing both high rendering quality and speeds over ten times faster tha

What carries the argument

The Structure-Informed Spline (SIS) representation: a learnable cubic Hermite spline for each Gaussian trajectory whose motion is organized by an auxiliary Structural Proxy Graph (SPG).

If this is right

  • The SPG initialization and rigidity regularization step produces structural coherence for moving objects throughout the monocular sequence.
  • Subsequent optimization of the SIS under spatial and structural constraints yields high-fidelity Gaussian reconstructions.
  • Direct evaluation of the learned SIS at inference time produces rendering speeds more than ten times higher than WorldTree while matching or exceeding its quality on the iPhone and NVIDIA datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the SPG reliably encodes rigidity, the same two-stage pipeline could be applied to multi-object scenes without requiring explicit object segmentation.
  • Replacing per-frame optimization with spline evaluation might reduce compute in other dynamic Gaussian methods that currently rely on dense temporal supervision.
  • Extending the temporal rigidity term to handle longer sequences would test whether the current regularization remains stable when drift accumulates.

Load-bearing premise

That initializing the Structural Proxy Graph from 2D point tracks and refining it with temporal rigidity regularization is sufficient to establish structural coherence for moving objects across the entire monocular sequence.

What would settle it

A monocular video of a non-rigidly deforming object where the reconstructed trajectories produce visibly inconsistent object shapes or broken structural connections after the two-stage optimization.

Figures

Figures reproduced from arXiv: 2606.02096 by Jeonghwan Yun, Jongmin Park, Minh-Quan Viet Bui, Munchurl Kim.

Figure 1
Figure 1. Figure 1: WebSpline achieves high-quality dynamic scene reconstruction with fast rendering from monocular videos. (a) Qualitative comparison with state-of-the-art methods on novel view synthesis (top) and visualized Gaussian trajectories (bottom). (b) WebSpline achieves the best rendering quality while rendering over 10× faster than the second-best method. Abstract Dynamic scene reconstruction from monocular videos … view at source ↗
Figure 2
Figure 2. Figure 2: Overview of WebSpline. WebSpline models each dynamic Gaussian trajectory using the Structure-Informed Spline (SIS) representation, initialized from the Structural Proxy Graph (SPG). For SIS optimization, we define two types of neighborhoods for each dynamic Gaussian, spatial and structural, to enforce coherent spline motion while capturing fine-grained dynamics. 3.2 Structure-Informed Splines (SIS) for Dyn… view at source ↗
Figure 3
Figure 3. Figure 3: Visual comparisons for novel view synthesis on the iPhone dataset [9]. complex camera and object motions. The NVIDIA dataset includes 7 scenes with diverse motion patterns, originally recorded with a 12-camera rig. Following the protocol of RoDynRF [29], we simulate a monocular video sequence by selecting one camera view per time step. Metrics. For the iPhone dataset [9], we follow DyCheck [9] and report c… view at source ↗
Figure 4
Figure 4. Figure 4: Visual comparisons for novel view synthesis on the NVIDIA dataset [49] [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visual comparisons of dynamic Gaussian trajectories on the iPhone dataset [9]. motion representation. Notably, WebSpline achieves 278 FPS on the iPhone dataset, rendering more than 10× faster than WorldTree [43] that is the second-best method in rendering quality but achieves only 27 FPS in rendering speed. While WorldTree [43] computes each Gaussian motion by blending neighboring scaffold-node motions usi… view at source ↗
Figure 6
Figure 6. Figure 6: Visual results of ablation study on the iPhone dataset [9]. Loss Functions [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Distributions of LSD Values. We use the paper-windmill scene of the iPhone dataset [9]. ‘20+’ indicates all LSD values above 20 are accumulated at the LSD value of 20. Analysis of Local Structural Distortion (LSD). To further analyze the structural consistency of WebSpline, [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visual comparisons for novel view synthesis on the apple scene from the iPhone dataset [9] [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visual comparisons for novel view synthesis on the block scene from the iPhone dataset [9]. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Visual comparisons for novel view synthesis on the paper-windmill scene from the iPhone dataset [9] [PITH_FULL_IMAGE:figures/full_fig_p017_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Visual comparisons for novel view synthesis on the spin scene from the iPhone dataset [9]. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Visual comparisons for novel view synthesis on the wheel scene from the iPhone dataset [9]. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗
read the original abstract

Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence and local fine-grained details under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent and high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the Structure-Informed Spline (SIS) representation, which models each dynamic Gaussian trajectory using a learnable cubic Hermite spline whose motion is structurally organized with an auxiliary Structural Proxy Graph (SPG). The proposed framework is optimized in two stages: (i) in the first stage, the SPG is initialized from 2D point tracks and refined with temporal rigidity regularization to establish structural coherence for moving objects across the sequence; and (ii) in the second stage, the SIS representation is initialized from the refined SPG and optimized under both spatial and structural neighborhood constraints. At inference, Gaussian motion is obtained solely by evaluating the learned SIS, enabling fast rendering. Extensive experiments on the challenging monocular dynamic scene benchmarks, iPhone and NVIDIA, demonstrate that our WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce WebSpline, a novel dynamic 3D Gaussian framework for structurally coherent and high-fidelity reconstruction from monocular videos. It uses a Structure-Informed Spline (SIS) representation based on learnable cubic Hermite splines organized by a Structural Proxy Graph (SPG). The framework is optimized in two stages: SPG initialization from 2D point tracks with temporal rigidity regularization, followed by SIS optimization under spatial and structural constraints. Inference uses only the SIS for fast rendering. Experiments on iPhone and NVIDIA datasets show SOTA rendering quality and over 10 times faster rendering than WorldTree on the iPhone dataset.

Significance. If the results hold, this work would be significant as it addresses a key challenge in dynamic scene reconstruction by balancing global structural coherence and local details in monocular settings while enabling real-time rendering. The integration of spline-based trajectories with a structural graph proxy is a creative approach that could influence future methods in 3D Gaussian splatting for dynamic scenes. The reported speedup is particularly notable for practical applications.

major comments (2)
  1. [Method section] Method, stage (i): The assumption that SPG initialization from 2D point tracks plus temporal rigidity regularization establishes structural coherence for moving objects is load-bearing for the central claim, yet the manuscript provides no 3D track accuracy metrics, ablation on the regularization, or failure-case analysis for depth ambiguity, drift, or occlusions. Any weakness here directly affects the neighborhood constraints used in stage (ii) SIS optimization.
  2. [Experiments section] Experiments: The SOTA rendering quality and >10x speedup claims versus WorldTree on the iPhone dataset are presented without detailed quantitative tables, error bars, or ablation isolating the contribution of the SPG-derived constraints, making it impossible to verify whether the structural coherence assumption holds on the reported benchmarks.
minor comments (2)
  1. [Abstract] Abstract: The description of the two-stage optimization would be clearer if it referenced the specific equations or pseudocode for the cubic Hermite spline and the SPG construction.
  2. Notation: Ensure consistent definition of acronyms (SIS, SPG) and variables at first use throughout the main text and figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify areas where additional validation would strengthen the central claims regarding structural coherence. We address each point below and will incorporate revisions to provide the requested evidence.

read point-by-point responses
  1. Referee: [Method section] Method, stage (i): The assumption that SPG initialization from 2D point tracks plus temporal rigidity regularization establishes structural coherence for moving objects is load-bearing for the central claim, yet the manuscript provides no 3D track accuracy metrics, ablation on the regularization, or failure-case analysis for depth ambiguity, drift, or occlusions. Any weakness here directly affects the neighborhood constraints used in stage (ii) SIS optimization.

    Authors: We agree that the SPG stage is foundational. The current manuscript emphasizes end-to-end results, but we will revise to add: quantitative 3D track accuracy metrics (using proxy evaluations or synthetic subsets where feasible), an ablation isolating the temporal rigidity term, and a dedicated paragraph with qualitative failure-case examples for depth ambiguity, drift, and occlusions. These additions will directly support the neighborhood constraints in stage (ii). revision: yes

  2. Referee: [Experiments section] Experiments: The SOTA rendering quality and >10x speedup claims versus WorldTree on the iPhone dataset are presented without detailed quantitative tables, error bars, or ablation isolating the contribution of the SPG-derived constraints, making it impossible to verify whether the structural coherence assumption holds on the reported benchmarks.

    Authors: We acknowledge that the experimental section would benefit from greater detail. In revision we will expand the tables to include per-scene metrics with error bars (from repeated runs where variance is measurable), and add an ablation that removes or varies the SPG-derived constraints while reporting both quality and runtime. This will isolate their contribution and allow direct verification of the structural coherence assumption on the iPhone and NVIDIA benchmarks. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation proceeds in two explicit stages: SPG initialization from independent 2D point tracks plus temporal rigidity regularization, followed by SIS initialization and optimization under neighborhood constraints derived from that SPG. Neither stage reduces the final representation or performance claims back to a quantity defined by the same representation; the inputs (2D tracks) and regularization are external to the learned spline parameters. No self-citations, ansatzes, or fitted-input-as-prediction patterns appear in the described chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be extracted or audited from the full manuscript.

pith-pipeline@v0.9.1-grok · 5773 in / 1133 out tokens · 25080 ms · 2026-06-28T15:20:32.841826+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 15 canonical work pages · 1 internal anchor

  1. [1]

    38, volume 38

    J Harold Ahlberg, Edwin Norman Nilson, and Joseph Leonard Walsh.The Theory of Splines and Their Applications: Mathematics in Science and Engineering: A Series of Monographs and Textbooks, Vol. 38, volume 38. Elsevier, 2016

  2. [2]

    Per-gaussian embedding-based deformation for deformable 3d gaussian splatting

    Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, and Youngjung Uh. Per-gaussian embedding-based deformation for deformable 3d gaussian splatting. InEuropean Conference on Computer Vision, pages 321–335. Springer, 2024

  3. [3]

    Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P

    Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields, 2021. URL https://arxiv.org/abs/2103.13415

  4. [4]

    Barron, Ben Mildenhall, Dor Verbin, Pratul P

    Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields, 2022. URLhttps://arxiv.org/abs/2111.12077

  5. [5]

    Mobgs: Motion deblurring dynamic 3d gaussian splatting for blurry monocular video

    Minh-Quan Viet Bui, Jongmin Park, Juan Luis Gonzalez, Jaeho Moon, Jihyong Oh, and Munchurl Kim. Mobgs: Motion deblurring dynamic 3d gaussian splatting for blurry monocular video. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 2480–2489, 2026

  6. [6]

    Tensorf: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. In European conference on computer vision, pages 333–350. Springer, 2022

  7. [7]

    A practical guide to splines.Springer-Verlag google schola, 1978

    C De Boor. A practical guide to splines.Springer-Verlag google schola, 1978

  8. [8]

    Plenoxels: Radiance fields without neural networks

    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022

  9. [9]

    Monocular dynamic view synthesis: A reality check.Advances in Neural Information Processing Systems, 35:33768–33780, 2022

    Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, and Angjoo Kanazawa. Monocular dynamic view synthesis: A reality check.Advances in Neural Information Processing Systems, 35:33768–33780, 2022

  10. [10]

    Sc-gs: Sparse- controlled gaussian splatting for editable dynamic scenes.CVPR, 2024

    Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. Sc-gs: Sparse- controlled gaussian splatting for editable dynamic scenes.CVPR, 2024

  11. [11]

    D-npc: Dynamic neural point clouds for non-rigid view synthesis from monocular video.arXiv preprint arXiv:2406.10078, 2024

    Moritz Kappel, Florian Hahlbohm, Timon Scholz, Susana Castillo, Christian Theobalt, Martin Eisemann, Vladislav Golyanik, and Marcus Magnor. D-npc: Dynamic neural point clouds for non-rigid view synthesis from monocular video.arXiv preprint arXiv:2406.10078, 2024

  12. [12]

    Cotracker: It is better to track together.arXiv, 2023

    Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Cotracker: It is better to track together.arXiv, 2023

  13. [13]

    Cotracker3: Simpler and better point tracking by pseudo-labelling real videos

    Nikita Karaev, Yuri Makarov, Jianyuan Wang, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Cotracker3: Simpler and better point tracking by pseudo-labelling real videos. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6013–6022, 2025

  14. [14]

    Skinning with dual quaternions

    Ladislav Kavan, Steven Collins, Ji ˇrí Žára, and Carol O’Sullivan. Skinning with dual quaternions. In Proceedings of the 2007 symposium on Interactive 3D graphics and games, pages 39–46, 2007

  15. [15]

    Kerbl, G

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering, 2023. URLhttps://arxiv.org/abs/2308.04079

  16. [16]

    Approximate differentiable rendering with algebraic surfaces

    Leonid Keselman and Martial Hebert. Approximate differentiable rendering with algebraic surfaces. In European Conference on Computer Vision, pages 596–614. Springer, 2022

  17. [17]

    Flexible techniques for differentiable rendering with 3d gaussians

    Leonid Keselman and Martial Hebert. Flexible techniques for differentiable rendering with 3d gaussians. arXiv preprint arXiv:2308.14737, 2023

  18. [18]

    Peebles and S

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3992–4003, 2023. doi: 10.1109/ICCV51070.2023.00371

  19. [19]

    Modec-gs: Global-to-local motion decomposition and temporal interval adjustment for compact dynamic 3d gaussian splatting

    Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, and Munchurl Kim. Modec-gs: Global-to-local motion decomposition and temporal interval adjustment for compact dynamic 3d gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 10

  20. [20]

    Fully explicit dynamic gaussian splatting

    Junoh Lee, Chang-Yeon Won, Hyunjun Jung, Inhwan Bae, and Hae-Gon Jeon. Fully explicit dynamic gaussian splatting. InNeurIPS, 2024

  21. [21]

    Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds.arXiv preprint arXiv:2405.17421, 2024

    Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, and Kostas Daniilidis. Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds.arXiv preprint arXiv:2405.17421, 2024

  22. [22]

    St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering

    Deqi Li, Shi-Sheng Huang, Zhiyuan Lu, Xinran Duan, and Hua Huang. St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering. InACM SIGGRAPH 2024 Con- ference Papers, SIGGRAPH ’24, New York, NY , USA, 2024. Association for Computing Machinery. ISBN 9798400705250. doi: 10.1145/3641519.3657520. URL https://doi.org/10....

  23. [23]

    Spacetime gaussian feature splatting for real-time dynamic view synthesis

    Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaussian feature splatting for real-time dynamic view synthesis. InCVPR, 2024

  24. [24]

    Neural scene flow fields for space-time view synthesis of dynamic scenes

    Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dynamic scenes. InCVPR, 2021

  25. [25]

    Himor: Monocular deformable gaussian reconstruction with hierarchical motion representation

    Yiming Liang, Tianhan Xu, and Yuta Kikuchi. Himor: Monocular deformable gaussian reconstruction with hierarchical motion representation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 886–895, 2025

  26. [26]

    Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis.arXiv, 2023

    Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, and Lei Xiao. Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis.arXiv, 2023

  27. [27]

    Depth Anything 3: Recovering the Visual Space from Any Views

    Haotong Lin, Sili Chen, Junhao Liew, Donny Y Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views.arXiv preprint arXiv:2511.10647, 2025

  28. [28]

    Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle

    Youtian Lin, Zuozhuo Dai, Siyu Zhu, and Yao Yao. Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21136–21145, 2024

  29. [29]

    Robust dynamic radiance fields

    Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, and Jia-Bin Huang. Robust dynamic radiance fields. InCVPR, 2023

  30. [30]

    Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis, 2023

    Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis, 2023. URLhttps://arxiv.org/abs/2308.09713

  31. [31]

    Mildenhall, P

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020. URL https: //arxiv.org/abs/2003.08934

  32. [32]

    Instant neural graphics primitives with a multiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

  33. [33]

    Splinegs: Robust motion-adaptive spline for real-time dynamic 3d gaussians from monocular video

    Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, and Munchurl Kim. Splinegs: Robust motion-adaptive spline for real-time dynamic 3d gaussians from monocular video. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 26866–26875, June 2025

  34. [34]

    Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M

    Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M. Seitz. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields.ACM Trans. Graph., 2021

  35. [35]

    Unidepth: Universal monocular metric depth estimation

    Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. Unidepth: Universal monocular metric depth estimation. InCVPR, 2024

  36. [36]

    Modgs: Dynamic gaussian splatting from casually-captured monocular videos with depth priors

    LIU Qingming, Yuan Liu, Jiepeng Wang, Xianqiang Lyu, Peng Wang, Wenping Wang, and Junhui Hou. Modgs: Dynamic gaussian splatting from casually-captured monocular videos with depth priors. InThe Thirteenth International Conference on Learning Representations, 2025

  37. [37]

    Dynamic gaussian marbles for novel view synthesis of casual monocular videos

    Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, and Leonidas Guibas. Dynamic gaussian marbles for novel view synthesis of casual monocular videos. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

  38. [38]

    Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction

    Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5459–5469, 2022. 11

  39. [39]

    Raft: Recurrent all-pairs field transforms for optical flow

    Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. InEuropean conference on computer vision, pages 402–419. Springer, 2020

  40. [40]

    Superpoint gaussian splatting for real-time high-fidelity dynamic scene reconstruction.arXiv preprint arXiv:2406.03697, 2024

    Diwen Wan, Ruijie Lu, and Gang Zeng. Superpoint gaussian splatting for real-time high-fidelity dynamic scene reconstruction.arXiv preprint arXiv:2406.03697, 2024

  41. [41]

    Fourier plenoctrees for dynamic radiance field rendering in real-time

    Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Jingyi Yu, and Lan Xu. Fourier plenoctrees for dynamic radiance field rendering in real-time. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13524–13534, 2022

  42. [42]

    Shape of motion: 4d reconstruction from a single video

    Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruction from a single video. InInternational Conference on Computer Vision (ICCV), 2025

  43. [43]

    Worldtree: Towards 4d dynamic worlds from monocular video using tree-chains

    Qisen Wang, Yifan Zhao, and Jia Li. Worldtree: Towards 4d dynamic worlds from monocular video using tree-chains. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=mVo6cyFR6C

  44. [44]

    arXiv preprint arXiv:2310.08528 (2023)

    Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering, 2024. URL https: //arxiv.org/abs/2310.08528

  45. [45]

    Gmflow: Learning optical flow via global matching

    Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, and Dacheng Tao. Gmflow: Learning optical flow via global matching. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8121–8130, 2022

  46. [46]

    Depth anything: Unleashing the power of large-scale unlabeled data

    Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. InCVPR, 2024

  47. [47]

    Depth anything v2.Advances in Neural Information Processing Systems, 37:21875–21911, 2024

    Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2.Advances in Neural Information Processing Systems, 37:21875–21911, 2024

  48. [48]

    arXiv preprint arXiv:2309.13101 (2023)

    Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction.arXiv preprint arXiv:2309.13101, 2023

  49. [49]

    Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera

    Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, and Jan Kautz. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. InCVPR, 2020

  50. [50]

    Plenoctrees for real-time rendering of neural radiance fields

    Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. InProceedings of the IEEE/CVF international conference on computer vision, pages 5752–5761, 2021

  51. [51]

    Mip-splatting: Alias-free 3d gaussian splatting

    Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19447–19456, 2024

  52. [52]

    Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes.ACM Transactions on Graphics (ToG), 43(6):1–13, 2024

    Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes.ACM Transactions on Graphics (ToG), 43(6):1–13, 2024

  53. [53]

    Learning explicit continuous motion representation for dynamic gaussian splatting from monocular videos.arXiv preprint arXiv:2603.25058, 2026

    Xuankai Zhang, Junjin Xiao, Shangwei Huang, Wei-shi Zheng, and Qing Zhang. Learning explicit continuous motion representation for dynamic gaussian splatting from monocular videos.arXiv preprint arXiv:2603.25058, 2026. 12 Appendix A Demo Videos We provide a demo video, WebSpline_demo.mp4, with extensive qualitative comparisons between WebSpline and state-o...