RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos

Hoseung Choi; Junmyeong Lee; Minsu Cho; Yoonwoo Jeong

arxiv: 2412.03077 · v2 · submitted 2024-12-04 · 💻 cs.CV


Junmyeong Lee, Hoseung Choi, Yoonwoo Jeong, Minsu Cho

Pith reviewed 2026-05-23 07:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords dynamic gaussian splatting, 4D reconstruction, monocular video, novel view synthesis, spatiotemporal regularization, pose-free reconstruction, static-dynamic separation


The pith

RoDyGS reconstructs dynamic 3D scenes from casual monocular videos by separating static and dynamic elements with spatiotemporal regularization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to reconstruct dynamic scenes from casually captured monocular videos, where ambiguity in 3D geometry makes the task difficult. RoDyGS explicitly separates static and dynamic scene elements and applies spatiotemporal regularization to enforce physically plausible geometry and temporally consistent motion. This setup supports dynamic novel view synthesis without camera poses or multi-view data. Experiments show that it outperforms earlier pose-free dynamic approaches and achieves rendering quality competitive with pose-free static methods.

Core claim

RoDyGS explicitly separates static and dynamic scene elements, and applies spatiotemporal regularization to enforce physically plausible geometry and temporally consistent motion, significantly outperforming previous pose-free dynamic novel view synthesis approaches.

What carries the argument

Explicit separation of static and dynamic scene elements combined with spatiotemporal regularization applied to a Gaussian splatting representation.
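Figure 2's caption outlines how this separation is set up: MASt3R supplies camera poses and depths, TAM supplies motion masks, and the scene is then split into static and dynamic Gaussians. The snippet below is a minimal sketch of one plausible way to seed those two sets, assuming pinhole intrinsics `K`, z-depth maps, and a camera-to-world pose convention. The function name `unproject_and_split` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def unproject_and_split(depth, K, cam_to_world, motion_mask):
    """Back-project a depth map to world points and split them by a motion mask.

    depth:        (H, W) depth along the camera z-axis (assumed convention)
    K:            (3, 3) pinhole intrinsics
    cam_to_world: (4, 4) camera-to-world rigid transform
    motion_mask:  (H, W) bool, True where a pixel belongs to a moving object
    Returns (static_points, dynamic_points) with shapes (N_s, 3) and (N_d, 3).
    """
    depth = np.asarray(depth, dtype=np.float64)
    H, W = depth.shape
    # v indexes rows and u indexes columns of the image grid.
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # K^-1 maps homogeneous pixels to rays with z = 1; scaling by depth gives camera points.
    cam_points = (pixels @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Lift to homogeneous coordinates and move into the world frame.
    cam_h = np.concatenate([cam_points, np.ones((cam_points.shape[0], 1))], axis=1)
    world = (cam_h @ np.asarray(cam_to_world, dtype=np.float64).T)[:, :3]
    is_dynamic = np.asarray(motion_mask, dtype=bool).reshape(-1)
    # Unmasked pixels seed background Gaussians; masked pixels seed dynamic ones.
    return world[~is_dynamic], world[is_dynamic]
```

In a full pipeline these points would typically initialize Gaussian means, with the dynamic set additionally given a time-dependent motion model; that step is omitted here.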

If this is right

Dynamic novel view synthesis becomes feasible from single casual videos without known camera poses.
Rendered outputs maintain temporally consistent motion for moving scene elements.
Geometry in dynamic regions satisfies physical plausibility constraints enforced by regularization.
The method competes in quality with pose-free static reconstruction techniques while also handling motion.
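The paper's exact spatiotemporal regularizers are not spelled out in this summary. As a hedged illustration of what "temporally consistent motion" can mean in practice, the sketch below penalizes the discrete acceleration of dynamic Gaussian centers sampled at consecutive timesteps. Both the loss form and the name `temporal_smoothness_loss` are assumptions, not RoDyGS's definition.

```python
import numpy as np

def temporal_smoothness_loss(trajectories):
    """Mean squared discrete acceleration of Gaussian centers.

    trajectories: (T, N, 3) positions of N dynamic Gaussian centers at T
    consecutive timesteps (hypothetical layout).
    """
    traj = np.asarray(trajectories, dtype=np.float64)
    if traj.shape[0] < 3:
        return 0.0  # acceleration needs at least three timesteps
    # Second-order finite difference: x[t+1] - 2 x[t] + x[t-1].
    accel = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))
```

A constant-velocity trajectory scores zero, so a term like this discourages frame-to-frame jitter without freezing legitimate motion.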

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation step could simplify downstream tasks such as object tracking or background removal in video processing pipelines.
Regularization patterns developed here might transfer to other monocular reconstruction settings that face similar static-dynamic ambiguities.
Testing on videos with rapid camera motion or long durations would reveal whether the regularization remains stable beyond the reported cases.

Load-bearing premise

That explicit separation of static and dynamic elements combined with spatiotemporal regularization will reliably resolve the inherent ambiguity in monocular dynamic reconstruction without additional constraints or multi-view data.

What would settle it

A monocular video sequence with complex object interactions or partial occlusions where the separation produces inaccurate 3D geometry or temporally inconsistent motion across frames.

Figures

Figures reproduced from arXiv: 2412.03077 by Hoseung Choi, Junmyeong Lee, Minsu Cho, Yoonwoo Jeong.

**Figure 1.** Robust Dynamic Gaussian Splatting (RoDyGS). RoDyGS achieves high-fidelity rendering of novel viewpoints from casual videos, significantly outperforming RoDynRF, which struggles with blurriness during substantial camera and object movement.

**Figure 2.** RoDyGS Pipeline Overview. Starting with a casually captured video input, RoDyGS extracts camera poses and depths using MASt3R [32], while motion masks are derived from TAM [60]. It then separates static and dynamic Gaussians, enabling each to be learned independently for the stationary background and the moving objects. The primary optimization objective, L_gs, includes a photometric loss and a Pearson depth loss, with…

**Figure 3.** Qualitative results on Kubric-MRig and iPhone. Our pipeline accurately reconstructs scene geometry, produces sharp renderings, and aligns object positions well. Without GT camera poses, RoDynRF struggles to learn the scene geometry, resulting in object positions that differ from the GT. Even with GT camera poses, RoDynRF produces blurry results.

PSNR (↑) / SSIM (↑) / LPIPS (↓): DynMF [29] 18.92 / 0.7058 / 0.3513; no r…

**Figure 4.** Impact of regularization terms. Our regularization effectively enhances the perceptual quality of the rendering results, leading to sharper and more realistic renderings.

**Figure 5.** Comparison of motion masks between TAM [60]…

**Figure 6.** Samples from Kubric-MRig. Kubric-MRig is a dataset generated using Blender that contains 8 scenes. Each scene features multiple objects, some static and some in motion.

**Figure 7.** Novel View Synthesis on Kubric-MRig. Comparison of rendering results between RoDyGS and other dynamic neural field methods, both pose-aware [35, 43, 57, 62, 63] and pose-free [35]. In the pose-free setup, RoDyGS produces clearer rendering results than RoDynRF.

**Figure 8.** Novel View Synthesis on Tanks and Temples. Comparison on the Tanks and Temples dataset between RoDyGS and previous pose-free neural field methods [2, 13, 34]. RoDyGS demonstrates rendering quality competitive with CF-3DGS [13], the previous state-of-the-art pose-free neural field for static scenes.

**Figure 9.** Novel View Synthesis on NVIDIA Dynamic. We compare RoDyGS with RoDynRF on NVIDIA Dynamic with the pose-free setup. RoDyGS synthesizes realistic images similar to those of RoDynRF.

**Figure 10.** Novel View Synthesis on iPhone. We compare RoDyGS against both pose-aware [35, 43, 57, 62, 63] and pose-free [35] dynamic neural fields. RoDyGS achieves better visual clarity than RoDynRF under the pose-free setup.

**Figure 11.** Failure cases of RoDyGS. RoDyGS and other baselines struggle with large motions and occlusions in scenes.

**Figure 12.** Comparison of RoDyGS when leveraging SAM [45] and TAM [60] on Kubric-MRig. RoDyGS with motion masks obtained by SAM achieves visual quality competitive with RoDyGS using motion masks obtained by TAM.
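Figure 2's caption lists a Pearson depth loss in L_gs alongside the photometric loss. A common formulation is one minus the Pearson correlation between rendered depth and a depth prior, which ignores any positive scale and shift between the two maps; that matters when the prior is only reliable up to an affine transform. The sketch below assumes that formulation. `pearson_depth_loss` and its optional `mask` argument are hypothetical, and the paper may apply the loss per patch or with different weighting.

```python
import numpy as np

def pearson_depth_loss(rendered_depth, prior_depth, mask=None, eps=1e-8):
    """Return 1 - Pearson correlation between rendered and prior depth.

    The value is close to 0 when the maps agree up to a positive scale and
    shift, and approaches 2 when they are perfectly anti-correlated.
    """
    r = np.asarray(rendered_depth, dtype=np.float64).ravel()
    p = np.asarray(prior_depth, dtype=np.float64).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        r, p = r[keep], p[keep]
    # Center both signals; the correlation is the cosine of the centered vectors.
    r = r - r.mean()
    p = p - p.mean()
    corr = np.sum(r * p) / (np.sqrt(np.sum(r ** 2) * np.sum(p ** 2)) + eps)
    return float(1.0 - corr)
```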

Original abstract

4D reconstruction from casually captured monocular videos is challenging due to inherent ambiguity in reconstructing dynamic 3D geometry. To address this challenge, we introduce Robust Dynamic Gaussian Splatting (RoDyGS), a method that reconstructs dynamic scene representation from casual monocular videos. RoDyGS explicitly separates static and dynamic scene elements, and applies spatiotemporal regularization to enforce physically plausible geometry and temporally consistent motion. Furthermore, we propose a comprehensive benchmark, Kubric-MRig, which provides extensive camera and object motion along with simultaneous multi-view capture, features that are absent in previous benchmarks. Experiments demonstrate that RoDyGS significantly outperforms previous pose-free dynamic novel view synthesis approaches and achieves competitive rendering quality compared to existing pose-free static novel view synthesis approaches. Our proejct page is available at https://rodygs.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

RoDyGS adds explicit static/dynamic separation plus spatiotemporal regularization to pose-free dynamic Gaussian splatting and ships a new multi-view benchmark, but the abstract supplies no numbers to judge whether the gains are real.


The main takeaway is that this work targets monocular dynamic reconstruction by splitting the scene into static and dynamic components and layering on regularization to enforce plausible geometry and consistent motion over time. They also release Kubric-MRig, a benchmark that includes both heavy camera motion and object motion along with simultaneous multi-view ground truth, which earlier datasets lack.
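To make the value of simultaneous multi-view capture concrete: when every timestep is recorded by several cameras at once, a method can be trained on a single camera's sequence and scored on the other cameras at the same instants, so evaluation views show the same scene state from new viewpoints. The sketch below enumerates such a split. The camera count, the choice of training camera, and the `(camera, timestep)` indexing are hypothetical and need not match Kubric-MRig's actual protocol.

```python
from typing import List, Tuple

View = Tuple[int, int]  # (camera_index, timestep); hypothetical indexing

def monocular_rig_split(num_cameras: int, num_timesteps: int,
                        train_camera: int = 0) -> Tuple[List[View], List[View]]:
    """Train on one camera's frames; evaluate on all other cameras at the same timesteps."""
    if not 0 <= train_camera < num_cameras:
        raise ValueError("train_camera must index an existing camera")
    train = [(train_camera, t) for t in range(num_timesteps)]
    # Every held-out view shares a timestep with a training frame.
    test = [(c, t) for t in range(num_timesteps)
            for c in range(num_cameras) if c != train_camera]
    return train, test
```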

Referee Report

1 major / 1 minor

Summary. The paper introduces RoDyGS, a method for 4D reconstruction of dynamic scenes from casually captured monocular videos. It explicitly separates static and dynamic scene elements and applies spatiotemporal regularization to enforce physically plausible geometry and temporally consistent motion. The work also proposes the Kubric-MRig benchmark, which features extensive camera and object motion with simultaneous multi-view capture. Experiments are claimed to show that RoDyGS significantly outperforms prior pose-free dynamic novel view synthesis methods while achieving competitive rendering quality with pose-free static approaches.

Significance. If the central claims hold with supporting quantitative evidence, the approach would offer a practical advance in monocular dynamic reconstruction by addressing inherent ambiguities through explicit decomposition and regularization. The introduction of Kubric-MRig as a benchmark with multi-view ground truth addresses a noted gap in prior datasets and could facilitate more rigorous evaluation of pose-free dynamic methods.

major comments (1)

[Abstract] Abstract: The abstract asserts that RoDyGS 'significantly outperforms previous pose-free dynamic novel view synthesis approaches' and achieves 'competitive rendering quality,' yet provides no quantitative results, error metrics, ablation studies, or method details to support these claims. This absence makes it impossible to assess whether the explicit static/dynamic separation and spatiotemporal regularization actually resolve the monocular ambiguities as stated.

minor comments (1)

[Abstract] Abstract: Typo in 'proejct page' should be corrected to 'project page'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the recommendation for major revision. We address the single major comment below regarding the abstract. We will revise the manuscript accordingly to strengthen the presentation of our claims.


Referee: [Abstract] Abstract: The abstract asserts that RoDyGS 'significantly outperforms previous pose-free dynamic novel view synthesis approaches' and achieves 'competitive rendering quality,' yet provides no quantitative results, error metrics, ablation studies, or method details to support these claims. This absence makes it impossible to assess whether the explicit static/dynamic separation and spatiotemporal regularization actually resolve the monocular ambiguities as stated.

Authors: We agree that the abstract, being a high-level summary, does not include specific quantitative metrics. The supporting results, including PSNR/SSIM comparisons on Kubric-MRig and other benchmarks, ablation studies on the static/dynamic decomposition and spatiotemporal regularization, and method details, are presented in Sections 4 and 5 with Tables 1-3 and Figures 3-7. To address the concern and make the claims more self-contained, we will revise the abstract to include key quantitative highlights (e.g., average PSNR gains over prior pose-free dynamic methods) while maintaining its concise nature. revision: yes
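The rebuttal refers to PSNR comparisons and average PSNR gains. For reference, PSNR for images scaled to [0, 1] is 10·log10(1/MSE), and a per-scene average gain is the mean of per-scene differences. The sketch below assumes both conventions; the helper names are hypothetical, and the numbers in the usage note are illustrative, not results from the paper.

```python
import numpy as np
from typing import Sequence

def psnr(rendered, target, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if rendered.shape != target.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((rendered - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def average_psnr_gain(ours: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean per-scene PSNR difference (ours minus baseline), in dB."""
    if len(ours) != len(baseline) or len(ours) == 0:
        raise ValueError("need matching, non-empty per-scene lists")
    return sum(o - b for o, b in zip(ours, baseline)) / len(ours)
```

For example, `average_psnr_gain([20.0, 22.0], [19.0, 20.0])` gives 1.5 dB.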

Circularity Check

0 steps flagged

No significant circularity; method and experiments are self-contained


The paper introduces an empirical method (RoDyGS) for dynamic scene reconstruction via explicit static/dynamic separation and spatiotemporal regularization, evaluated on a new benchmark (Kubric-MRig) and compared to prior approaches. No derivation chain, equations, or first-principles predictions are present in the provided text that could reduce to fitted inputs or self-citations by construction. Claims rest on proposed architecture and experimental outcomes rather than any self-definitional or load-bearing self-referential steps, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5671 in / 960 out tokens · 38558 ms · 2026-05-23T07:56:30.243857+00:00 · methodology


Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 1 internal anchor

[1]

Nonrigid structure from motion in trajectory space

Ijaz Akhter, Yaser Sheikh, Sohaib Khan, and Takeo Kanade. Nonrigid structure from motion in trajectory space. Ad- vances in neural information processing systems , 21, 2008. 5

work page 2008
[2]

Nope-nerf: Optimising neu- ral radiance field with no pose prior

Wenjing Bian, Zirui Wang, Kejie Li, Jia-Wang Bian, and Victor Adrian Prisacariu. Nope-nerf: Optimising neu- ral radiance field with no pose prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4160–4169, 2023. 3, 7, 16

work page 2023
[3]

D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In European Conf. on Computer Vision (ECCV), pages 611–

work page
[4]

Springer-Verlag, 2012. 6

work page 2012
[5]

Hexplane: A fast representa- tion for dynamic scenes

Ang Cao and Justin Johnson. Hexplane: A fast representa- tion for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 130–141, 2023. 2

work page 2023
[6]

Gaussianeditor: Swift and control- lable 3d editing with gaussian splatting

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xi- aofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and control- lable 3d editing with gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21476–21485, 2024. 2

work page 2024
[7]

Gaussian activated neural radiance fields for high fidelity reconstruction and pose estimation

Shin-Fang Chng, Sameera Ramasinghe, Jamie Sherrah, and Simon Lucey. Gaussian activated neural radiance fields for high fidelity reconstruction and pose estimation. In Eu- ropean Conference on Computer Vision , pages 264–280. Springer, 2022. 3

work page 2022
[8]

Cosseggaussians: Compact and swift scene segmenting 3d gaussians with dual feature fusion

Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, and Zejian Yuan. Cosseggaussians: Compact and swift scene segmenting 3d gaussians with dual feature fusion. CoRR,

work page
[9]

Google scanned objects: A high- quality dataset of 3d scanned household items

Laura Downs, Anthony Francis, Nate Koenig, Brandon Kin- man, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high- quality dataset of 3d scanned household items. In 2022 In- ternational Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022. 10

work page 2022
[10]

InstantSplat: Sparse-view gaussian splatting in seconds.arXiv preprint arXiv:2403.20309, 2024

Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. Instantsplat: Un- bounded sparse-view pose-free gaussian splatting in 40 sec- onds. arXiv preprint arXiv:2403.20309, 2024. 1, 3

work page arXiv 2024
[11]

Fast dynamic radiance fields with time-aware neural voxels

Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xi- aopeng Zhang, Wenyu Liu, Matthias Nießner, and Qi Tian. Fast dynamic radiance fields with time-aware neural voxels. In SIGGRAPH Asia 2022 Conference Papers, 2022. 13

work page 2022
[12]

Plenoxels: Radiance fields without neural networks

Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022. 2

work page 2022
[13]

K-planes: Explicit radiance fields in space, time, and appearance

Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023. 2

work page 2023
[14]

Efros, and Xiaolong Wang

Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, and Xiaolong Wang. Colmap-free 3d gaussian splat- ting. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 20796– 20805, 2024. 3, 7, 10, 16

work page 2024
[15]

Dynamic view synthesis from dynamic monocular video

Chen Gao, Ayush Saraf, Johannes Kopf, and Jia-Bin Huang. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE International Conference on Com- puter Vision, 2021. 13

work page 2021
[16]

Monocular dynamic view synthesis: A reality check

Hang Gao, Ruilong Li, Shubham Tulsiani, Bryan Russell, and Angjoo Kanazawa. Monocular dynamic view synthesis: A reality check. Advances in Neural Information Processing Systems, 35:33768–33780, 2022. 2, 6, 7, 10, 13

work page 2022
[17]

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J Fleet, Dan Gnanapra- gasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh- Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Rad- wan, Daniel Rebain, Sara Sabour...

work page 2022
[18]

Compressive sensing with un-trained neural networks: Gradient descent finds a smooth approximation

Reinhard Heckel and Mahdi Soltanolkotabi. Compressive sensing with un-trained neural networks: Gradient descent finds a smooth approximation. In International Conference on Machine Learning, pages 4149–4158. PMLR, 2020. 5

work page 2020
[19]

Baking neural ra- diance fields for real-time view synthesis

Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural ra- diance fields for real-time view synthesis. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5875–5884, 2021. 2

work page 2021
[20]

Au- tomatic photo pop-up

Derek Hoiem, Alexei A Efros, and Martial Hebert. Au- tomatic photo pop-up. In ACM SIGGRAPH 2005 Papers , pages 577–584. 2005. 2

work page 2005
[21]

Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes

Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. In Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4220–4230, 2024. 2

work page 2024
[22]

Self-calibrating neural radiance fields

Yoonwoo Jeong, Seokjun Ahn, Christopher Choy, Anima Anandkumar, Minsu Cho, and Jaesik Park. Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF In- ternational Conference on Computer Vision , pages 5846– 5854, 2021. 3, 7

work page 2021
[23]

Perfception: Perception using radiance fields

Yoonwoo Jeong, Seungjoo Shin, Junha Lee, Chris Choy, An- ima Anandkumar, Minsu Cho, and Jaesik Park. Perfception: Perception using radiance fields. Advances in Neural Infor- mation Processing Systems, 35:26105–26121, 2022. 12

work page 2022
[24]

Co- tracker: It is better to track together

Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Co- tracker: It is better to track together. arXiv preprint arXiv:2307.07635, 2023. 2

work page arXiv 2023
[25]

3d gaussian splatting for real-time radiance field rendering

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1,

work page
[26]

3d gaussian splatting as markov chain monte carlo

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Wei- wei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, An- drea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splatting as markov chain monte carlo. arXiv preprint arXiv:2404.09591, 2024. 2

work page arXiv 2024
[27]

Laplacianfusion: Detailed 3d clothed- human body reconstruction

Hyomin Kim, Hyeonseo Nam, Jungeon Kim, Jaesik Park, and Seungyong Lee. Laplacianfusion: Detailed 3d clothed- human body reconstruction. ACM Transactions on Graphics (TOG), 41(6):1–14, 2022. 5

work page 2022
[28]

Tanks and temples: Benchmarking large-scale scene reconstruction

Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) , 36 (4):1–13, 2017. 2, 6, 7, 10

work page 2017
[29]

Point-based neural rendering with per- view optimization

Georgios Kopanas, Julien Philip, Thomas Leimk ¨uhler, and George Drettakis. Point-based neural rendering with per- view optimization. In Computer Graphics Forum, pages 29–

work page
[30]

Wiley Online Library, 2021. 2

work page 2021
[31]

Dynmf: Neural motion factorization for real-time dynamic view synthesis with 3d gaussian splatting

Agelos Kratimenos, Jiahui Lei, and Kostas Daniilidis. Dynmf: Neural motion factorization for real-time dynamic view synthesis with 3d gaussian splatting. arXiV, 2023. 2, 3, 5, 7, 8, 10

work page 2023
[32]

Multi- body non-rigid structure-from-motion

Suryansh Kumar, Yuchao Dai, and Hongdong Li. Multi- body non-rigid structure-from-motion. In 2016 Fourth In- ternational Conference on 3D Vision (3DV), pages 148–156. IEEE, 2016. 5

work page 2016
[33]

Fast view synthesis of casual videos

Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, and Feng Liu. Fast view synthesis of casual videos. arXiv preprint arXiv:2312.02135, 2023. 2

work page arXiv 2023
[34]

Grounding image matching in 3d with mast3r, 2024

Vincent Leroy, Yohann Cabon, and J´erˆome Revaud. Ground- ing image matching in 3d with mast3r. arXiv preprint arXiv:2406.09756, 2024. 1, 2, 4, 6, 10, 12

work page arXiv 2024
[35]

Neural 3d video synthesis from multi-view video

Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 5521–5531, 2022. 2

work page 2022
[36]

Barf: Bundle-adjusting neural radiance fields

Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Si- mon Lucey. Barf: Bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5741–5751, 2021. 3, 7, 16

work page 2021
[37]

Robust dynamic radiance fields

Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Jo- hannes Kopf, and Jia-Bin Huang. Robust dynamic radiance fields. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 13–23, 2023. 2, 3, 4, 6, 7, 11, 12, 13, 15, 17, 20, 21

work page 2023
[38]

Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis. In 3DV, 2024. 2

work page 2024
[39]

Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar

Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view syn- thesis with prescriptive sampling guidelines. ACM Transac- tions on Graphics (TOG), 2019. 13

work page 2019
[40]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. In ECCV, 2020. 1

work page 2020
[41]

Nerf: Representing scenes as neural radiance fields for view syn- thesis

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. Communications of the ACM , 65(1):99–106, 2021. 2

work page 2021
[42]

Instant neural graphics primitives with a mul- tiresolution hash encoding

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022. 2

work page 2022
[43]

Nerfies: Deformable neural radiance fields

Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021. 1, 2

work page 2021
[44]

Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin- Brualla, and Steven M

Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin- Brualla, and Steven M. Seitz. Hypernerf: A higher- dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph., 40(6), 2021. 6, 13

work page 2021
[45]

D-NeRF: Neural Radiance Fields for Dynamic Scenes

Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2020. 1, 2, 6, 7, 15, 17, 20, 21

work page 2020
[46]

Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer

Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Ma- chine Intelligence (TPAMI), 2020. 2

work page 2020
[47]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024. 11, 13, 19

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

Free view synthesis

Gernot Riegler and Vladlen Koltun. Free view synthesis. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIX 16, pages 623–640. Springer, 2020. 2

work page 2020
[49]

Stable view synthesis

Gernot Riegler and Vladlen Koltun. Stable view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12216–12225, 2021. 2

work page 2021
[50]

The convergence rate of neural networks for learned functions of different frequencies

Basri Ronen, David Jacobs, Yoni Kasten, and Shira Kritch- man. The convergence rate of neural networks for learned functions of different frequencies. Advances in Neural In- formation Processing Systems, 32, 2019. 5

work page 2019
[51]

Structure- from-motion revisited

Johannes L Schonberger and Jan-Michael Frahm. Structure- from-motion revisited. In Proceedings of the IEEE con- ference on computer vision and pattern recognition , pages 4104–4113, 2016. 1, 2

work page 2016
[52]

Improved direct voxel grid optimization for radiance fields reconstruc- tion

Cheng Sun, Min Sun, and Hwann-Tzong Chen. Improved direct voxel grid optimization for radiance fields reconstruc- tion. arXiv preprint arXiv:2206.05085, 2022. 2

work page arXiv 2022
[53]

Raft: Recurrent all-pairs field transforms for optical flow

Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. In Computer Vision–ECCV 23 2020: 16th European Conference, Glasgow, UK, August 23– 28, 2020, Proceedings, Part II 16, pages 402–419. Springer,

work page 2020
[54]

Shape of motion: 4d reconstruc- tion from a single video

Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruc- tion from a single video. 2024. 2

work page 2024
[55]

Shape of motion: 4d reconstruc- tion from a single video

Qianqian Wang, Vickie Ye, Hang Gao, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruc- tion from a single video. arXiv preprint arXiv:2407.13764,

work page arXiv
[56]

Dust3r: Geometric 3d vi- sion made easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vi- sion made easy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697– 20709, 2024. 1, 3

work page 2024
[57]

Gflow: Recovering 4d world from monocular video

Shizun Wang, Xingyi Yang, Qiuhong Shen, Zhenxiang Jiang, and Xinchao Wang. Gflow: Recovering 4d world from monocular video. arXiv preprint arXiv:2405.18426, 2024. 2

work page arXiv 2024
[58]

NeRF −−: Neural radiance fields without known camera parameters,

Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, and Victor Adrian Prisacariu. Nerf–: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021. 3, 7

work page arXiv 2021
[59]

4d gaussian splatting for real-time dynamic scene rendering

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20310–20320, 2024. 2, 6, 7, 15, 17, 20, 21

work page 2024
[60]

Sparsegs: Real- time 360 {\deg} sparse view synthesis using gaussian splat- ting

Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. Sparsegs: Real- time 360 {\deg} sparse view synthesis using gaussian splat- ting. arXiv preprint arXiv:2312.00206, 2023. 2, 6

work page arXiv 2023
[61]

Point- nerf: Point-based neural radiance fields

Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point- nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022. 2

work page 2022
[62]

arXiv preprint arXiv:2304.11968 (2023)

Jinyu Yang, Mingqi Gao, Zhe Li, Shang Gao, Fangjing Wang, and Feng Zheng. Track anything: Segment anything meets videos. arXiv preprint arXiv:2304.11968, 2023. 2, 3, 4, 6, 7, 10, 11, 12, 13, 19, 20, 21

work page arXiv 2023
[63]

Depth anything: Unleashing the power of large-scale unlabeled data

Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. In CVPR, 2024. 2, 4, 6, 10

[64]

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20331–20341, 2024. 6, 7, 15, 17, 20, 21

[65]

Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. In International Conference on Learning Representations (ICLR), 2024. 2, 6, 7, 15, 17, 20, 21

[66]

Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, and Yong Dou. Absgs: Recovering fine details in 3d gaussian splatting. In ACM Multimedia 2024, 2024. 2

[67]

Lin Yen-Chen, Pete Florence, Jonathan T Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. inerf: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021. 3

[68]

Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, and Jan Kautz. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5336–5345, 2020. 6, 10, 12, 13

[69]

Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, and Xiao Bai. Cor-gs: Sparse-view 3d gaussian splatting via co-regularization. arXiv preprint arXiv:2405.12110, 2024.

[70]

Qiang Zhang, Seung-Hwan Baek, Szymon Rusinkiewicz, and Felix Heide. Differentiable point-based radiance fields for efficient view synthesis. In SIGGRAPH Asia 2022 Conference Papers, pages 1–12, 2022. 2

[71]

M. Zwicker, H. Pfister, J. van Baar, and M. Gross. EWA splatting. IEEE Transactions on Visualization and Computer Graphics, 8(3):223–238, 2002. 3
