arxiv: 2604.22339 · v2 · submitted 2026-04-24 · 💻 cs.CV

Flow4DGS-SLAM: Optical Flow-Guided 4D Gaussian Splatting SLAM

Pith reviewed 2026-05-08 12:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords SLAM · 4D Gaussian Splatting · Optical Flow · Motion Mask · Dynamic Reconstruction · Gaussian Mixture Model · Scene Flow

The pith

Optical flow decomposition creates reliable motion masks for efficient dynamic 4DGS SLAM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an optical flow-guided framework for 4D Gaussian Splatting in dynamic SLAM. It first fits a camera ego-motion model to the optical flow to generate a category-agnostic motion mask that separates dynamic and static Gaussians. This mask also supports flow-guided camera pose initialization. The system then explicitly models temporal centers of dynamic Gaussians at keyframes, propagating them with 3D scene flow priors and using adaptive insertion, while a Gaussian Mixture Model learns temporal opacity and rotation. These techniques together enable faster training and state-of-the-art performance in tracking and dynamic reconstruction.

Core claim

The central claim is that decomposing optical flow via ego-motion fitting yields a motion mask that separates dynamic from static elements in 4DGS, and that combining this mask with explicit temporal-center modeling and a GMM for temporal opacity and rotation gives efficient and accurate dynamic SLAM.

What carries the argument

The argument rests on the category-agnostic motion mask, which is generated by fitting an ego-motion model to optical flow and is used both to separate dynamic from static Gaussians and to initialize camera poses. It also rests on the propagated temporal centers and on GMM modeling of temporal opacity and rotation.
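
As a rough illustration of what propagating temporal centers could look like, the sketch below shifts each dynamic Gaussian center from one keyframe to the next by a 3D scene flow vector and linearly interpolates between the two keyframes. The function names, the array layout, and the linear interpolation are assumptions made for illustration; this page does not specify the paper's actual update rule.

```python
import numpy as np

def propagate_centers(centers, scene_flow):
    """Shift keyframe centers (N, 3) by per-Gaussian 3D scene flow (N, 3).

    Hypothetical helper: the scene flow is taken as a given prior.
    """
    return np.asarray(centers, dtype=np.float64) + np.asarray(scene_flow, dtype=np.float64)

def interpolate_centers(c0, c1, t0, t1, t):
    """Centers at time t between keyframes at t0 and t1.

    Linear interpolation is an assumption, not the paper's stated scheme.
    """
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * c0 + w * c1

# Example: one Gaussian moving 0.2 units along x between two keyframes.
c0 = np.array([[0.0, 0.0, 1.0]])
c1 = propagate_centers(c0, np.array([[0.2, 0.0, 0.0]]))
mid = interpolate_centers(c0, c1, t0=0.0, t1=1.0, t=0.5)
```

Storing centers only at keyframes keeps the number of optimized positions proportional to the number of keyframes rather than the number of frames, which is one plausible reading of the efficiency claim.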

If this is right

  • More robust camera pose estimation in environments with moving objects.
  • Efficient reconstruction of both static backgrounds and dynamic foregrounds.
  • Significant reduction in training time for dynamic 4D Gaussian models.
  • Improved photorealistic renderings of changing scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be adapted to other dynamic 3D reconstruction methods beyond Gaussian Splatting.
  • Real-time robotic applications in crowded or changing environments might benefit from this motion separation.
  • Further improvements in scene flow estimation could enhance performance on complex non-rigid motions.

Load-bearing premise

Fitting a camera ego-motion model to decompose the optical flow produces a reliable category-agnostic motion mask that correctly separates dynamic and static Gaussians without introducing errors.
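
A minimal sketch of the kind of decomposition this premise describes, assuming a pinhole camera and a relative pose (R, t) that has already been fitted. The flow a static scene would produce under that pose is subtracted from the observed flow, and pixels with a large residual are marked dynamic. The function names and the pixel threshold are hypothetical, and the paper's fitting procedure is not reproduced here.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced purely by camera motion over a static scene.

    depth: (H, W) depth of the first frame (must be positive)
    K:     (3, 3) pinhole intrinsics
    R, t:  rotation (3, 3) and translation (3,) mapping points from the
           first camera frame into the second
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                     # back-projected rays
    pts1 = rays * depth[..., None]                      # 3D points, frame 1
    pts2 = pts1 @ R.T + t                               # 3D points, frame 2
    proj = pts2 @ K.T
    uv2 = proj[..., :2] / proj[..., 2:3]                # reprojected pixels
    return uv2 - pix[..., :2]

def motion_mask(observed_flow, depth, K, R, t, thresh_px=1.5):
    """True where the observed flow is not explained by ego-motion.

    thresh_px is an illustrative value, not taken from the paper.
    """
    residual = observed_flow - rigid_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=-1) > thresh_px
```

Under this reading, the premise is most at risk where depth or flow is noisy, because a static pixel can then show a large residual and a moving pixel a small one.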

What would settle it

A video sequence featuring multiple objects moving independently in ways that violate the single ego-motion assumption, where the resulting mask leads to incorrect Gaussian separation and poor tracking.

Figures

Figures reproduced from arXiv: 2604.22339 by Gim Hee Lee, Yunsong Wang.

Figure 1: Overview of our results. Our method achieves high-quality renderings with spatially and temporally coherent Gaussian motion.
Figure 2
Figure 3: Qualitative renderings on non-keyframes in the TUM RGB-D [32] (first two rows) and BONN [29] (last two rows) datasets.
4.2. Main Results. Camera Tracking. The camera tracking results on TUM RGB-D [32] and BONN [29] are reported in Tables 1 and 3. Our method achieves more accurate camera trajectories on both datasets. Notably, 4DGS-SLAM [21] uses prolonged mapping (200 iterations) to reconstruct the scene and re…
Figure 4: Tracking results on the BONN [29] ballon2 (first row) and person_tracking (second row) scenes.
Original abstract

Handling the dynamic environments is a significant research challenge in Visual Simultaneous Localization and Mapping (SLAM). Recent research combines 3D Gaussian Splatting (3DGS) with SLAM to achieve both robust camera pose estimation and photorealistic renderings. However, using SLAM to efficiently reconstruct both static and dynamic regions remains challenging. In this work, we propose an efficient framework for dynamic 3DGS SLAM guided by optical flow. Using the input depth and prior optical flow, we first propose a category-agnostic motion mask generation strategy by fitting a camera ego-motion model to decompose the optical flow. This module separates dynamic and static Gaussians and simultaneously provides flow-guided camera pose initialization. We boost the training speed of dynamic 3DGS by explicitly modeling their temporal centers at keyframes. These centers are propagated using 3D scene flow priors and are dynamically initialized with an adaptive insertion strategy. Alongside this, we model the temporal opacity and rotation using a Gaussian Mixture Model (GMM) to adaptively learn the complex dynamics. The empirical results demonstrate our state-of-the-art performance in tracking, dynamic reconstruction, and training efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Flow4DGS-SLAM, an optical flow-guided framework for dynamic 4D Gaussian Splatting SLAM. It introduces a category-agnostic motion mask by fitting a camera ego-motion model to decompose input optical flow (combined with depth), which separates dynamic from static Gaussians and supplies flow-guided pose initialization. Temporal Gaussian centers are explicitly modeled at keyframes, propagated via 3D scene flow priors, and dynamically inserted with an adaptive threshold; temporal opacity and rotation are modeled with a GMM. The work claims state-of-the-art results in tracking accuracy, dynamic reconstruction quality, and training efficiency.

Significance. If the motion-mask decomposition and temporal modeling hold under rigorous validation, the approach offers a practical route to efficient dynamic SLAM by leveraging readily available optical flow for both segmentation and initialization, potentially reducing the computational burden of full 4DGS optimization while maintaining photorealistic rendering. The explicit temporal-center propagation and GMM components represent a targeted efficiency gain over purely implicit dynamic 3DGS methods.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (Motion Mask Generation): the central SOTA claims in tracking (ATE) and dynamic reconstruction rest on the reliability of the ego-motion fitting step that produces the category-agnostic mask. No mask accuracy metrics (e.g., precision/recall against ground-truth dynamic regions), failure-case analysis, or ablation on mask quality versus final ATE/PSNR are reported, leaving open the possibility that residual flow errors in fast non-rigid or textureless scenes propagate into pose initialization and Gaussian assignment.
  2. [§4] §4 (Temporal Center Propagation and GMM): the adaptive insertion threshold and GMM component count are listed as free parameters yet no sensitivity analysis or cross-dataset stability results are provided. Because these directly control dynamic Gaussian density and temporal modeling, their tuning could affect the reported efficiency and reconstruction gains; an ablation isolating their contribution is needed to substantiate the efficiency claim.
minor comments (2)
  1. [Abstract] The abstract states 'empirical results demonstrate SOTA' without naming the evaluation datasets, number of sequences, or exact metrics (e.g., ATE, PSNR, training time) used for the comparison; adding a concise quantitative summary table reference would improve clarity.
  2. [§4] Notation for the GMM parameters (component count, mixture weights) and the adaptive insertion threshold should be defined once in the method section with explicit symbols to avoid ambiguity when reading the temporal modeling equations.
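
The mask-accuracy metrics requested in major comment 1 can be computed with a few lines. The sketch below is a generic pixel-wise precision/recall for binary masks, assuming ground-truth dynamic-region annotations are available as boolean arrays; the function name is hypothetical and nothing here comes from the paper's evaluation code.

```python
import numpy as np

def mask_precision_recall(pred, gt):
    """Pixel-wise precision and recall of a predicted dynamic mask.

    pred, gt: arrays of the same shape, truthy where a pixel is dynamic.
    Returns (precision, recall); a term is 0.0 when its denominator is empty.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.count_nonzero(pred & gt)    # dynamic pixels correctly flagged
    fp = np.count_nonzero(pred & ~gt)   # static pixels flagged as dynamic
    fn = np.count_nonzero(~pred & gt)   # dynamic pixels that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Low recall would mean moving pixels leak into pose estimation, while low precision would mean static structure is needlessly handed to the dynamic model, so reporting both separates the two failure modes the comment is concerned with.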

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments below and will incorporate additional validation experiments and analyses in the revised version to strengthen the claims regarding the motion mask and temporal modeling components.

point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (Motion Mask Generation): the central SOTA claims in tracking (ATE) and dynamic reconstruction rest on the reliability of the ego-motion fitting step that produces the category-agnostic mask. No mask accuracy metrics (e.g., precision/recall against ground-truth dynamic regions), failure-case analysis, or ablation on mask quality versus final ATE/PSNR are reported, leaving open the possibility that residual flow errors in fast non-rigid or textureless scenes propagate into pose initialization and Gaussian assignment.

    Authors: We agree that explicit quantitative validation of the motion mask would further substantiate the claims. Although the end-to-end SOTA results on standard dynamic SLAM benchmarks (including challenging non-rigid and textureless sequences) indicate that the ego-motion decomposition is effective in practice, we will add a dedicated ablation subsection. This will report precision/recall of the generated masks against available ground-truth dynamic region annotations, include failure-case visualizations for fast motion and textureless areas, and provide an ablation correlating mask quality directly with final ATE and PSNR. These additions will be included in the revised manuscript. revision: yes

  2. Referee: [§4] §4 (Temporal Center Propagation and GMM): the adaptive insertion threshold and GMM component count are listed as free parameters yet no sensitivity analysis or cross-dataset stability results are provided. Because these directly control dynamic Gaussian density and temporal modeling, their tuning could affect the reported efficiency and reconstruction gains; an ablation isolating their contribution is needed to substantiate the efficiency claim.

    Authors: We acknowledge that sensitivity analysis on the adaptive insertion threshold and GMM component count would strengthen the efficiency claims. In the revised manuscript we will add an ablation study that varies these hyperparameters, reporting their impact on dynamic Gaussian density, reconstruction PSNR, tracking ATE, and training time. The study will be conducted across multiple datasets to demonstrate cross-dataset stability and isolate the contribution of each component to the overall efficiency gains. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain is self-contained from optical flow and depth inputs

full rationale

The paper's core steps—fitting an ego-motion model to decompose input optical flow for a category-agnostic motion mask, using residuals to separate dynamic/static Gaussians, providing flow-guided pose initialization, propagating temporal centers via 3D scene flow priors, and modeling opacity/rotation with GMM—are presented as direct computational procedures on external inputs (depth + optical flow). No step reduces by construction to a fitted parameter renamed as a prediction, a self-definition, or a load-bearing self-citation chain. The SOTA claims are framed as empirical outcomes of the pipeline rather than tautological. The method is self-contained against the stated inputs with no imported uniqueness theorems or ansatzes via citation.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only view; the method relies on standard assumptions of optical flow accuracy and scene flow priors plus several new procedural choices whose parameters are not enumerated.

free parameters (2)
  • GMM component count
    Number of mixture components used to model temporal opacity and rotation; value chosen to fit complex dynamics.
  • Adaptive insertion threshold
    Criterion for dynamically adding new Gaussian centers at keyframes.
axioms (1)
  • domain assumption: Optical flow and depth inputs are sufficiently accurate to allow reliable ego-motion fitting and 3D scene flow propagation.
    Invoked when decomposing flow into static/dynamic components and propagating centers.
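
To make the "GMM component count" parameter concrete, here is one way a mixture of Gaussians over time could modulate a Gaussian's opacity. The parameterization (unnormalized Gaussian bumps, a base opacity, clipping to [0, 1]) is an assumption for illustration; the abstract does not give the paper's formulation, and rotation is not modeled in this sketch.

```python
import numpy as np

def gmm_temporal_opacity(t, base_opacity, weights, means, sigmas):
    """Opacity of one Gaussian at time(s) t under a temporal mixture.

    weights, means, sigmas: (K,) arrays, where K is the component count.
    Each component is an unnormalized bump that peaks at its weight.
    """
    t = np.asarray(t, dtype=np.float64)[..., None]           # (..., 1)
    bumps = np.exp(-0.5 * ((t - means) / sigmas) ** 2)        # (..., K)
    return np.clip(base_opacity * np.sum(weights * bumps, axis=-1), 0.0, 1.0)

# Two components: the Gaussian is visible around t=0.2 and, more faintly, t=0.8.
w = np.array([1.0, 0.5])
mu = np.array([0.2, 0.8])
sigma = np.array([0.05, 0.05])
opacity = gmm_temporal_opacity(np.array([0.2, 0.5, 0.8]), 0.9, w, mu, sigma)
```

With more components a single Gaussian can appear and disappear several times, which is why the component count trades expressiveness against the number of parameters to optimize.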

pith-pipeline@v0.9.0 · 5504 in / 1394 out tokens · 33710 ms · 2026-05-08T12:27:31.830601+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 7 canonical work pages

  1. [1]

    Dynaslam: Tracking, mapping, and inpainting in dynamic scenes

    Berta Bescos, José M Fácil, Javier Civera, and José Neira. Dynaslam: Tracking, mapping, and inpainting in dynamic scenes. 2018. 2

  2. [2]

    Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam

    Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. 2021. 1, 2

  3. [3]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction

    David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19457–19467, 2024. 2

  4. [4]

    Visual servoing

    François Chaumette, Seth Hutchinson, and Peter Corke. Visual servoing. In Springer handbook of robotics, pages 841–

  5. [5]

    Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance

    Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. arXiv preprint arXiv:2312.00846, 2023. 2

  6. [6]

    Vcr-gaus: View consistent depth-normal regularizer for gaussian surface reconstruction

    Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, and Gim Hee Lee. Vcr-gaus: View consistent depth-normal regularizer for gaussian surface reconstruction. Advances in Neural Information Processing Systems, 37:139725–139750, 2024. 2

  7. [7]

    Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images

    Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024. 2

  8. [8]

    A volumetric method for building complex models from range images

    Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 303–312, 1996. 2

  9. [9]

    Deepfactors: Real-time probabilistic dense monocular slam

    Jan Czarnowski, Tristan Laidlow, Ronald Clark, and Andrew J Davison. Deepfactors: Real-time probabilistic dense monocular slam. IEEE Robotics and Automation Letters, 5(2):721–728, 2020. 2

  10. [10]

    Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration

    Angela Dai, Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Christian Theobalt. Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Transactions on Graphics (ToG), 36(4):1,

  11. [11]

    4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes

    Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, and Baoquan Chen. 4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024. 1

  12. [12]

    Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering

    Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024. 2

  13. [13]

    Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction

    Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, and Houqiang Li. Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction. IEEE Transactions on Circuits and Systems for Video Technology, 2024. 1, 2

  14. [14]

    Determining optical flow

    Berthold K.P. Horn and Brian G. Schunck. Determining optical flow. Artificial Intelligence, 17(1-3):185–203, 1981. 3

  15. [15]

    Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes

    Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4220–4230, 2024. 1, 2, 5, 6

  16. [16]

    Rodyn-slam: Robust dynamic dense rgb-d slam with neural radiance fields

    Haochen Jiang, Yueming Xu, Kejie Li, Jianfeng Feng, and Li Zhang. Rodyn-slam: Robust dynamic dense rgb-d slam with neural radiance fields. IEEE Robotics and Automation Letters, 2024. 5, 6

  17. [17]

    Eslam: Efficient dense slam system based on hybrid representation of signed distance fields

    Mohammad Mahdi Johari, Camilla Carta, and François Fleuret. Eslam: Efficient dense slam system based on hybrid representation of signed distance fields. In CVPR, 2023. 1

  18. [18]

    Splatam: Splat, track & map 3d gaussians for dense rgb-d slam

    Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam. In CVPR, 2024. 1, 2, 5, 6

  19. [19]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023. 1, 2, 4

  20. [20]

    Dgs-slam: Gaussian splatting slam in dynamic environment

    Mangyu Kong, Jaewon Lee, Seongwon Lee, and Euntai Kim. Dgs-slam: Gaussian splatting slam in dynamic environment. arXiv preprint arXiv:2411.10722, 2024. 1

  21. [21]

    4d gaussian splatting slam

    Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, and Federico Tombari. 4d gaussian splatting slam, 2025. 1, 2, 4, 5, 6, 7, 8

  22. [22]

    Spacetime gaussian feature splatting for real-time dynamic view synthesis

    Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaussian feature splatting for real-time dynamic view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8508–8520, 2024. 1, 2

  23. [23]

    Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis

    Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, and Lei Xiao. Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2642–

  24. [24]

    Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis

    Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 2024 International Conference on 3D Vision (3DV), pages 800–809. IEEE, 2024. 2

  25. [25]

    Gaussian splatting slam

    Hidenobu Matsuki, Riku Murai, Paul HJ Kelly, and Andrew J Davison. Gaussian splatting slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18039–18048, 2024. 1, 2, 5, 6, 7, 8

  26. [26]

    Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras

    Raul Mur-Artal and Juan D Tardós. Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras

  27. [27]

    Kinectfusion: Real-time dense surface mapping and tracking

    Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE international symposium on mixed and augmented reality, pages 127–136. IEEE, 2011. 2

  28. [28]

    Dtam: Dense tracking and mapping in real-time

    Richard A Newcombe, Steven J Lovegrove, and Andrew J Davison. Dtam: Dense tracking and mapping in real-time. In 2011 international conference on computer vision, pages 2320–2327. IEEE, 2011. 2

  29. [29]

    Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals

    Emanuele Palazzolo, Jens Behley, Philipp Lottes, Philippe Giguere, and Cyrill Stachniss. Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals

  30. [30]

    Bad slam: Bundle adjusted direct rgb-d slam

    Thomas Schops, Torsten Sattler, and Marc Pollefeys. Bad slam: Bundle adjusted direct rgb-d slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 134–144, 2019. 2

  31. [31]

    Dynamic gaussian marbles for novel view synthesis of casual monocular videos

    Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, and Leonidas Guibas. Dynamic gaussian marbles for novel view synthesis of casual monocular videos. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024. 1

  32. [32]

    A benchmark for the evaluation of rgb-d slam systems

    Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of rgb-d slam systems. 2012. 6, 7

  33. [33]

    iMAP: Implicit mapping and positioning in real-time

    Edgar Sucar, Shikun Liu, Joseph Ortiz, and Andrew Davison. iMAP: Implicit mapping and positioning in real-time. In ICCV, 2021. 1, 2

  34. [34]

    Deepv2d: Video to depth with differentiable structure from motion

    Zachary Teed and Jia Deng. Deepv2d: Video to depth with differentiable structure from motion. arXiv preprint arXiv:1812.04605, 2018. 2

  35. [35]

    Raft: Recurrent all-pairs field transforms for optical flow

    Zachary Teed and Jia Deng. Raft: Recurrent all-pairs field transforms for optical flow. In ECCV, 2020. 3

  36. [36]

    DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras

    Zachary Teed and Jia Deng. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. 2021. 2

  37. [37]

    Demon: Depth and motion network for learning monocular stereo

    Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, and Thomas Brox. Demon: Depth and motion network for learning monocular stereo. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5038–5047, 2017. 2

  38. [38]

    YOLOv9: Learning what you want to learn using programmable gradient information

    Chien-Yao Wang and Hong-Yuan Mark Liao. YOLOv9: Learning what you want to learn using programmable gradient information. 2024. 3

  39. [39]

    Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam

    Hengyi Wang, Jingwen Wang, and Lourdes Agapito. Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam. In CVPR, 2023. 1, 2

  40. [40]

    Shape of motion: 4d reconstruction from a single video

    Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruction from a single video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9660–9672, 2025. 1, 2

  41. [41]

    Gflow: Recovering 4d world from monocular video

    Shizun Wang, Xingyi Yang, Qiuhong Shen, Zhenxiang Jiang, and Xinchao Wang. Gflow: Recovering 4d world from monocular video. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7862–7870, 2025. 2

  42. [42]

    Freesplat: Generalizable 3d gaussian splatting towards free view synthesis of indoor scenes

    Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat: Generalizable 3d gaussian splatting towards free view synthesis of indoor scenes. Advances in Neural Information Processing Systems, 37:107326–107349, 2024. 2

  43. [43]

    Freesplat++: Generalizable 3d gaussian splatting for efficient indoor scene reconstruction

    Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat++: Generalizable 3d gaussian splatting for efficient indoor scene reconstruction. arXiv preprint arXiv:2503.22986, 2025. 2

  44. [44]

    Elasticfusion: Dense slam without a pose graph

    Thomas Whelan, Stefan Leutenegger, Renato F Salas-Moreno, Ben Glocker, and Andrew J Davison. Elasticfusion: Dense slam without a pose graph. In Robotics: science and systems, page 3. Rome, Italy, 2015. 2

  45. [45]

    Add-slam: Adaptive dynamic dense slam with gaussian splatting

    Wenhua Wu, Chenpeng Su, Siting Zhu, Tianchen Deng, Zhe Liu, and Hesheng Wang. Add-slam: Adaptive dynamic dense slam with gaussian splatting. arXiv preprint arXiv:2505.19420, 2025. 2

  46. [46]

    Dg-slam: Robust dynamic gaussian splatting slam with hybrid pose optimization

    Yueming Xu, Haochen Jiang, Zhongyang Xiao, Jianfeng Feng, and Li Zhang. Dg-slam: Robust dynamic gaussian splatting slam with hybrid pose optimization. Advances in Neural Information Processing Systems, 37:51577–51596,

  47. [47]

    Gs-slam: Dense visual slam with 3d gaussian splatting

    Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, and Xuelong Li. Gs-slam: Dense visual slam with 3d gaussian splatting. arXiv preprint arXiv:2311.11700, 2023. 1

  48. [48]

    Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation

    Xingrui Yang, Hai Li, Hongjia Zhai, Yuhang Ming, Yuqian Liu, and Guofeng Zhang. Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation. In ISMAR, 2022. 1

  49. [49]

    Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting

    Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642, 2023. 1, 2

  50. [50]

    Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction

    Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20331–20341, 2024. 1, 2

  51. [51]

    Ds-slam: A semantic visual slam towards dynamic environments

    Chao Yu, Zuxin Liu, Xin-Jun Liu, Fugui Xie, Yi Yang, Qi Wei, and Qiao Fei. Ds-slam: A semantic visual slam towards dynamic environments. 2018. 2

  52. [52]

    Improving 2d feature representations by 3d-aware fine-tuning

    Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, and Jan Eric Lenssen. Improving 2d feature representations by 3d-aware fine-tuning. In European Conference on Computer Vision, pages 57–74. Springer, 2024. 2

  53. [53]

    Gaussian-slam: Photo-realistic dense slam with gaussian splatting

    Vladimir Yugay, Yue Li, Theo Gevers, and Martin R. Oswald. Gaussian-slam: Photo-realistic dense slam with gaussian splatting, 2023. 1

  54. [54]

    Vdo-slam: a visual dynamic object-aware slam system

    Jun Zhang, Mina Henein, Robert Mahony, and Viorela Ila. Vdo-slam: a visual dynamic object-aware slam system. arXiv preprint, 2020. 2

  55. [55]

    Wildgs-slam: Monocular gaussian splatting slam in dynamic environments

    Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, and Iro Armeni. Wildgs-slam: Monocular gaussian splatting slam in dynamic environments. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11461–11471, 2025. 1, 2

  56. [56]

    Deeptam: Deep tracking and mapping

    Huizhong Zhou, Benjamin Ummenhofer, and Thomas Brox. Deeptam: Deep tracking and mapping. In Proceedings of the European conference on computer vision (ECCV), pages 822–838, 2018. 2

  57. [57]

    Nice-slam: Neural implicit scalable encoding for slam

    Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R. Oswald, and Marc Pollefeys. Nice-slam: Neural implicit scalable encoding for slam. In CVPR, 2022. 1, 2