pith. sign in

arxiv: 2511.17207 · v2 · submitted 2025-11-21 · 💻 cs.CV · cs.RO

SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors

Pith reviewed 2026-05-17 20:33 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords Gaussian SLAMmonocular indoor SLAMsubmap alignmentglobal consistency3D reconstructionpose estimationnovel view rendering
0
0 comments X

The pith

SING3R-SLAM maintains global consistency in monocular indoor SLAM by using a persistent Gaussian map and submap-level alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes SING3R-SLAM to extend accurate local dense 3D reconstruction into incremental global mapping without the drift and scale problems common in existing SLAM systems. It builds a Global Gaussian Map that acts as persistent differentiable memory for the whole scene. Local submaps are aligned to this global structure, and the resulting global consistency is fed back to refine each local geometry. Experiments on real indoor datasets report over 10 percent better pose accuracy, finer reconstructions, and compact memory use while supporting novel view rendering.

Core claim

Our approach represents the scene with a Global Gaussian Map that serves as a persistent, differentiable memory, incorporates local geometric reconstruction via submap-level global alignment, and leverages global map's consistency to further refine local geometry. This design enables efficient and versatile 3D mapping for multiple downstream applications.

What carries the argument

The Global Gaussian Map that serves as persistent differentiable memory and supports submap-level global alignment to enforce consistency and refine local geometry.

If this is right

  • Improves pose accuracy by over 10 percent on real-world datasets
  • Produces finer and more detailed local geometry
  • Maintains a compact and memory-efficient global representation
  • Achieves state-of-the-art performance across pose estimation, 3D reconstruction, and novel view rendering
  • Supports multiple downstream 3D mapping applications

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The persistent map could support online updates in robotics navigation where new observations must correct earlier geometry without full re-optimization
  • Integration with semantic labels might allow the same alignment mechanism to handle moving objects by treating them as separate submaps

Load-bearing premise

Submap-level global alignment plus the global Gaussian map's consistency can be maintained incrementally without introducing new drift or scale inconsistencies that outweigh the claimed benefits.

What would settle it

If camera trajectories show increasing drift or reconstructed scales become inconsistent when processing long indoor sequences compared to ground-truth measurements, the incremental global consistency claim would be challenged.

Figures

Figures reproduced from arXiv: 2511.17207 by Federico Tombari, Kunyi Li, Michael Niemeyer, Nassir Navab, Sen Wang, Stefano Gasperini.

Figure 1
Figure 1. Figure 1: SING3R-SLAM is a submap-based monocular SLAM system enhanced by 3D priors. Left: our key modules, where tracking produces locally accurate point maps, mapping fuses them into a compact global representation, and joint optimization further refines poses and geometry, aided by bidirectional loop closure. Right: the resulting Gaussian map supports multiple downstream tasks with global geometry consistency, ex… view at source ↗
Figure 2
Figure 2. Figure 2: Overview. Our system comprises three main components: Sub-Track3R (top-middle), Mapper (right), and Loop Closure (bottom-left). The top-left shows that these components interact and exchange data through the keyframe buffer to maintain consistency. The Sub-Track3R performs tracking between submaps, predicting point maps and local poses that are aligned into the world coordinate system via inter-submap regi… view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative Comparison of Reconstructed Point Clouds on 7-scenes [30]. We show the reconstructed point clouds with zoomed-in views for all methods. Our approach provides a compact Gaussian representation that is much cleaner and captures object geometry in detail, as illustrated in the last column. In contrast, other 3D reconstruction-based methods often produce many redundant points, which degrade visual … view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative Comparison of Reconstructed Point Clouds on office. Left: RGB images from different views. Mid￾dle: VGGT-SLAM. Right: SING3R-SLAM (Ours). Our approach accurately aligns the wall and table across views, whereas VGGT￾SLAM produces misaligned and overlapping geometry. tive results, though the numerical metrics should be in￾terpreted cautiously because the ground-truth point clouds are incomplete (… view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative Comparison of Reconstructed Meshes on Scannet-v2 [4]. We compare our reconstructed meshes with the Gaussian-based SLAM method HI-SLAM2 [42]. Our method successfully captures fine scene details, such as the bicycle in scene 0000 and the chair’s armrests in scene 0059, demonstrating superior geometric fidelity and reconstruction quality. Method Acc. ↓ Complet. ↓ Chamfer ↓ DROID-SLAM [31] 0.141 0.… view at source ↗
Figure 6
Figure 6. Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative Comparison of Reconstructed Meshes on ScanNet-v2 [4]. 5 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

Recent advances in dense 3D reconstruction have demonstrated strong capability in accurately capturing local geometry. However, extending these methods to incremental global reconstruction, as required in SLAM systems, remains challenging. Without explicit modeling of global geometric consistency, existing approaches often suffer from accumulated drift, scale inconsistency, and suboptimal local geometry. To address these issues, we propose SING3R-SLAM, a globally consistent Gaussian-based monocular indoor SLAM framework. Our approach represents the scene with a Global Gaussian Map that serves as a persistent, differentiable memory, incorporates local geometric reconstruction via submap-level global alignment, and leverages global map's consistency to further refine local geometry. This design enables efficient and versatile 3D mapping for multiple downstream applications. Extensive experiments show that SING3R-SLAM achieves state-of-the-art performance in pose estimation, 3D reconstruction, and novel view rendering. It improves pose accuracy by over 10%, produces finer and more detailed geometry, and maintains a compact and memory-efficient global representation on real-world datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SING3R-SLAM, a monocular indoor SLAM framework that maintains a persistent Global Gaussian Map as differentiable memory, performs local 3D reconstruction via submap-level global alignment, and refines local geometry by leveraging the global map's consistency. It reports state-of-the-art results on real-world datasets for pose estimation (over 10% accuracy gain), 3D reconstruction quality, and novel-view rendering while keeping a compact representation.

Significance. If the incremental submap alignment and global-map consistency mechanism can be shown to control scale drift without introducing new inconsistencies, the work would offer a practical advance for dense Gaussian SLAM by bridging local reconstruction priors with long-term global coherence. The differentiable memory formulation and multi-task applicability are potentially useful strengths.

major comments (2)
  1. [§3.3] §3.3 (Submap-level Global Alignment): The optimization objective for aligning submaps into the global map is described at a high level but lacks an explicit scale-anchoring term or analysis showing that alignment residuals do not accumulate into long-term scale drift; in monocular indoor settings this is load-bearing for the central claim that global consistency refines rather than degrades local geometry.
  2. [§4.2] §4.2 (Ablation Studies): The reported 10% pose improvement is not accompanied by an ablation isolating the contribution of the global Gaussian map's consistency term versus submap alignment alone; without this, it is unclear whether the claimed refinement of local geometry is actually achieved or whether post-hoc tuning drives the gains.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a brief equation or pseudocode snippet illustrating the global-map update rule to make the differentiable-memory claim more concrete.
  2. [Table 1] Table 1 (Quantitative Comparison): Clarify whether the reported metrics use the same evaluation protocol (e.g., ATE vs. RPE) as the baselines; minor inconsistencies in reporting can affect direct comparability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We have carefully considered each major point and made targeted revisions to strengthen the presentation of the submap alignment formulation and the ablation analysis. Our point-by-point responses follow, with changes incorporated into the revised manuscript.

read point-by-point responses
  1. Referee: [§3.3] §3.3 (Submap-level Global Alignment): The optimization objective for aligning submaps into the global map is described at a high level but lacks an explicit scale-anchoring term or analysis showing that alignment residuals do not accumulate into long-term scale drift; in monocular indoor settings this is load-bearing for the central claim that global consistency refines rather than degrades local geometry.

    Authors: We appreciate the referee's emphasis on this critical detail for monocular SLAM. The original §3.3 presents the submap alignment as an optimization that registers local submaps to the persistent Global Gaussian Map using differentiable rendering losses and 3D reconstruction priors. While scale consistency is implicitly encouraged through the global map's role as differentiable memory and the priors from the reconstruction model, we acknowledge that an explicit scale-anchoring term and supporting analysis were not provided. In the revision we have added the scale-anchoring term to the objective in §3.3 (defined as the squared difference between submap and global scale estimates derived from the Gaussian covariances) and included a new paragraph with empirical analysis of scale factors across long trajectories on the evaluated datasets, showing that residuals remain bounded without accumulation. These additions directly support the claim that global consistency refines local geometry. revision: yes

  2. Referee: [§4.2] §4.2 (Ablation Studies): The reported 10% pose improvement is not accompanied by an ablation isolating the contribution of the global Gaussian map's consistency term versus submap alignment alone; without this, it is unclear whether the claimed refinement of local geometry is actually achieved or whether post-hoc tuning drives the gains.

    Authors: We agree that a finer-grained ablation is necessary to isolate the contributions and validate the refinement mechanism. The original §4.2 reports comparisons of the full system against baselines and a variant without submap alignment, but does not explicitly disable only the global consistency refinement step. We have added a new ablation row and accompanying text in the revised §4.2 that evaluates a configuration using submap-level global alignment without the subsequent local geometry refinement via the global map's consistency term. The results demonstrate an incremental gain attributable to the consistency term, confirming that it contributes to the observed improvements in pose accuracy and local geometry beyond alignment alone. The updated tables and discussion clarify this separation. revision: yes

Circularity Check

0 steps flagged

No circularity in claimed derivation chain

full rationale

The paper describes its core design as an architectural choice: a Global Gaussian Map acting as persistent differentiable memory, combined with submap-level global alignment for local geometry and leveraging map consistency for refinement. This is presented as a proposed framework rather than a derivation that reduces by construction to its own fitted inputs or self-citations. No equations, self-definitional loops, or predictions equivalent to parameters are quoted or exhibited in the abstract or reader's summary. The approach is self-contained, with performance claims tied to external experimental validation on real-world datasets instead of internal redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate free parameters or axioms; the central design rests on the unstated assumption that submap alignment can be performed reliably and that the global Gaussian representation remains differentiable and memory-efficient at scale.

pith-pipeline@v0.9.0 · 5499 in / 1072 out tokens · 27854 ms · 2026-05-17T20:33:27.143862+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MonoEM-GS: Monocular Expectation-Maximization Gaussian Splatting SLAM

    cs.RO 2026-04 unverdicted novelty 5.0

    MonoEM-GS stabilizes view-dependent geometry from foundation models inside a global Gaussian Splatting representation via EM and adds multi-modal features for in-place open-set segmentation.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Gala: Guided attention with language alignment for open vocabulary gaussian splatting.arXiv preprint arXiv:2508.14278, 2025

    Elena Alegret, Kunyi Li, Sen Wang, Siyun Liang, Michael Niemeyer, Stefano Gasperini, Nassir Navab, and Federico Tombari. Gala: Guided attention with language align- ment for open vocabulary gaussian splatting.arXiv preprint arXiv:2508.14278, 2025. 1, 2

  2. [2]

    Modern approaches to augmented reality

    Oliver Bimber and Ramesh Raskar. Modern approaches to augmented reality. InAcm siggraph 2006 courses, pages 1– es. 2006. 1

  3. [3]

    Splat-nav: Safe real-time robot navigation in gaussian splatting maps.IEEE Transactions on Robotics,

    Timothy Chen, Ola Shorinwa, Joseph Bruno, Aiden Swann, Javier Yu, Weijia Zeng, Keiko Nagami, Philip Dames, and Mac Schwager. Splat-nav: Safe real-time robot navigation in gaussian splatting maps.IEEE Transactions on Robotics,

  4. [4]

    Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Nießner

    Angela Dai, Angel X. Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017. 6, 7, 8, 2, 3, 5

  5. [5]

    Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion

    Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinza- epfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. In2025 International Conference on 3D Vision (3DV), pages 1–10. IEEE, 2025. 2

  6. [6]

    Omnidata: A scalable pipeline for making multi- task mid-level vision datasets from 3d scans

    Ainaz Eftekhar, Alexander Sax, Jitendra Malik, and Amir Zamir. Omnidata: A scalable pipeline for making multi- task mid-level vision datasets from 3d scans. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 10786–10796, 2021. 2

  7. [7]

    Motiongs: Compact gaussian splatting slam by motion filter

    Xinli Guo, Weidong Zhang, Ruonan Liu, Peng Han, and Hongtian Chen. Motiongs: Compact gaussian splatting slam by motion filter. In2024 7th International Conference on Robotics, Control and Automation Engineering (RCAE), pages 685–692. IEEE, 2024. 3

  8. [8]

    Determining opti- cal flow.Artificial intelligence, 17(1-3):185–203, 1981

    Berthold KP Horn and Brian G Schunck. Determining opti- cal flow.Artificial intelligence, 17(1-3):185–203, 1981. 2

  9. [9]

    A comparison of sift, pca-sift and surf.International Journal of Image Processing (IJIP), 3(4):143–152, 2009

    Luo Juan and Oubong Gwun. A comparison of sift, pca-sift and surf.International Journal of Image Processing (IJIP), 3(4):143–152, 2009. 2

  10. [10]

    Splatam: Splat track & map 3d gaussians for dense rgb-d slam

    Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat track & map 3d gaussians for dense rgb-d slam. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 21357–21366, 2024. 2, 3, 5, 6

  11. [11]

    3d gaussian splatting for real-time radiance field rendering.ACM Trans

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,

  12. [12]

    Gaussnav: Gaussian splatting for visual navigation.IEEE Transactions on Pattern Analysis and Machine Intelligence,

    Xiaohan Lei, Min Wang, Wengang Zhou, and Houqiang Li. Gaussnav: Gaussian splatting for visual navigation.IEEE Transactions on Pattern Analysis and Machine Intelligence,

  13. [13]

    Ground- ing image matching in 3d with mast3r

    Vincent Leroy, Yohann Cabon, and J´erˆome Revaud. Ground- ing image matching in 3d with mast3r. InEuropean Confer- ence on Computer Vision, pages 71–91. Springer, 2024. 1, 2

  14. [14]

    Dns-slam: Dense neural semantic-informed slam

    Kunyi Li, Michael Niemeyer, Nassir Navab, and Federico Tombari. Dns-slam: Dense neural semantic-informed slam. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7839–7846. IEEE, 2024. 1

  15. [15]

    Monogs++: Fast and accurate monocular rgb gaussian slam.arXiv preprint arXiv:2504.02437, 2025

    Renwu Li, Wenjing Ke, Dong Li, Lu Tian, and Emad Bar- soum. Monogs++: Fast and accurate monocular rgb gaussian slam.arXiv preprint arXiv:2504.02437, 2025. 3

  16. [16]

    4d gaussian splatting slam

    Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, and Federico Tombari. 4d gaussian splatting slam. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 25019–25028, 2025. 1, 2

  17. [17]

    Supergseg: Open-vocabulary 3d segmentation with structured super-gaussians

    Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Ste- fano Gasperini, Nassir Navab, and Federico Tombari. Su- pergseg: Open-vocabulary 3d segmentation with structured super-gaussians.arXiv preprint arXiv:2412.10231, 2024. 1, 2

  18. [18]

    Slam3r: Real- time dense scene reconstruction from monocular rgb videos

    Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yan- chao Yang, Qingnan Fan, and Baoquan Chen. Slam3r: Real- time dense scene reconstruction from monocular rgb videos. InProceedings of the Computer Vision and Pattern Recogni- tion Conference, pages 16651–16662, 2025. 2, 3

  19. [19]

    Instructnav: Zero-shot system for generic instruction navigation in unexplored environment.arXiv preprint arXiv:2406.04882, 2024

    Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, and Hao Dong. Instructnav: Zero-shot system for generic instruction navigation in unexplored environment.arXiv preprint arXiv:2406.04882, 2024. 1

  20. [20]

    VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

    Dominic Maggio, Hyungtae Lim, and Luca Carlone. Vggt- slam: Dense rgb slam optimized on the sl (4) manifold.arXiv preprint arXiv:2505.12549, 2025. 2, 3, 4, 5, 6, 7, 8, 1

  21. [21]

    Gaussian splatting slam

    Hidenobu Matsuki, Riku Murai, Paul HJ Kelly, and An- drew J Davison. Gaussian splatting slam. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 18039–18048, 2024. 1, 2, 3, 6

  22. [22]

    Orb-slam: A versatile and accurate monocular slam system.IEEE transactions on robotics, 31(5):1147–1163,

    Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. Orb-slam: A versatile and accurate monocular slam system.IEEE transactions on robotics, 31(5):1147–1163,

  23. [23]

    Mast3r-slam: Real-time dense slam with 3d reconstruction priors

    Riku Murai, Eric Dexheimer, and Andrew J Davison. Mast3r-slam: Real-time dense slam with 3d reconstruction priors. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 16695–16705, 2025. 2, 3, 5, 6, 7, 8, 1

  24. [24]

    Learn- ing neural exposure fields for view synthesis

    Michael Niemeyer, Fabian Manhardt, Marie-Julie Rako- tosaona, Christina Tsalicoglou Michael Oechsle, Keisuke Tateno, Jonathan T Barron, and Federico Tombari. Learn- ing neural exposure fields for view synthesis. InNeurIPS,

  25. [25]

    Openscene: 3d scene understanding with open vocabularies

    Songyou Peng, Kyle Genova, Chiyu Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser, et al. Openscene: 3d scene understanding with open vocabularies. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 815–824, 2023. 2

  26. [26]

    Langsplat: 3d language gaussian splatting

    Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, and Hanspeter Pfister. Langsplat: 3d language gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20051–20060, 2024. 2 9

  27. [27]

    Orb: An efficient alternative to sift or surf

    Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. Orb: An efficient alternative to sift or surf. In2011 International conference on computer vision, pages 2564–

  28. [28]

    Splat-slam: Globally optimized rgb-only slam with 3d gaus- sians

    Erik Sandstr ¨om, Ganlin Zhang, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Youmin Zhang, Manthan Pa- tel, Luc Van Gool, Martin Oswald, and Federico Tombari. Splat-slam: Globally optimized rgb-only slam with 3d gaus- sians. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 1680–1691, 2025. 2, 3, 6

  29. [29]

    Addison-Wesley Professional, 2016

    Dieter Schmalstieg and Tobias Hollerer.Augmented reality: principles and practice. Addison-Wesley Professional, 2016. 1

  30. [30]

    Scene co- ordinate regression forests for camera relocalization in rgb-d images

    Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene co- ordinate regression forests for camera relocalization in rgb-d images. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2930–2937, 2013. 6, 7, 2, 4

  31. [31]

    Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras.Advances in neu- ral information processing systems, 34:16558–16569, 2021

    Zachary Teed and Jia Deng. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras.Advances in neu- ral information processing systems, 34:16558–16569, 2021. 1, 2, 3, 8

  32. [32]

    Probabilistic robotics.Communications of the ACM, 45(3):52–57, 2002

    Sebastian Thrun. Probabilistic robotics.Communications of the ACM, 45(3):52–57, 2002. 1

  33. [33]

    3d reconstruction with spatial memory

    Hengyi Wang and Lourdes Agapito. 3d reconstruction with spatial memory. In2025 International Conference on 3D Vision (3DV), pages 78–89. IEEE, 2025. 2

  34. [34]

    Vggt: Vi- sual geometry grounded transformer

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Vi- sual geometry grounded transformer. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025. 1, 2, 3

  35. [35]

    Continuous 3d per- ception model with persistent state

    Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d per- ception model with persistent state. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 10510–10522, 2025. 2, 6, 1

  36. [36]

    Dust3r: Geometric 3d vi- sion made easy

    Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vi- sion made easy. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697– 20709, 2024. 1, 2, 3

  37. [37]

    Depth anything: Unleashing the power of large-scale unlabeled data

    Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything: Unleashing the power of large-scale unlabeled data. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10371–10381, 2024. 2

  38. [38]

    Yugay, Y

    Vladimir Yugay, Yue Li, Theo Gevers, and Martin R Os- wald. Gaussian-slam: Photo-realistic dense slam with gaus- sian splatting.arXiv preprint arXiv:2312.10070, 2023. 3, 6

  39. [39]

    Rade-gs: Rasterizing depth in gaussian splatting.arXiv preprint arXiv:2406.01467, 2024

    Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, and Ping Tan. Rade-gs: Rasterizing depth in gaussian splatting.arXiv preprint arXiv:2406.01467, 2024. 4, 6

  40. [40]

    Vista-slam: Visual slam with symmetric two-view associa- tion.arXiv preprint arXiv:2509.01584, 2025

    Ganlin Zhang, Shenhan Qian, Xi Wang, and Daniel Cremers. Vista-slam: Visual slam with symmetric two-view associa- tion.arXiv preprint arXiv:2509.01584, 2025. 3, 6

  41. [41]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shecht- man, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recogni- tion, pages 586–595, 2018. 6

  42. [42]

    Hi-slam2: Geometry- aware gaussian slam for fast monocular scene reconstruction

    Wei Zhang, Qing Cheng, David Skuddis, Niclas Zeller, Daniel Cremers, and Norbert Haala. Hi-slam2: Geometry- aware gaussian slam for fast monocular scene reconstruction. arXiv preprint arXiv:2411.17982, 2024. 2, 3, 4, 5, 6, 7, 8, 1

  43. [43]

    Wildgs-slam: Monocular gaussian splatting slam in dynamic environments

    Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, and Iro Armeni. Wildgs-slam: Monocular gaussian splatting slam in dynamic environments. InPro- ceedings of the Computer Vision and Pattern Recognition Conference, pages 11461–11471, 2025. 3 10 SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors Suppl...

  44. [44]

    In contrast, our method tightly integrates 3D reconstruction with a globally optimized Gaussian map, en- abling both stable tracking and high-fidelity geometry

    attains the lowest average ATE, it relies on external tracking from DROID-SLAM [31], resulting in a decou- pled tracking–mapping pipeline that limits its reconstruc- tion quality. In contrast, our method tightly integrates 3D reconstruction with a globally optimized Gaussian map, en- abling both stable tracking and high-fidelity geometry. Compared with re...

  45. [45]

    and VGGT-SLAM [34], which show large pose errors on several scenes, SING3R-SLAM provides consistently strong pose estimates while achieving substantially better 3D reconstructions. This demonstrates that our unified de- sign yields advantages in both trajectory accuracy and ge- ometric quality, outperforming methods focused solely on either reconstruction...