Robust and Efficient Monocular 3D Gaussian SLAM for Kilometer-Scale Outdoor Scenes

Beizhen Zhao; Dongxu Shen; Guanzhi Ding; Hao Wang; Sicheng Yu

arXiv: 2606.30436 · v1 · pith:7KZXDA3M · submitted 2026-06-29 · 💻 cs.CV


Sicheng Yu, Dongxu Shen, Beizhen Zhao, Guanzhi Ding, Hao Wang

Pith reviewed 2026-06-30 05:54 UTC · model grok-4.3

classification 💻 cs.CV

keywords: monocular SLAM, 3D Gaussian Splatting, large-scale outdoor mapping, pose tracking, memory-efficient mapping, hybrid tracking, kilometer-scale scenes


The pith

KiloGS-SLAM keeps camera poses stable and memory low while scaling monocular 3D Gaussian mapping to kilometer outdoor scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KiloGS-SLAM to overcome two linked problems that stop monocular 3D Gaussian Splatting SLAM from working at kilometer scale: long-term pose drift and memory growth that exhausts available hardware. It adds a motion-adaptive hybrid tracking module whose three-tier pipeline switches between Essential matrix and PnP solvers when geometry becomes degenerate and can call an external foundation model to correct major drift. A second part, the lifecycle-managed Gaussian mapping, uses probabilistic initialization, chunk-wise densification, and pruning to cut redundant primitives while keeping fine detail. The result is a system that runs sequences longer than ten thousand frames on one GPU and reports state-of-the-art tracking and rendering numbers on outdoor benchmarks. Readers care because these two fixes together make high-quality 3D scene reconstruction feasible for real roads, campuses, or city blocks without special hardware.

Core claim

KiloGS-SLAM jointly solves fragile long-term pose tracking and excessive memory overhead in monocular 3DGS-SLAM for kilometer-scale scenes through a motion-adaptive hybrid tracking module and a lifecycle-managed Gaussian mapping strategy, achieving state-of-the-art performance on challenging outdoor datasets with sequences over 10,000 frames on a single GPU.

What carries the argument

Motion-adaptive hybrid tracking module whose condition-triggered three-tier pipeline switches between Essential matrix and PnP models, together with the lifecycle-managed Gaussian mapping that applies probabilistic initialization, chunk-based multi-view densification, and pruning.
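The paper does not publish its switching criteria (the referee's major comment raises exactly this). As a minimal sketch of what a condition-triggered three-tier selector could look like, the Python below assumes hypothetical per-frame statistics (`FrameStats`), hypothetical thresholds, and a PnP-first preference; none of these names, values, or orderings come from KiloGS-SLAM.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    PNP = "pnp"                # 2D-3D pose against already-mapped points
    ESSENTIAL = "essential"    # 2D-2D relative pose from an Essential matrix
    FOUNDATION = "foundation"  # expensive dense-matching rescue


@dataclass
class FrameStats:
    """Hypothetical per-frame statistics (illustrative names, not the paper's)."""
    median_parallax_deg: float    # parallax of matched bearings after rotation compensation
    num_2d3d_matches: int         # matches to already-triangulated map points
    essential_inlier_ratio: float
    pnp_inlier_ratio: float


# Hypothetical thresholds; the paper's actual values are not published.
MIN_PARALLAX_DEG = 1.0
MIN_2D3D_MATCHES = 30
MIN_INLIER_RATIO = 0.5


def pick_geometric_solver(s: FrameStats) -> Optional[Tier]:
    """Return a geometric solver that does not look degenerate, or None."""
    if s.num_2d3d_matches >= MIN_2D3D_MATCHES and s.pnp_inlier_ratio >= MIN_INLIER_RATIO:
        return Tier.PNP        # enough map support: PnP keeps scale tied to the map
    if s.median_parallax_deg >= MIN_PARALLAX_DEG and s.essential_inlier_ratio >= MIN_INLIER_RATIO:
        return Tier.ESSENTIAL  # enough baseline: the Essential matrix is well conditioned
    return None                # both geometric models look degenerate


def select_tier(s: FrameStats, pose_passes_prior: bool) -> Tier:
    """Geometric solver first; escalate to the foundation model on degeneracy or failure.

    pose_passes_prior stands for the result of verifying the chosen solver's pose
    against a motion prior; in a real pipeline it is computed after solving.
    """
    solver = pick_geometric_solver(s)
    if solver is None or not pose_passes_prior:
        return Tier.FOUNDATION
    return solver
```

Preferring PnP when map support exists is one common monocular design choice because it keeps scale consistent with the map; the paper may order the tiers differently.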

If this is right

Drift-free poses from the hybrid tracker supply the geometric foundation required for accurate large-scale mapping.
The lifecycle-managed mapping keeps primitive count low enough for sustained operation across long trajectories without memory exhaustion.
The full pipeline produces state-of-the-art tracking accuracy and rendering quality on the tested outdoor datasets.
The system runs sequences exceeding 10,000 frames on a single GPU.
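Tracking accuracy for monocular SLAM is commonly reported as absolute trajectory error (ATE) RMSE after a similarity (Sim(3)) alignment, because monocular scale is unobservable. The page does not state KiloGS-SLAM's exact evaluation protocol; the sketch below assumes this standard practice and uses Umeyama's closed-form alignment on N×3 arrays of estimated and ground-truth camera positions.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity (scale c, rotation R, translation t) mapping src onto dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    n = src.shape[0]
    cov = dst_c.T @ src_c / n                 # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # enforce a proper rotation (no reflection)
    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / n            # variance of the source points
    c = np.trace(np.diag(D) @ S) / var_s      # optimal scale
    t = mu_d - c * R @ mu_s
    return c, R, t


def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """ATE: RMSE of position errors after Sim(3) alignment of est onto gt."""
    c, R, t = umeyama_sim3(est, gt)
    aligned = c * est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

A trajectory that differs from ground truth only by scale, rotation, and offset scores (numerically) zero, which is exactly the ambiguity the alignment is meant to remove.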

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same switching logic between geometric solvers and learned rescue could be added to other monocular SLAM back-ends that currently fail on degenerate motion.
Chunk-based densification and pruning may reduce memory growth in any Gaussian-based reconstruction pipeline, not only SLAM.
On-demand foundation-model rescue points toward future systems that combine classical geometry checks with learned components only when needed.

Load-bearing premise

The condition-triggered pipeline can correctly detect when to switch models and when to invoke the foundation model to prevent unrecoverable drift.
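One way to make "when to invoke the foundation model" testable is the sliding-window motion prior that the Figure 3 caption mentions without specifying. The sketch below assumes a constant-velocity model over per-frame translation steps expressed in a consistent map scale, with a hypothetical class name (`MotionPrior`), window size, and tolerances; it ignores rotation and is not the paper's formulation.

```python
from collections import deque

import numpy as np


class MotionPrior:
    """Hypothetical sliding-window constant-velocity prior on per-frame translation."""

    def __init__(self, window: int = 5, rel_tol: float = 0.5, abs_tol: float = 0.05):
        self.steps = deque(maxlen=window)  # most recent accepted translation steps
        self.rel_tol = rel_tol             # tolerance relative to predicted step length
        self.abs_tol = abs_tol             # absolute tolerance (map units)

    def predict(self) -> np.ndarray:
        """Constant-velocity prediction: mean of the steps in the window."""
        return np.mean(np.stack(list(self.steps)), axis=0)

    def is_consistent(self, step: np.ndarray) -> bool:
        """True if a new step lies within tolerance of the prediction."""
        if len(self.steps) < self.steps.maxlen:
            return True  # too little history to judge; accept
        pred = self.predict()
        err = float(np.linalg.norm(np.asarray(step, dtype=float) - pred))
        return err <= self.abs_tol + self.rel_tol * float(np.linalg.norm(pred))

    def accept(self, step: np.ndarray) -> None:
        """Record a verified step so it informs future predictions."""
        self.steps.append(np.asarray(step, dtype=float))
```

In use, a tracker would call `is_consistent` on each geometric estimate, `accept` it on success, and escalate to the rescue tier on failure; only verified steps enter the window, so one bad estimate cannot corrupt the prior.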

What would settle it

Running the system on a sequence longer than 10,000 frames from any of the three outdoor datasets and observing either a tracking loss that the foundation model fails to recover or memory use that exceeds a single GPU before the sequence ends.
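A back-of-envelope estimate shows how quickly primitive count alone can fill a single GPU. It assumes the original 3DGS parameterization [19] (position 3, scale 3, rotation quaternion 4, opacity 1, and 48 spherical-harmonic color coefficients for degree 3, all float32) and Adam training (parameters, gradients, and two moment buffers). Rasterizer buffers, images, and any learned models are not counted, and KiloGS-SLAM's own parameterization may differ.

```python
# Assumed: standard 3DGS parameterization with SH degree 3, stored as float32.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 16 * 3  # position, scale, quaternion, opacity, SH color
BYTES_PER_FLOAT = 4
ADAM_COPIES = 4  # parameters, gradients, Adam first and second moments


def training_bytes(num_gaussians: int) -> int:
    """Optimizer-state memory for num_gaussians primitives (excludes rasterizer buffers)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT * ADAM_COPIES


def max_gaussians(budget_bytes: int) -> int:
    """Optimistic upper bound: the whole budget spent on optimizer state."""
    return budget_bytes // training_bytes(1)


if __name__ == "__main__":
    print(f"{training_bytes(10_000_000) / 1e9:.2f} GB for 10M Gaussians")
    print(f"{max_gaussians(24 * 10**9):,} Gaussians fit in 24 GB (optimistic)")
```

Under these assumptions each Gaussian costs 944 bytes, so 10 million primitives already need about 9.44 GB before any rendering memory is counted.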

Figures

Figures reproduced from arXiv: 2606.30436 by Beizhen Zhao, Dongxu Shen, Guanzhi Ding, Hao Wang, Sicheng Yu.

**Figure 1.** KiloGS-SLAM achieves robust pose tracking and precise scene reconstruction in kilometer-scale outdoor environments using only monocular RGB input, enabling high-fidelity novel view synthesis.

**Figure 2.** Performance comparison on KITTI-00. Our method strikes the optimal balance between rendering quality, runtime, and memory overhead, while achieving the lowest camera tracking error. … than a primary pose estimator, we achieve the speed and precision of geometric solvers, backed by the robustness of deep priors. Given these reliable poses, we tackle the memory bottleneck by proposing a lifecycle-managed Gauss…

**Figure 3.** Framework of KiloGS-SLAM. Input RGB frames first undergo sparse matching and dynamic filtering once they enter the tracking module. A dual-modal pose estimator dynamically switches between Essential-matrix and PnP solvers to handle degeneracies. The estimate is verified against a sliding-window motion prior, and on-demand foundation-model dense matching is triggered upon failure. Valid poses proceed to the Mapping…

**Figure 4.** Tracking trajectories. Our method maintains robust pose estimation and global consistency across long-term sequences, whereas other baselines inevitably suffer from significant drift in localized challenging segments. … frames, 5 km), and the highly challenging KITTI-360 [22] (up to 13,888 frames, 11.6 km). We omit KITTI-01, as its extreme highway speeds and lack of trackable near-field features cause univer…

**Figure 5.** Qualitative rendering comparisons on Waymo (top 2) and KITTI

**Figure 6.** Additional tracking trajectories. Qualitative evaluation of camera poses on extended outdoor sequences. KiloGS-SLAM reliably mitigates drift and maintains strong global consistency compared to other approaches.

**Figure 7.** Additional qualitative rendering results on the Waymo dataset.

**Figure 8.** Additional qualitative rendering results on the KITTI dataset.

Original abstract

Scaling monocular 3D Gaussian Splatting (3DGS) SLAM to kilometer-level outdoor environments poses two tightly coupled challenges: fragile long-term pose tracking and excessive memory overhead during large-scale mapping. In this paper, we propose KiloGS-SLAM, a highly efficient and robust monocular 3DGS-SLAM system that jointly addresses both bottlenecks. Since high-fidelity scene reconstruction fundamentally relies on drift-free camera poses, we first introduce a motion-adaptive hybrid tracking module. This module features a condition-triggered three-tier solving pipeline. It dynamically switches between Essential matrix and PnP models to handle geometric degeneracies. An on-demand foundation model can also be activated to rescue the trajectory from catastrophic drift. To ensure the system can sustain these long trajectories without memory exhaustion, we subsequently design a lifecycle-managed Gaussian mapping strategy. By integrating probabilistic initialization with chunk-based multi-view densification and pruning, this full-pipeline optimization effectively reduces primitive redundancy while preserving high-frequency details. Together, the robust tracking guarantees the geometric foundation required for accurate mapping, while the memory-efficient lifecycle-managed mapping enables large-scale operation. Extensive experiments across three challenging outdoor datasets demonstrate that our approach achieves state-of-the-art tracking accuracy and rendering quality, successfully scaling to sequences of over 10,000 frames on a single GPU.
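The abstract names chunk-based multi-view pruning but not its rule. A minimal sketch, assuming each Gaussian in a chunk carries a hypothetical per-keyframe contribution score (for example, its accumulated alpha-blending weight when that keyframe is rendered): keep a primitive only if enough keyframes see it and it matters in at least one of them. The function names and thresholds are illustrative, not the paper's.

```python
import numpy as np


def multiview_keep_mask(contrib: np.ndarray, min_views: int = 2,
                        vis_eps: float = 1e-3, min_peak: float = 0.01) -> np.ndarray:
    """Boolean keep-mask over the Gaussians of one chunk.

    contrib[i, v] is a hypothetical score for Gaussian i in keyframe v of the chunk.
    """
    visible_views = (contrib > vis_eps).sum(axis=1)  # keyframes that actually see it
    peak = contrib.max(axis=1)                       # strongest single-view contribution
    return (visible_views >= min_views) & (peak >= min_peak)


def prune_chunks(chunks):
    """Prune chunk by chunk so only one chunk's score matrix is needed at a time.

    chunks: iterable of (params, contrib) pairs, params having one row per Gaussian.
    """
    return [params[multiview_keep_mask(contrib)] for params, contrib in chunks]
```

Requiring support from several views targets primitives fitted to a single frame (a typical source of redundancy and floaters), while the peak test removes ones that are seen but contribute almost nothing.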

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

KiloGS-SLAM proposes a hybrid tracking pipeline and lifecycle Gaussian mapping for km-scale outdoor monocular 3DGS SLAM, but the abstract supplies no metrics or switching criteria to check the claims.


KiloGS-SLAM claims to scale monocular 3D Gaussian Splatting SLAM to kilometer-scale outdoor scenes with a hybrid tracking module and memory-efficient mapping, but the abstract provides no quantitative results to support the SOTA claims.

The new parts are the condition-triggered three-tier tracking pipeline, which switches between Essential matrix and PnP to deal with degeneracies and adds an on-demand foundation-model rescue, and the lifecycle-managed Gaussian strategy, which uses probabilistic initialization, chunk-based densification, and pruning.
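"Probabilistic initialization" is not defined in the abstract. One plausible reading, offered only as an assumption, is to seed new Gaussians at pixels sampled with probability proportional to the current rendering error, then lift them to 3D with a depth estimate and pinhole intrinsics `K`. The function names, the error map, and the depth source below are all hypothetical.

```python
import numpy as np


def sample_seed_pixels(error_map: np.ndarray, num_samples: int,
                       floor: float = 1e-6, seed: int = 0) -> np.ndarray:
    """Draw distinct (row, col) pixels with probability proportional to rendering error."""
    weights = error_map.ravel().astype(float) + floor  # floor keeps low-error pixels possible
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    flat = rng.choice(probs.size, size=num_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(flat, error_map.shape)
    return np.stack([rows, cols], axis=1)


def backproject(pixels: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift (row, col) pixels to camera-frame 3D points with a pinhole model."""
    r, c = pixels[:, 0], pixels[:, 1]
    z = depth[r, c]
    x = (c - K[0, 2]) * z / K[0, 0]  # column -> x via cx, fx
    y = (r - K[1, 2]) * z / K[1, 1]  # row -> y via cy, fy
    return np.stack([x, y, z], axis=1)
```

Sampling by error rather than on a uniform grid spends the primitive budget where the current map explains the image worst, which is one way to add detail without uniform densification.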

These target the two stated problems directly: fragile long-term poses and memory overhead. The system is said to run over 10k frames on a single GPU, which would be useful for practical robotics if it holds.

The approach seems sensible for outdoor monocular setups where pure geometric tracking often fails.

The main issue is the lack of specifics. The decision criteria for switching tiers or activating the foundation model are not described, so it's unclear if the robustness is general or just works on the three datasets. No error metrics, ablation studies, or comparisons appear in the abstract, making the performance claims impossible to assess from what's here.

The full paper would need to show those details and the actual numbers for the tracking accuracy and rendering quality.

This is aimed at people working on 3D reconstruction and SLAM for large-scale outdoor applications. Readers interested in 3DGS extensions for real-world use might find the engineering choices worth looking at.

It deserves peer review because the problem is relevant and the proposed modules are concrete, even if the current writeup is limited to the abstract level. A referee could check if the experiments back up the scaling claims.

Referee Report

1 major / 1 minor

Summary. The paper presents KiloGS-SLAM, a monocular 3D Gaussian Splatting SLAM system for kilometer-scale outdoor scenes. It introduces a motion-adaptive hybrid tracking module featuring a condition-triggered three-tier pipeline that switches between Essential matrix and PnP solvers while using an on-demand foundation model to rescue from drift, paired with a lifecycle-managed Gaussian mapping strategy that employs probabilistic initialization, chunk-based multi-view densification, and pruning to control memory use. Experiments on three challenging outdoor datasets are said to demonstrate state-of-the-art tracking accuracy and rendering quality while scaling to sequences exceeding 10,000 frames on a single GPU.

Significance. If the long-term robustness claims hold, the work would advance scalable 3DGS SLAM for large outdoor environments by jointly addressing pose drift and memory overhead, enabling applications in autonomous driving and large-scale reconstruction where prior methods typically fail.

major comments (1)

§3.2 (Motion-Adaptive Hybrid Tracking Module): The three-tier pipeline is described as dynamically switching between Essential matrix and PnP models to handle geometric degeneracies, with on-demand foundation-model rescue. No explicit decision criteria (reprojection thresholds, eigenvalue ratios of the essential matrix, degeneracy scores, or failure-detection heuristics) are supplied. Without these, the reliability of the switches cannot be verified and the drift-free tracking claim over >10k frames remains untestable.
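For reference on one criterion the referee names: a valid essential matrix E = [t]_x R has singular values (σ, σ, 0), so the ratios s2/s1 and s3/s1 measure how far an estimate is from that structure. The sketch below computes them with hypothetical thresholds and function names. Five-point solvers [30] already return matrices with this structure, so the check is mainly informative for linear (e.g., eight-point) estimates; low parallax, a more common degeneracy, needs a separate test.

```python
import numpy as np


def skew(t: np.ndarray) -> np.ndarray:
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_svd_ratios(E: np.ndarray, eps: float = 1e-12) -> tuple:
    """Return (s2/s1, s3/s1); an ideal essential matrix gives (1.0, 0.0)."""
    s = np.linalg.svd(E, compute_uv=False)  # singular values, descending
    if s[0] < eps:
        return 0.0, 1.0  # E is (near) zero, e.g. no translation: treat as degenerate
    return float(s[1] / s[0]), float(s[2] / s[0])


def looks_degenerate(E: np.ndarray, min_equal: float = 0.9, max_null: float = 0.1) -> bool:
    """Hypothetical thresholds: flag estimates far from the (sigma, sigma, 0) structure."""
    r_equal, r_null = essential_svd_ratios(E)
    return r_equal < min_equal or r_null > max_null
```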

minor comments (1)

Abstract: The SOTA claims are stated without any numerical metrics, error values, or dataset-specific results, which weakens immediate substantiation even though the full experiments section presumably contains them.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment regarding the decision criteria in the motion-adaptive hybrid tracking module. We agree that explicit details are required for reproducibility and will incorporate them in the revision.

Point-by-point responses

Referee: §3.2 (Motion-Adaptive Hybrid Tracking Module): The three-tier pipeline is described as dynamically switching between Essential matrix and PnP models to handle geometric degeneracies, with on-demand foundation-model rescue. No explicit decision criteria (reprojection thresholds, eigenvalue ratios of the essential matrix, degeneracy scores, or failure-detection heuristics) are supplied. Without these, the reliability of the switches cannot be verified and the drift-free tracking claim over >10k frames remains untestable.

Authors: We acknowledge that while the manuscript refers to a 'condition-triggered' pipeline, it does not supply the concrete thresholds, eigenvalue ratios, degeneracy scores, or failure-detection heuristics used to switch between the Essential matrix solver, PnP solver, and foundation-model rescue. We will revise Section 3.2 to include these explicit criteria (e.g., reprojection error thresholds for solver selection, condition number or eigenvalue ratio tests for degeneracy detection, and heuristics for triggering the foundation model), together with pseudocode of the three-tier decision logic. This addition will make the switching behavior verifiable and strengthen support for the long-sequence tracking results.

Revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on external dataset evaluation


The paper proposes two new algorithmic modules (motion-adaptive hybrid tracking with three-tier pipeline and lifecycle-managed Gaussian mapping) and supports its scaling claims via experiments on three external outdoor datasets. No equations, parameters, or uniqueness theorems are shown to reduce to self-fit inputs or self-citations. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard SLAM domain assumptions and introduces new algorithmic components without detailing fitted parameters or new entities.

axioms (1)

Domain assumption: High-fidelity scene reconstruction fundamentally relies on drift-free camera poses.
Stated explicitly in the abstract as the motivation for the tracking module.

pith-pipeline@v0.9.1-grok · 5775 in / 1164 out tokens · 29131 ms · 2026-06-30T05:54:13.330877+00:00 · methodology


Reference graph

Works this paper leans on

52 extracted references · 12 canonical work pages · 4 internal anchors

[1] Campos, C., Elvira, R., Rodríguez, J.J.G., Montiel, J.M., Tardós, J.D.: ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Transactions on Robotics 37(6), 1874–1890 (2021)

[2] Cheng, C., Chen, X., Xie, T., Yin, W., Ren, W., Zhang, Q., Guo, X., Wang, H.: Longstream: Long-sequence streaming autoregressive visual geometry. arXiv preprint arXiv:2602.13172 (2026)

[3] Cheng, C., Hu, Y., Yu, S., Zhao, B., Wang, Z., Wang, H.: RegGS: Unposed sparse views Gaussian splatting with 3DGS registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8100–8109 (2025)

[4] Cheng, C., Wang, Z., Yu, S., Hu, Y., Yao, N., Wang, H.: Unposed 3DGS reconstruction with probabilistic Procrustes mapping. arXiv preprint arXiv:2507.18541 (2025)

[5] Cheng, C., Yu, S., Wang, Z., Zhou, Y., Wang, H.: Outdoor monocular SLAM with global scale-consistent 3D Gaussian pointmaps. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26035–26044 (2025)

[6] Cheng, J., Cai, Z., Zhang, Z., Yin, W., Muller, M., Paulitsch, M., Yang, X.: Romeo: Robust metric visual odometry. arXiv preprint arXiv:2412.11530 (2024)

[7] Deng, K., Ti, Z., Xu, J., Yang, J., Xie, J.: VGGT-Long: Chunk it, loop it, align it – pushing VGGT's limits on kilometer-scale long RGB sequences. arXiv preprint arXiv:2507.16443 (2025)

[8] Deng, K., Zhang, Y., Yang, J., Xie, J.: GigaSLAM: Large-scale monocular SLAM with hierarchical Gaussian splats. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers. pp. 1–10 (2025)

[9] Deng, T., Wu, W., He, J., Pan, Y., Jiang, X., Yuan, S., Wang, D., Wang, H., Chen, W.: VPGS-SLAM: Voxel-based progressive 3D Gaussian SLAM in large-scale scenes. arXiv preprint arXiv:2505.18992 (2025)

[10] Engel, J., Schöps, T., Cremers, D.: LSD-SLAM: Large-scale direct monocular SLAM. In: European Conference on Computer Vision. pp. 834–849. Springer (2014)

[11] Gálvez-López, D., Tardos, J.D.: Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28(5), 1188–1197 (2012)

[12] Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research 32(11), 1231–1237 (2013)

[13] Homeyer, C., Begiristain, L., Schnörr, C.: DROID-Splat: Combining end-to-end SLAM with 3D Gaussian splatting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. pp. 2788–2798 (2025)

[14] Hong, S., He, J., Zheng, X., Zheng, C.: LIV-GaussMap: LiDAR-inertial-visual fusion for real-time 3D radiance field map rendering. IEEE Robotics and Automation Letters 9(11), 9765–9772 (2024)

[15] Hu, J., Chen, X., Feng, B., Li, G., Yang, L., Bao, H., Zhang, G., Cui, Z.: CG-SLAM: Efficient dense RGB-D SLAM in a consistent uncertainty-aware 3D Gaussian field. In: European Conference on Computer Vision. pp. 93–112. Springer (2024)

[16] Hu, Y., Cheng, C., Yu, S., Guo, X., Wang, H.: VGGT4D: Mining motion cues in visual geometry transformers for 4D scene reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings. pp. 414–424 (June 2026)

[17] Huang, H., Li, L., Cheng, H., Yeung, S.K.: Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21584–21593 (2024)

[18] Keetha, N., Karhade, J., Jatavallabhula, K.M., Yang, G., Scherer, S., Ramanan, D., Luiten, J.: SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21357–21366 (2024)

[19] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G., et al.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139:1–139:14 (2023)

[20] Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision 81(2), 155–166 (2009)

[21] Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3D with MASt3R. In: European Conference on Computer Vision. pp. 71–91. Springer (2024)

[22] Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(3), 3292–3310 (2022)

[23] Lipson, L., Teed, Z., Deng, J.: Deep patch visual SLAM. In: European Conference on Computer Vision. pp. 424–440. Springer (2024)

[24] Lu, T., Yu, M., Xu, L., Xiangli, Y., Wang, L., Lin, D., Dai, B.: Scaffold-GS: Structured 3D Gaussians for view-adaptive rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20654–20664 (2024)

[25] Maggio, D., Lim, H., Carlone, L.: VGGT-SLAM: Dense RGB SLAM optimized on the SL(4) manifold. arXiv preprint arXiv:2505.12549 (2025)

[26] Matsuki, H., Murai, R., Kelly, P.H., Davison, A.J.: Gaussian splatting SLAM. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18039–18048 (2024)

[27] Meuleman, A., Shah, I., Lanvin, A., Kerbl, B., Drettakis, G.: On-the-fly reconstruction for large-scale novel view synthesis from unposed images. ACM Transactions on Graphics (TOG) 44(4), 1–14 (2025)

[28] Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics 33(5), 1255–1262 (2017)

[29] Murai, R., Dexheimer, E., Davison, A.J.: MASt3R-SLAM: Real-time dense SLAM with 3D reconstruction priors. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16695–16705 (2025)

[30] Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(6), 756–770 (2004)

[31] Piccinelli, L., Sakaridis, C., Yang, Y.H., Segu, M., Li, S., Abbeloos, W., Van Gool, L.: UniDepthV2: Universal monocular metric depth estimation made simpler. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

[32] Rajamani, R.: Vehicle dynamics and control. Springer (2006)

[33] Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: SAM 2: Segment anything in images and videos. In: International Conference on Learning Representations. vol. 2025, pp. 28085–28128 (2025)

[34] Ren, S., Wen, T., Fang, Y., Lu, B.: FastGS: Training 3D Gaussian splatting in 100 seconds. arXiv preprint arXiv:2511.04283 (2025)

[35] Sandström, E., Zhang, G., Tateno, K., Oechsle, M., Niemeyer, M., Zhang, Y., Patel, M., Van Gool, L., Oswald, M., Tombari, F.: Splat-SLAM: Globally optimized RGB-only SLAM with 3D Gaussians. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 1680–1691 (2025)

[36] Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2446–2454 (2020)

[37] Teed, Z., Deng, J.: DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. Advances in Neural Information Processing Systems 34, 16558–16569 (2021)

[38] Teed, Z., Lipson, L., Deng, J.: Deep patch visual odometry. Advances in Neural Information Processing Systems 36, 39033–39051 (2023)

[39] Tyszkiewicz, M., Fua, P., Trulls, E.: DISK: Learning local features with policy gradient. Advances in Neural Information Processing Systems 33, 14254–14265 (2020)

[40] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: VGGT: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)

[41] Wang, L., Gong, R., Han, Y., Yang, L., Yang, L., Li, Y., Xu, B., Liu, H., Fu, R.: Towards next-generation SLAM: A survey on 3DGS-SLAM focusing on performance, robustness, and future directions. arXiv preprint arXiv:2602.04251 (2026)

[42] Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: DUSt3R: Geometric 3D vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20697–20709 (2024)

[43] Wu, C., Duan, Y., Zhang, X., Sheng, Y., Ji, J., Zhang, Y.: MM-Gaussian: 3D Gaussian-based multi-modal fusion for localization and reconstruction in unbounded scenes. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 12287–12293. IEEE (2024)

[44] Wu, K., Zhang, Z., Tie, M., Ai, Z., Gan, Z., Ding, W.: VINGS-Mono: Visual-inertial Gaussian splatting monocular SLAM in large scenes. IEEE Transactions on Robotics (2025)

[45] Yan, C., Qu, D., Xu, D., Zhao, B., Wang, Z., Wang, D., Li, X.: GS-SLAM: Dense visual SLAM with 3D Gaussian splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19595–19604 (2024)

[46] Yu, S., Cheng, C., Zhou, Y., Yang, X., Wang, H.: RGB-only Gaussian splatting SLAM for unbounded outdoor scenes. In: 2025 IEEE International Conference on Robotics and Automation (ICRA). pp. 11068–11074. IEEE (2025)

[47] Yugay, V., Li, Y., Gevers, T., Oswald, M.R.: Gaussian-SLAM: Photo-realistic dense SLAM with Gaussian splatting. arXiv preprint arXiv:2312.10070 (2023)

[48] Zhan, H., Weerasekera, C.S., Bian, J.W., Garg, R., Reid, I.: DF-VO: What should be learnt for visual odometry? arXiv preprint arXiv:2103.00933 (2021)

[49] Zhao, B., Yu, S., Yin, Z., Shen, D., Wang, H.: MMGS: 10× compressed 3DGS through optimal transport aggregation based on multi-view ranking. arXiv preprint arXiv:2605.19304 (2026)

[50] Zhao, B., Zhou, Y., Song, G., Yin, Z., Wang, H.: 3D skew Gaussian splatting with any camera trajectory visualization engine. arXiv preprint arXiv:2605.18334 (2026)

[51] Zhao, B., Zhou, Y., Yu, S., Wang, Z., Wang, H.: Wavelet-GS: 3D Gaussian splatting with wavelet decomposition. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 8616–8625 (2025)

[52] Zhu, P., Zhuang, Y., Chen, B., Li, L., Wu, C., Liu, Z.: MGS-SLAM: Monocular sparse tracking and Gaussian mapping with depth smooth regularization. IEEE Robotics and Automation Letters 9(11), 9486–9493 (2024)

[1] [1]

IEEE transactions on robotics37(6), 1874–1890 (2021)

Campos, C., Elvira, R., Rodríguez, J.J.G., Montiel, J.M., Tardós, J.D.: Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE transactions on robotics37(6), 1874–1890 (2021)

2021

[2] [2]

Longstream: Long-sequence streaming autoregressive visual geometry.arXiv preprint arXiv:2602.13172,

Cheng, C., Chen, X., Xie, T., Yin, W., Ren, W., Zhang, Q., Guo, X., Wang, H.: Longstream: Long-sequence streaming autoregressive visual geometry. arXiv preprint arXiv:2602.13172 (2026)

work page arXiv 2026

[3] [3]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Cheng, C., Hu, Y., Yu, S., Zhao, B., Wang, Z., Wang, H.: Reggs: Unposed sparse views gaussian splatting with 3dgs registration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8100–8109 (2025)

2025

[4] [4]

arXiv preprint arXiv:2507.18541 (2025)

Cheng, C., Wang, Z., Yu, S., Hu, Y., Yao, N., Wang, H.: Unposed 3dgs recon- struction with probabilistic procrustes mapping. arXiv preprint arXiv:2507.18541 (2025)

work page arXiv 2025

[5] [5]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Cheng, C., Yu, S., Wang, Z., Zhou, Y., Wang, H.: Outdoor monocular slam with global scale-consistent 3d gaussian pointmaps. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26035–26044 (2025)

2025

[6] [6]

arXiv preprint arXiv:2412.11530 (2024)

Cheng, J., Cai, Z., Zhang, Z., Yin, W., Muller, M., Paulitsch, M., Yang, X.: Romeo: Robust metric visual odometry. arXiv preprint arXiv:2412.11530 (2024)

work page arXiv 2024

[7] [7]

VGGT-Long: Chunk it, Loop it, Align it -- Pushing VGGT's Limits on Kilometer-scale Long RGB Sequences

Deng, K., Ti, Z., Xu, J., Yang, J., Xie, J.: Vggt-long: Chunk it, loop it, align it–pushing vggt’s limits on kilometer-scale long rgb sequences. arXiv preprint arXiv:2507.16443 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers

Deng, K., Zhang, Y., Yang, J., Xie, J.: Gigaslam: Large-scale monocular slam with hierarchical gaussian splats. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers. pp. 1–10 (2025)

2025

[9] Deng, T., Wu, W., He, J., Pan, Y., Jiang, X., Yuan, S., Wang, D., Wang, H., Chen, W.: Vpgs-slam: Voxel-based progressive 3d gaussian slam in large-scale scenes. arXiv preprint arXiv:2505.18992 (2025)

[10] Engel, J., Schöps, T., Cremers, D.: Lsd-slam: Large-scale direct monocular slam. In: European conference on computer vision. pp. 834–849. Springer (2014)

[11] Gálvez-López, D., Tardos, J.D.: Bags of binary words for fast place recognition in image sequences. IEEE Transactions on robotics 28(5), 1188–1197 (2012)

[12] Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. The international journal of robotics research 32(11), 1231–1237 (2013)

[13] Homeyer, C., Begiristain, L., Schnörr, C.: DROID-Splat: Combining end-to-end SLAM with 3D gaussian splatting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. pp. 2788–2798 (2025)

[14] Hong, S., He, J., Zheng, X., Zheng, C.: Liv-gaussmap: Lidar-inertial-visual fusion for real-time 3d radiance field map rendering. IEEE Robotics and Automation Letters 9(11), 9765–9772 (2024)

[15] Hu, J., Chen, X., Feng, B., Li, G., Yang, L., Bao, H., Zhang, G., Cui, Z.: Cg-slam: Efficient dense rgb-d slam in a consistent uncertainty-aware 3d gaussian field. In: European Conference on Computer Vision. pp. 93–112. Springer (2024)

[16] Hu, Y., Cheng, C., Yu, S., Guo, X., Wang, H.: Vggt4d: Mining motion cues in visual geometry transformers for 4d scene reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings. pp. 414–424 (June 2026)

[17] Huang, H., Li, L., Cheng, H., Yeung, S.K.: Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and rgb-d cameras. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21584–21593 (2024)

[18] Keetha, N., Karhade, J., Jatavallabhula, K.M., Yang, G., Scherer, S., Ramanan, D., Luiten, J.: Splatam: Splat track & map 3d gaussians for dense rgb-d slam. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 21357–21366 (2024)

[19] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G., et al.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139:1–139:14 (2023)

[20] Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: An accurate O(n) solution to the PnP problem. International journal of computer vision 81(2), 155–166 (2009)

[21] Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: European conference on computer vision. pp. 71–91. Springer (2024)

[22] Liao, Y., Xie, J., Geiger, A.: Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(3), 3292–3310 (2022)

[23] Lipson, L., Teed, Z., Deng, J.: Deep patch visual slam. In: European Conference on Computer Vision. pp. 424–440. Springer (2024)

[24] Lu, T., Yu, M., Xu, L., Xiangli, Y., Wang, L., Lin, D., Dai, B.: Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20654–20664 (2024)

[25] Maggio, D., Lim, H., Carlone, L.: Vggt-slam: Dense rgb slam optimized on the sl(4) manifold. arXiv preprint arXiv:2505.12549 (2025)

[26] Matsuki, H., Murai, R., Kelly, P.H., Davison, A.J.: Gaussian splatting slam. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18039–18048 (2024)

[27] Meuleman, A., Shah, I., Lanvin, A., Kerbl, B., Drettakis, G.: On-the-fly reconstruction for large-scale novel view synthesis from unposed images. ACM Transactions on Graphics (TOG) 44(4), 1–14 (2025)

[28] Mur-Artal, R., Tardós, J.D.: Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras. IEEE transactions on robotics 33(5), 1255–1262 (2017)

[29] Murai, R., Dexheimer, E., Davison, A.J.: Mast3r-slam: Real-time dense slam with 3d reconstruction priors. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16695–16705 (2025)

[30] Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE transactions on pattern analysis and machine intelligence 26(6), 756–770 (2004)

[31] Piccinelli, L., Sakaridis, C., Yang, Y.H., Segu, M., Li, S., Abbeloos, W., Van Gool, L.: Unidepthv2: Universal monocular metric depth estimation made simpler. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

[32] Rajamani, R.: Vehicle dynamics and control. Springer (2006)

[33] Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. In: International Conference on Learning Representations. vol. 2025, pp. 28085–28128 (2025)

[34] Ren, S., Wen, T., Fang, Y., Lu, B.: Fastgs: Training 3d gaussian splatting in 100 seconds. arXiv preprint arXiv:2511.04283 (2025)

[35] Sandström, E., Zhang, G., Tateno, K., Oechsle, M., Niemeyer, M., Zhang, Y., Patel, M., Van Gool, L., Oswald, M., Tombari, F.: Splat-slam: Globally optimized rgb-only slam with 3d gaussians. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 1680–1691 (2025)

[36] Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., Caine, B., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 2446–2454 (2020)

[37] Teed, Z., Deng, J.: Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. Advances in neural information processing systems 34, 16558–16569 (2021)

[38] Teed, Z., Lipson, L., Deng, J.: Deep patch visual odometry. Advances in Neural Information Processing Systems 36, 39033–39051 (2023)

[39] Tyszkiewicz, M., Fua, P., Trulls, E.: Disk: Learning local features with policy gradient. Advances in neural information processing systems 33, 14254–14265 (2020)

[40] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5294–5306 (2025)

[41] Wang, L., Gong, R., Han, Y., Yang, L., Yang, L., Li, Y., Xu, B., Liu, H., Fu, R.: Towards next-generation slam: A survey on 3dgs-slam focusing on performance, robustness, and future directions. arXiv preprint arXiv:2602.04251 (2026)

[42] Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: Dust3r: Geometric 3d vision made easy. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20697–20709 (2024)

[43] Wu, C., Duan, Y., Zhang, X., Sheng, Y., Ji, J., Zhang, Y.: Mm-gaussian: 3d gaussian-based multi-modal fusion for localization and reconstruction in unbounded scenes. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 12287–12293. IEEE (2024)

[44] Wu, K., Zhang, Z., Tie, M., Ai, Z., Gan, Z., Ding, W.: Vings-mono: Visual-inertial gaussian splatting monocular slam in large scenes. IEEE Transactions on Robotics (2025)

[45] Yan, C., Qu, D., Xu, D., Zhao, B., Wang, Z., Wang, D., Li, X.: Gs-slam: Dense visual slam with 3d gaussian splatting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19595–19604 (2024)

[46] Yu, S., Cheng, C., Zhou, Y., Yang, X., Wang, H.: Rgb-only gaussian splatting slam for unbounded outdoor scenes. In: 2025 IEEE International Conference on Robotics and Automation (ICRA). pp. 11068–11074. IEEE (2025)

[47] Yugay, V., Li, Y., Gevers, T., Oswald, M.R.: Gaussian-slam: Photo-realistic dense slam with gaussian splatting. arXiv preprint arXiv:2312.10070 (2023)

[48] Zhan, H., Weerasekera, C.S., Bian, J.W., Garg, R., Reid, I.: Df-vo: What should be learnt for visual odometry? arXiv preprint arXiv:2103.00933 (2021)

[49] Zhao, B., Yu, S., Yin, Z., Shen, D., Wang, H.: Mmgs: 10× compressed 3dgs through optimal transport aggregation based on multi-view ranking. arXiv preprint arXiv:2605.19304 (2026)

[50] Zhao, B., Zhou, Y., Song, G., Yin, Z., Wang, H.: 3d skew gaussian splatting with any camera trajectory visualization engine. arXiv preprint arXiv:2605.18334 (2026)

[51] Zhao, B., Zhou, Y., Yu, S., Wang, Z., Wang, H.: Wavelet-gs: 3d gaussian splatting with wavelet decomposition. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 8616–8625 (2025)

[52] Zhu, P., Zhuang, Y., Chen, B., Li, L., Wu, C., Liu, Z.: Mgs-slam: Monocular sparse tracking and gaussian mapping with depth smooth regularization. IEEE Robotics and Automation Letters 9(11), 9486–9493 (2024)