MMD-SLAM: Structure-Enhanced Multi-Meta Gaussian Distribution-Guided Visual SLAM

Chunmao Jiang; Fan Zhu; Hongxing Zhou; Hui Zhu; Peichen Liu; Sixun Liu; Yifan Zhao; Zhisong Xu; Ziyu Chen

arxiv: 2606.19874 · v1 · pith:HH2PZRWGnew · submitted 2026-06-18 · 💻 cs.RO · cs.CV

Pith reviewed 2026-06-26 17:16 UTC · model grok-4.3

classification 💻 cs.RO cs.CV

keywords: visual SLAM · 3D Gaussian Splatting · Atlanta World assumption · Multi-Meta Gaussian · structure-enhanced · pose optimization · mapping quality · tracking accuracy

The pith

MMD-SLAM incorporates Atlanta World structural priors into a Multi-Meta Gaussian representation to enhance visual SLAM tracking and mapping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents MMD-SLAM, a structure-enhanced visual SLAM system based on 3D Gaussian Splatting. It leverages the Atlanta World assumption to create a Multi-Meta Gaussian representation that encodes dominant directions as structural priors. The approach includes point-line fusion for pose optimization and a Gaussian evolution strategy that incorporates scene geometry into optimization. These elements address gaps in existing methods that overlook structural information, resulting in better tracking accuracy and higher-fidelity scene reconstruction. Readers would care because it promises more reliable and photorealistic SLAM in indoor environments using only visual input.

Core claim

The authors claim that by guiding the Multi-Meta Gaussian distribution with the Atlanta World assumption through point-line fusion, dominant direction encoding, and Gaussian evolution, their system achieves state-of-the-art performance in both tracking accuracy and mapping quality on benchmarks like ScanNet and Replica.
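Mapping quality in the abstract is reported as PSNR. As a reminder of what that metric measures, the following is a minimal sketch, assuming images are flattened to equal-length sequences of intensities in [0, `max_value`]. The function names `psnr` and `relative_gain` are illustrative and do not come from the paper.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and of equal length")
    # Mean squared error over all pixel intensities.
    mse = sum((r - t) ** 2 for r, t in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain(baseline_db, ours_db):
    """Percentage by which `ours_db` exceeds `baseline_db`."""
    return 100.0 * (ours_db - baseline_db) / baseline_db
```

Because PSNR is logarithmic, a percentage gain in dB is not a percentage reduction in pixel error, which is worth keeping in mind when reading a relative PSNR figure.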

What carries the argument

Multi-Meta Gaussian representation with dominant directions that encodes structural priors from the Atlanta World hypothesis for guiding photorealistic mapping and optimization.
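One way to picture a Gaussian "with a dominant direction" is sketched below. The abstract does not give the paper's formulation, so this assumes the dominant direction is the principal axis of the 3x3 covariance and that structure is encouraged by penalising misalignment with a reference direction. The helpers `principal_axis` and `alignment_penalty` are hypothetical, not the authors' loss.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / n for c in v]


def principal_axis(cov, iterations=100):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    `cov` is a symmetric positive semi-definite 3x3 matrix as nested lists.
    Assumption: the fixed start vector is not orthogonal to the principal
    axis; a robust implementation would use a proper eigen-solver.
    """
    v = normalize([1.0, 1.0, 1.0])
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        v = normalize(w)
    return v


def alignment_penalty(axis, reference):
    """0 when the two directions are parallel (sign-agnostic), 1 when orthogonal."""
    u, t = normalize(axis), normalize(reference)
    return 1.0 - abs(sum(a * b for a, b in zip(u, t)))
```

The absolute value makes the penalty indifferent to the sign of either direction, since an axis and its negation describe the same line.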

Load-bearing premise

The Atlanta World assumption holds for the evaluated scenes and can be encoded into the Multi-Meta Gaussian representation to provide useful structural priors without introducing inconsistencies.

What would settle it

Running the system on indoor scenes that violate the Atlanta World assumption, such as those with curved surfaces or no dominant directions, and checking whether the accuracy and quality improvements over baseline Gaussian SLAM methods persist.

Figures

Figures reproduced from arXiv: 2606.19874 by Chunmao Jiang, Fan Zhu, Hongxing Zhou, Hui Zhu, Peichen Liu, Sixun Liu, Yifan Zhao, Zhisong Xu, Ziyu Chen.

**Figure 1.** Comparison of mapping effect. (a) illustrates that traditional Gaussian ellipsoids, which do not conform to the underlying structure, interfere with each other and generate blurred artifacts. (b) demonstrates that the Multi-Meta Gaussians used in our method better fit the scene structure after training, exhibiting clear edges.

**Figure 2.** Overview. MMD-SLAM consists of two components: Tracking and Mapping. Tracking: First, point and line features are extracted from the input RGB-D frame, the camera pose is determined, and a sparse map is constructed. Second, the tracking process is optimized by minimizing the reprojection error and backprojection error. Mapping: Using accurate point cloud information to initialize a Weak Gaussian, a set of s…

**Figure 4.** Split and Merge. We improved the original Split and added Merge to make the Density Control module more suitable for our system.

**Figure 5.** Rendering results on the Replica dataset [26]. The red arrows highlight the differences between our approach and the baselines. For the details of the ceiling, floor, and window, our method significantly outperforms the baselines.

**Figure 7.** Ablation of MMGE & SSO. The red dashed box highlights the differences between the two methods. Our comprehensive approach has more obvious structure.

Original abstract

3D Gaussian Splatting (3DGS) has significantly boosted novel view synthesis and high-fidelity scene reconstruction, expanding the potential of 3DGS-based Visual Simultaneous Localization and Mapping (SLAM) methods. However, most existing systems fail to fully exploit the underlying structural information, which limits rendering quality and often leads to inconsistent maps. To address these limitations, we propose MMD-SLAM, a structure-enhanced Visual SLAM framework that leverages the Atlanta World (AW) assumption to guide a Multi-Meta Gaussian representation for photorealistic mapping. First, we introduce a point-line fusion strategy for pose optimization, where 3D line segments are incorporated to improve tracking robustness and provide additional constraints for mapping. Second, we design a Multi-Meta Gaussian representation with dominant directions, explicitly encoding structural priors from the AW hypothesis. Finally, we propose a Gaussian evolution strategy that adapts to scene geometry and incorporates structural cues into global optimization. Extensive experiments demonstrate that these innovations enable MMD-SLAM to achieve state-of-the-art performance in both tracking accuracy and mapping quality. For example, our method achieves a 48.56% reduction in ATE RMSE on ScanNet and a 5.71% improvement in PSNR on Replica, compared with MonoGS.
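The tracking figure in the abstract is a relative reduction in ATE RMSE. The sketch below shows how such a number is typically formed, assuming the estimated and ground-truth trajectories are already time-associated and expressed in a common frame. Standard evaluation first aligns the trajectories with a rigid or similarity transform, and that step is omitted here. The function names and toy values are illustrative, not taken from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two pre-aligned trajectories.

    Both arguments are equal-length lists of (x, y, z) positions.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between corresponding positions.
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Hypothetical toy trajectories, not data from the paper.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
toy_rmse = ate_rmse(est, gt)
```

A relative reduction only carries meaning alongside the absolute baseline error, which the abstract does not state.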

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MMD-SLAM adds Atlanta World priors via Multi-Meta Gaussians and point-line fusion to 3DGS SLAM, but the abstract leaves the assumption's fit to data and the source of the gains unverified.

The main point is that this paper takes existing 3DGS SLAM systems like MonoGS and layers on the Atlanta World assumption through a Multi-Meta Gaussian representation that encodes dominant directions as structural priors, plus a point-line fusion step for tracking and a Gaussian evolution strategy for mapping. The reported results are a 48.56% ATE RMSE drop on ScanNet and a 5.71% PSNR gain on Replica, both relative to MonoGS.

The combination itself is the concrete addition. Using line segments to tighten pose estimates and then feeding structural directions into the Gaussian parameters is a direct way to bring man-made scene regularity into the splatting pipeline. The evolution strategy that adapts to geometry while respecting those cues is also a practical touch for keeping the map consistent.
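To make "using line segments to tighten pose estimates" concrete, here is a minimal sketch of a line reprojection residual of the kind commonly used in point-line SLAM. It assumes a pinhole camera, 3D endpoints already expressed in the camera frame, and a detected 2D line in normalised homogeneous form. The abstract does not specify the paper's exact error terms, so all names below are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def line_through(p, q):
    """Homogeneous line (a, b, c) through two pixels, scaled so a^2 + b^2 = 1."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("the two pixels must be distinct")
    return (a / norm, b / norm, c / norm)


def line_reprojection_residuals(endpoints_cam, detected_line, intrinsics):
    """Signed pixel distances from projected 3D endpoints to the detected line."""
    a, b, c = detected_line
    residuals = []
    for point in endpoints_cam:
        u, v = project(point, *intrinsics)
        residuals.append(a * u + b * v + c)
    return residuals
```

In a pose optimizer these residuals would be stacked with the usual point reprojection residuals, so each matched line contributes two extra constraints per frame.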

The soft spot is the Atlanta World assumption itself. The abstract gives no numbers on how well the assumed dominant directions actually match the ScanNet or Replica scenes, and no discussion of what happens when they do not. If the test environments already contain strong orthogonal structure, the priors may simply reinforce what is already there rather than add new information. Without that check or an ablation that isolates the AW component, the performance numbers are hard to attribute.

The work is aimed at people already building 3DGS-based SLAM for indoor robotics or AR. A reader who wants ideas for injecting world assumptions into Gaussian maps could pick up the representation and optimization choices. It is not a foundational shift, but the specific engineering steps are clear enough to be worth examining.

I would send it to peer review. The topic is active, the claimed gains are large enough to test, and the full methods section should be able to address the assumption validation directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes MMD-SLAM, a 3D Gaussian Splatting-based visual SLAM system that incorporates the Atlanta World (AW) assumption into a Multi-Meta Gaussian representation. It introduces a point-line fusion strategy for pose optimization, encodes structural priors via dominant directions in the Gaussian model, and uses a Gaussian evolution strategy for global optimization, claiming state-of-the-art results including a 48.56% reduction in ATE RMSE on ScanNet and 5.71% PSNR improvement on Replica relative to MonoGS.

Significance. If the reported gains are attributable to the AW-guided structural priors rather than ancillary components, the work could meaningfully advance consistency and accuracy in structured indoor SLAM by bridging geometric assumptions with neural rendering representations. The point-line fusion and adaptive evolution ideas are potentially reusable beyond this specific formulation.

major comments (2)

[Abstract] The central attribution of performance gains to the AW priors requires verification that ScanNet and Replica scenes conform to the dominant-direction structure the AW assumption posits (a vertical axis with horizontal directions orthogonal to it); the manuscript provides no quantitative check (e.g., measured angular deviation or orthogonality error) on the test data. Without this, it remains possible that the priors introduce inconsistencies rather than constraints, undermining the claim that the Multi-Meta Gaussian representation supplies useful structure enhancement.
[Experiments] No ablation isolating the AW-encoded dominant directions from the point-line fusion strategy or the Gaussian evolution component is reported. Consequently the 48.56% ATE and 5.71% PSNR figures cannot be confidently ascribed to the structure-enhancement mechanism that constitutes the paper's primary contribution.
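The quantitative check requested in the first major comment could be sketched as follows. It assumes the dominant directions have already been estimated for a scene, and it measures how far each horizontal direction is from being perpendicular to the vertical axis, which is the constraint the Atlanta World model [25] imposes. The function names are illustrative and are not part of the paper.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def orthogonality_errors(vertical, horizontals):
    """Absolute deviation from 90 degrees for each horizontal direction."""
    return [abs(90.0 - angle_deg(vertical, h)) for h in horizontals]


def mean_orthogonality_error(vertical, horizontals):
    """Average deviation from orthogonality, in degrees."""
    errors = orthogonality_errors(vertical, horizontals)
    return sum(errors) / len(errors)
```

Tabulating this value per test sequence would show whether the scenes satisfy the prior closely enough for the structural constraint to help rather than hurt.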

minor comments (1)

[Abstract] The abstract refers to 'extensive experiments' but supplies no dataset splits, sequence counts, or statistical significance measures for the reported metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the validation of our claims.

Referee: [Abstract] The central attribution of performance gains to the AW priors requires verification that ScanNet and Replica scenes conform to the dominant-direction structure the AW assumption posits (a vertical axis with horizontal directions orthogonal to it); the manuscript provides no quantitative check (e.g., measured angular deviation or orthogonality error) on the test data. Without this, it remains possible that the priors introduce inconsistencies rather than constraints, undermining the claim that the Multi-Meta Gaussian representation supplies useful structure enhancement.

Authors: We acknowledge that the manuscript does not include a quantitative verification of how well the ScanNet and Replica scenes conform to the Atlanta World assumption. Although the AW model is a standard prior for indoor man-made environments, we agree that reporting measured angular deviations or orthogonality errors would provide stronger support for attributing gains to the structural priors. In the revision we will add this analysis, computing and tabulating the average angular deviation from orthogonality for the dominant directions extracted across the test sequences. revision: yes

Referee: [Experiments] No ablation isolating the AW-encoded dominant directions from the point-line fusion strategy or the Gaussian evolution component is reported. Consequently the 48.56% ATE and 5.71% PSNR figures cannot be confidently ascribed to the structure-enhancement mechanism that constitutes the paper's primary contribution.

Authors: We agree that the current experiments do not isolate the AW-encoded dominant directions from the point-line fusion and Gaussian evolution components, making it difficult to attribute the reported gains specifically to the structure-enhancement mechanism. We will add dedicated ablation studies in the revised manuscript that disable the AW priors while retaining the other modules, thereby quantifying the incremental contribution of the Multi-Meta Gaussian structural encoding to tracking and rendering metrics. revision: yes

Circularity Check

0 steps flagged

No circularity detected; empirical claims rest on independent experiments

The provided abstract and description introduce a new Multi-Meta Gaussian representation guided by the Atlanta World assumption, a point-line fusion strategy, and a Gaussian evolution strategy. These are presented as novel design choices whose value is demonstrated via empirical comparisons (ATE RMSE on ScanNet, PSNR on Replica) against external baselines such as MonoGS. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the text. The AW assumption is invoked as an external structural prior rather than derived from the method itself. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only the abstract is available, so the ledger is populated from stated high-level assumptions; no free parameters or invented entities can be quantified.

axioms (1)

domain assumption: the Atlanta World assumption holds for the scenes and supplies usable structural priors.
Invoked to guide the Multi-Meta Gaussian representation and global optimization.

invented entities (1)

Multi-Meta Gaussian representation with dominant directions (no independent evidence)
purpose: explicitly encode structural priors from the Atlanta World hypothesis.
New representation introduced to incorporate AW priors into 3DGS.

pith-pipeline@v0.9.1-grok · 5785 in / 1349 out tokens · 44917 ms · 2026-06-26T17:16:28.162339+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MyGO-Splat: Multi-Objective Closed-Loop Geometric Feedback for RGB-Only Gaussian SLAM
cs.RO 2026-06 unverdicted novelty 6.0

MyGO-Splat is a closed-loop RGB-only Gaussian SLAM system that rasterizes depth and normals from the map to supervise pose optimization and align monocular depth priors for scale consistency.

PanoImager: Geometry-Guided Novel View Synthesis and Reconstruction from Sparse Panoramic Views
cs.CV 2026-06 unverdicted novelty 4.0

PanoImager is an SfM-free pipeline combining feed-forward priors, geometry-conditioned diffusion view completion, and depth-guided 3DGS optimization to reconstruct from sparse panoramic images.

Reference graph

Works this paper leans on

35 extracted references · 3 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1] C. Akinlar and C. Topal, “Edlines: A real-time line segment detector with a false detection control,” Pattern Recognition Letters, vol. 32, no. 13, pp. 1633–1642, 2011.
[2] C. Campos, R. Elvira, J. J. G. Rodriguez, J. M. M. Montiel, and J. D. Tardos, “Orb-slam3: An accurate open-source library for visual, visual-inertial, and multimap slam,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
[3] S. Chen, C. Wang, R. Xu, Peixingtian, yukun Song, J. Lin, W. Xu, jingyizhang, L. Guo, and S. Xu, “SAGE: Spatial-visual adaptive graph exploration for efficient visual place recognition,” in The Fourteenth International Conference on Learning Representations, 2026.
[4] C.-M. Chung, Y.-C. Tseng, Y.-C. Hsu, X.-Q. Shi, Y.-H. Hua, J.-F. Yeh, W.-C. Chen, Y.-T. Chen, and W. H. Hsu, “Orbeez-slam: A real-time monocular visual slam with orb features and nerf-realized mapping,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 9400–9406.
[5] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5828–5839.
[6] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration,” ACM Transactions on Graphics, vol. 36, no. 4, p. 1, 2017.
[7] L. Freda, “Plvs: A slam system with points, lines, volumetric mapping, and 3d incremental segmentation,” arXiv preprint arXiv:2309.10896, 2023.
[8] Z. Gong, F. Tosi, Y. Zhang, S. Mattoccia, and M. Poggi, “Hs-slam: Hybrid representation with structural supervision for improved dense slam,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8464–8470.
[9] S. Ha, J. Yeon, and H. Yu, “Rgbd gs-icp slam,” in Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2024, pp. 180–197.
[10] B. Huang, Z. Yu, A. Chen, A. Geiger, and S. Gao, “2d gaussian splatting for geometrically accurate radiance fields,” in ACM SIGGRAPH, 2024, pp. 1–11.
[11] H. Huang, L. Li, H. Cheng, and S.-K. Yeung, “Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular stereo and rgb-d cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 21 584–21 593.
[12] J. Huang, S. S. Huang, H. Song, and S. M. Hu, “Di-fusion: Online implicit 3d reconstruction with deep priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 8928–8937.
[13] N. Keetha, J. Karhade, K. M. Jatavallabhula, G. Yang, S. Scherer, D. Ramanan, and J. Luiten, “Splatam: Splat track & map 3d gaussians for dense rgb-d slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 21 357–21 366.
[14] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–14, 2023.
[15] M. Li, W. Chen, N. Cheng, J. Xu, D. Li, and H. Wang, “Garad-slam: 3d gaussian splatting for real-time anti dynamic slam,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11 047–11 053.
[16] M. Li, D. Li, S. Hu, K. Wang, Z. Zhao, and H. Wang, “Slam-x: Generalizable dynamic removal for nerf and gaussian splatting slam,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 1132–1140.
[17] M. Li, S. Liu, H. Zhou, G. Zhu, N. Cheng, T. Deng, and H. Wang, “Sgs-slam: Semantic gaussian splatting for neural dense slam,” in Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2024, pp. 163–179.
[18] B. Liao, Z. Zhao, H. Li, Y. Zhou, Y. Zeng, H. Li, and P. Liu, “Convex relaxation for robust vanishing point estimation in manhattan world,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15 823–15 832.
[19] S. Liu, T. Deng, H. Zhou, L. Li, H. Wang, D. Wang, and M. Li, “Mg-slam: Structure gaussian splatting slam with manhattan world hypothesis,” IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 17 034–17 049, 2025.
[20] Y. Liu, W. Chen, Y. Bai, X. Liang, G. Li, W. Gao, and L. Lin, “Aligning cyber space with physical world: A comprehensive survey on embodied ai,” IEEE/ASME Transactions on Mechatronics, 2025.
[21] Y. Mao, X. Yu, Z. Zhang, K. Wang, Y. Wang, R. Xiong, and Y. Liao, “Ngel-slam: Neural implicit representation-based global consistent low-latency slam system,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 6952–6958.
[22] H. Matsuki, R. Murai, P. H. Kelly, and A. J. Davison, “Gaussian splatting slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18 039–18 048.
[23] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2022.
[24] Z. Peng, T. Shao, Y. Liu, J. Zhou, Y. Yang, J. Wang, and K. Zhou, “Rtg-slam: Real-time 3d reconstruction at scale using gaussian splatting,” in ACM SIGGRAPH, 2024, pp. 1–11.
[25] G. Schindler and F. Dellaert, “Atlanta world: An expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2004, pp. I–I.
[26] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, and S. Verma, “The replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.
[27] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 573–580.
[28] E. Sucar, S. K. Liu, J. Ortiz, and A. J. Davison, “Imap: Implicit mapping and positioning in real-time,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 6209–6218.
[29] C. Wang, S. Chen, Y. Song, R. Xu, Z. Zhang, J. Zhang, H. Yang, Y. Zhang, K. Fu, S. Du, et al., “Focus on local: Finding reliable discriminative regions for visual place recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 7536–7544.
[30] T. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker, and A. J. Davison, “Elasticfusion: Dense slam without a pose graph,” in Robotics: Science and Systems (RSS), vol. 11, no. 3. Rome, 2015.
[31] L. Zhang and R. Koch, “An efficient and robust line segment matching approach based on lbd descriptor and pairwise geometric consistency,” Journal of visual communication and image representation, vol. 24, no. 7, pp. 794–805, 2013.
[32] Z. Zhao, “Balf: Simple and efficient blur aware local feature detector,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 3362–3372.
[33] Z. Zhao, H. Yang, B. Liao, Y. Zeng, S. Yan, Y. Gu, P. Liu, Y. Zhou, H. Li, and J. Civera, “Advances in global solvers for 3d vision,” arXiv preprint arXiv:2602.14662, 2026.
[34] F. Zhu, Y. Zhao, Z. Chen, B. Yu, and H. Zhu, “Fgo-slam: Enhancing gaussian slam with globally consistent opacity radiance field,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11 075–11 081.
[35] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-slam: Neural implicit scalable encoding for slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12 776–12 786.

[1] [1]

Edlines: A real-time line segment detector with a false detection control,

C. Akinlar and C. Topal, “Edlines: A real-time line segment detector with a false detection control,”Pattern Recognition Letters, vol. 32, no. 13, pp. 1633–1642, 2011

2011

[2] [2]

Orb-slam3: An accurate open-source library for visual, visual-inertial, and multimap slam,

C. Campos, R. Elvira, J. J. G. Rodriguez, J. M. M. Montiel, and J. D. Tardos, “Orb-slam3: An accurate open-source library for visual, visual-inertial, and multimap slam,”IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021

2021

[3] [3]

SAGE: Spatial-visual adaptive graph exploration for efficient visual place recognition,

S. Chen, C. Wang, R. Xu, Peixingtian, yukun Song, J. Lin, W. Xu, jingyizhang, L. Guo, and S. Xu, “SAGE: Spatial-visual adaptive graph exploration for efficient visual place recognition,” inThe Fourteenth International Conference on Learning Representations, 2026

2026

[4] [4]

Orbeez-slam: A real-time monocular visual slam with orb features and nerf-realized mapping,

C.-M. Chung, Y .-C. Tseng, Y .-C. Hsu, X.-Q. Shi, Y .-H. Hua, J.-F. Yeh, W.-C. Chen, Y .-T. Chen, and W. H. Hsu, “Orbeez-slam: A real-time monocular visual slam with orb features and nerf-realized mapping,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, Conference Proceedings, pp. 9400–9406

2023

[5] [5]

Scannet: Richly-annotated 3d reconstructions of indoor scenes,

A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” inProceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2017, Conference Proceedings, pp. 5828–5839

2017

[6] [6]

Bundle- fusion: Real-time globally consistent 3d reconstruction using on-the- fly surface reintegration,

A. Dai, M. Nießner, M. Zollh ¨ofer, S. Izadi, and C. Theobalt, “Bundle- fusion: Real-time globally consistent 3d reconstruction using on-the- fly surface reintegration,”ACM Transactions on Graphics, vol. 36, no. 4, p. 1, 2017

2017

[7] [7]

Plvs: A slam system with points, lines, volumetric mapping, and 3d incremental segmentation,

L. Freda, “Plvs: A slam system with points, lines, volumetric mapping, and 3d incremental segmentation,”arXiv preprint arXiv:2309.10896, 2023

work page arXiv 2023

[8] [8]

Hs- slam: Hybrid representation with structural supervision for improved dense slam,

Z. Gong, F. Tosi, Y . Zhang, S. Mattoccia, and M. Poggi, “Hs- slam: Hybrid representation with structural supervision for improved dense slam,” in2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8464–8470

2025

[9] [9]

Rgbd gs-icp slam,

S. Ha, J. Yeon, and H. Yu, “Rgbd gs-icp slam,” inProceedings of the European Conference on Computer Vision (ECCV). Springer, 2024, pp. 180–197

2024

[10] [10]

2d gaussian splat- ting for geometrically accurate radiance fields,

B. Huang, Z. Yu, A. Chen, A. Geiger, and S. Gao, “2d gaussian splat- ting for geometrically accurate radiance fields,” inACM SIGGRAPH, 2024, Conference Proceedings, pp. 1–11

2024

[11] [11]

Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular stereo and rgb-d cameras,

H. Huang, L. Li, H. Cheng, and S.-K. Yeung, “Photo-slam: Real-time simultaneous localization and photorealistic mapping for monocular stereo and rgb-d cameras,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), 2024, Conference Proceedings, pp. 21 584–21 593

2024

[12] [12]

Di-fusion: Online implicit 3d reconstruction with deep priors,

J. Huang, S. S. Huang, H. Song, and S. M. Hu, “Di-fusion: Online implicit 3d reconstruction with deep priors,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Conference Proceedings, pp. 8928–8937

2021

[13] [13]

Splatam: Splat track & map 3d gaussians for dense rgb-d slam,

N. Keetha, J. Karhade, K. M. Jatavallabhula, G. Yang, S. Scherer, D. Ramanan, and J. Luiten, “Splatam: Splat track & map 3d gaussians for dense rgb-d slam,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, Conference Proceedings, pp. 21 357–21 366

2024

[14] [14]

3d gaussian splatting for real-time radiance field rendering,

B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,”ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–14, 2023

2023

[15] [15]

Garad-slam: 3d gaussian splatting for real-time anti dynamic slam,

M. Li, W. Chen, N. Cheng, J. Xu, D. Li, and H. Wang, “Garad-slam: 3d gaussian splatting for real-time anti dynamic slam,” in2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11 047–11 053

2025

[16] M. Li, D. Li, S. Hu, K. Wang, Z. Zhao, and H. Wang, “Slam-x: Generalizable dynamic removal for nerf and gaussian splatting slam,” in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 1132–1140.

[17] M. Li, S. Liu, H. Zhou, G. Zhu, N. Cheng, T. Deng, and H. Wang, “Sgs-slam: Semantic gaussian splatting for neural dense slam,” in Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2024, pp. 163–179.

[18] B. Liao, Z. Zhao, H. Li, Y. Zhou, Y. Zeng, H. Li, and P. Liu, “Convex relaxation for robust vanishing point estimation in manhattan world,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15823–15832.

[19] S. Liu, T. Deng, H. Zhou, L. Li, H. Wang, D. Wang, and M. Li, “Mg-slam: Structure gaussian splatting slam with manhattan world hypothesis,” IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 17034–17049, 2025.

[20] Y. Liu, W. Chen, Y. Bai, X. Liang, G. Li, W. Gao, and L. Lin, “Aligning cyber space with physical world: A comprehensive survey on embodied ai,” IEEE/ASME Transactions on Mechatronics, 2025.

[21] Y. Mao, X. Yu, Z. Zhang, K. Wang, Y. Wang, R. Xiong, and Y. Liao, “Ngel-slam: Neural implicit representation-based global consistent low-latency slam system,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 6952–6958.

[22] H. Matsuki, R. Murai, P. H. Kelly, and A. J. Davison, “Gaussian splatting slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 18039–18048.

[23] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2022.

[24] Z. Peng, T. Shao, Y. Liu, J. Zhou, Y. Yang, J. Wang, and K. Zhou, “Rtg-slam: Real-time 3d reconstruction at scale using gaussian splatting,” in ACM SIGGRAPH, 2024, pp. 1–11.

[25] G. Schindler and F. Dellaert, “Atlanta world: An expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2004, pp. I–I.

[26] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, and S. Verma, “The replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.

[27] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 573–580.

[28] E. Sucar, S. K. Liu, J. Ortiz, and A. J. Davison, “Imap: Implicit mapping and positioning in real-time,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 6209–6218.

[29] C. Wang, S. Chen, Y. Song, R. Xu, Z. Zhang, J. Zhang, H. Yang, Y. Zhang, K. Fu, S. Du, et al., “Focus on local: Finding reliable discriminative regions for visual place recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 7536–7544.

[30] T. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker, and A. J. Davison, “Elasticfusion: Dense slam without a pose graph,” in Robotics: Science and Systems (RSS), vol. 11, no. 3, Rome, 2015.

[31] L. Zhang and R. Koch, “An efficient and robust line segment matching approach based on lbd descriptor and pairwise geometric consistency,” Journal of Visual Communication and Image Representation, vol. 24, no. 7, pp. 794–805, 2013.

[32] Z. Zhao, “Balf: Simple and efficient blur aware local feature detector,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 3362–3372.

[33] Z. Zhao, H. Yang, B. Liao, Y. Zeng, S. Yan, Y. Gu, P. Liu, Y. Zhou, H. Li, and J. Civera, “Advances in global solvers for 3d vision,” arXiv preprint arXiv:2602.14662, 2026.

[34] F. Zhu, Y. Zhao, Z. Chen, B. Yu, and H. Zhu, “Fgo-slam: Enhancing gaussian slam with globally consistent opacity radiance field,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 11075–11081.

[35] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-slam: Neural implicit scalable encoding for slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12776–12786.