VCS-SLAM: Geometry-Validated Semantic Evidence Fusion for 3D Gaussian SLAM

Raman Jha; Shuaihang Yuan; Yi Fang

arxiv: 2606.29494 · v1 · pith:N5KUPVAQnew · submitted 2026-06-28 · 💻 cs.CV

Pith reviewed 2026-06-30 07:11 UTC · model grok-4.3

classification 💻 cs.CV

keywords semantic SLAM · 3D Gaussian · semantic fusion · geometric validation · visibility consistency · boundary evidence · ray uncertainty · RGB-D mapping

The pith

VCS-SLAM weights semantic observations in 3D Gaussian SLAM by their geometric reliability to suppress artifacts from occlusions and ambiguities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard semantic 3D Gaussian SLAM applies uniform optimization weights to every 2D semantic label when building the persistent 3D map. This lets errors from occlusions, unsupported boundaries, and ambiguous ray geometry create lasting artifacts in the global map. VCS-SLAM instead scores each observation using visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty. These scores drive a reliability-aware objective that down-weights unreliable inputs during fusion. Experiments on Replica and ScanNet show gains in semantic consistency, boundary preservation, and overall reconstruction while tracking stays competitive under real RGB-D inputs.
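To make the visibility-consistency idea concrete, here is a minimal sketch of a depth-gated mask. It is an illustration only: the function name `visibility_mask`, the relative tolerance `rel_tol`, and the rule itself (agreement between map-rendered depth and sensor depth) are assumptions, since the text above does not give the paper's actual gating formula.

```python
def visibility_mask(rendered_depth, sensor_depth, rel_tol=0.05):
    """Return 1.0 where map depth and sensor depth agree, else 0.0.

    Hypothetical rule (not taken from the paper): a pixel counts as
    consistently visible when the depth rendered from the map matches
    the measured depth within a relative tolerance. Pixels with missing
    sensor depth (<= 0) are treated as unreliable.
    """
    mask = []
    for d_map, d_obs in zip(rendered_depth, sensor_depth):
        if d_obs <= 0.0:
            mask.append(0.0)  # no valid measurement
        elif abs(d_map - d_obs) <= rel_tol * d_obs:
            mask.append(1.0)  # depths agree: observation kept
        else:
            mask.append(0.0)  # depths disagree: likely occlusion
    return mask


# Three pixels: consistent, occluded, missing depth.
print(visibility_mask([1.0, 2.0, 1.0], [1.02, 1.0, 0.0]))
```

A hard 0/1 gate is the simplest choice; a soft weight that decays with the depth residual would be an equally plausible reading of "depth-gated".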

Core claim

Instead of treating all semantic observations as uniformly valid supervision, VCS-SLAM evaluates their geometric reliability through visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty. The resulting reliability-aware objective suppresses occluded semantic updates, reduces unsupported semantic bleeding, and delays premature label assignment in ambiguous regions. Experiments on Replica demonstrate improved semantic consistency, boundary preservation, and reconstruction quality. Results on ScanNet further show that VCS-SLAM maintains competitive tracking performance under real RGB-D inputs.
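One plausible way to quantify ray-level conflict is the normalized entropy of the class evidence accumulated along a ray. The sketch below is a stand-in under that assumption; `ray_conflict_uncertainty` and its input convention (one non-negative weight per class) are hypothetical and are not the paper's definition.

```python
import math


def ray_conflict_uncertainty(class_weights):
    """Normalized entropy in [0, 1] of per-class evidence along one ray.

    Assumed stand-in for "conflict uncertainty": 0.0 means a single class
    dominates the ray, 1.0 means the classes are in full conflict.
    Weights are expected to be non-negative.
    """
    k = len(class_weights)
    total = sum(class_weights)
    if k < 2:
        return 0.0  # a single class cannot conflict with itself
    if total <= 0.0:
        return 1.0  # no evidence at all: treat as maximally uncertain
    entropy = 0.0
    for w in class_weights:
        if w > 0.0:
            p = w / total
            entropy -= p * math.log(p)
    return entropy / math.log(k)


print(ray_conflict_uncertainty([1.0, 0.0]))  # one class only
print(ray_conflict_uncertainty([1.0, 1.0]))  # evenly split
```

Under this reading, delaying label assignment in ambiguous regions amounts to giving high-uncertainty rays a small weight until more views resolve the conflict.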

What carries the argument

Reliability-aware objective that modulates semantic supervision weights using visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty.
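A reliability-aware objective of this kind can be sketched as a per-pixel weighted cross-entropy. Everything specific here is an assumption for illustration: the function name `weighted_semantic_loss`, the normalization by total weight, and the idea that the three scores are first merged into a single weight per pixel. The paper's actual objective is not reproduced in the text above.

```python
import math


def weighted_semantic_loss(probs, labels, weights, eps=1e-8):
    """Cross-entropy over pixels, each scaled by a reliability weight.

    probs   : per-pixel class probabilities rendered from the map
    labels  : per-pixel class index from the 2D semantic prior
    weights : per-pixel reliability in [0, 1] (hypothetical; for example
              visibility * boundary support * (1 - ray uncertainty))
    """
    total, weight_sum = 0.0, 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * math.log(max(p[y], eps))
        weight_sum += w
    # Normalizing by the weight sum keeps the loss scale comparable
    # when many pixels are down-weighted.
    return total / weight_sum if weight_sum > 0.0 else 0.0


probs = [[0.5, 0.5], [0.9, 0.1]]
labels = [0, 1]  # the second label disagrees with the map
print(weighted_semantic_loss(probs, labels, [1.0, 1.0]))  # uniform weights
print(weighted_semantic_loss(probs, labels, [1.0, 0.0]))  # second pixel gated out
```

With uniform weights the disagreeing second pixel dominates the loss; once its weight drops to zero it no longer pulls the map toward the suspect label, which is the behavior the objective is meant to produce.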

If this is right

Occluded semantic updates are suppressed to prevent persistent artifacts in the global Gaussian map.
Unsupported semantic bleeding is reduced to preserve accurate object boundaries.
Premature label assignment is delayed in ambiguous regions to improve label stability.
Tracking performance remains competitive on real RGB-D sequences from ScanNet.
Semantic consistency and reconstruction quality improve on Replica scenes.
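The bleeding consequence in the list above can be pictured on a single scanline: a label change that does not coincide with a depth discontinuity has no surface support. The check below is a deliberately crude sketch; `unsupported_boundaries` and the depth-jump threshold `min_jump` are assumptions, not the paper's boundary-evidence term.

```python
def unsupported_boundaries(labels, depths, min_jump=0.1):
    """Return indices i where labels change between pixel i and i+1
    without a depth jump of at least `min_jump` (same units as depths).

    Hypothetical criterion: a semantic boundary is "surface-supported"
    only if the depth map also shows a discontinuity at that location.
    """
    flagged = []
    for i in range(len(labels) - 1):
        label_changes = labels[i] != labels[i + 1]
        depth_jumps = abs(depths[i] - depths[i + 1]) >= min_jump
        if label_changes and not depth_jumps:
            flagged.append(i)
    return flagged


labels = [0, 0, 1, 1, 2]
depths = [1.0, 1.0, 1.5, 1.5, 1.52]
print(unsupported_boundaries(labels, depths))  # boundary 1|2 has no depth jump
```

A depth-only test would wrongly flag genuine coplanar boundaries, such as a poster on a wall, so a practical version would need additional cues (for example surface normals or color edges).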

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same three-metric validation could be transferred to other map representations such as neural fields or surfels for semantic fusion.
Ray-level conflict uncertainty might also serve as a cue for detecting dynamic objects without extra motion modeling.
More reliable semantic maps could lower the need for separate post-processing steps in downstream tasks like robot grasping or room navigation.

Load-bearing premise

The three geometric reliability metrics correctly identify which semantic observations are trustworthy and weighting them produces net improvement without new failure modes.

What would settle it

A controlled comparison on Replica or ScanNet where semantic accuracy or boundary F-score decreases when the reliability weighting is enabled versus disabled.
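The comparison described here reduces to computing the same metric for two runs that differ only in the weighting switch. A minimal sketch using mean IoU follows; the flat label lists stand in for rendered semantic maps, and the toy values are placeholders rather than results from the paper.

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy placeholders for the two arms of the ablation.
gt = [0, 0, 1, 1]
pred_weighting_on = [0, 0, 1, 1]
pred_weighting_off = [0, 1, 1, 1]

delta = mean_iou(pred_weighting_on, gt, 2) - mean_iou(pred_weighting_off, gt, 2)
print(f"mIoU change with weighting enabled: {delta:+.3f}")
```

A negative change on held-out scenes, with everything else fixed, would be the kind of evidence this section asks for.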

Figures

Figures reproduced from arXiv: 2606.29494 by Raman Jha, Shuaihang Yuan, Yi Fang.

**Figure 1.** Overview of VCS-SLAM. Top: High-fidelity global semantic reconstruction. Bottom: Effectiveness of our modules against 3DGS baselines. (1) Overall: Enhanced global consistency. (2) VCSU: Mitigates occlusion artifacts via a depth-gated mask. (3) SCEA: Reduces unsupported semantic bleeding. (4) CAUW: Down-weights ambiguous observations via conflict-aware uncertainty mapping. Accumulating spatial discrepancies …

**Figure 2.** System architecture of VCS-SLAM. Left (Input): Processes RGB images, depth maps, and per-frame semantic labels. The VCSU block evaluates spatial and depth features to generate a depth-gated visibility mask, filtering occluded or inconsistent observations. Middle (Optimization): 3D Semantic Gaussians and camera tracking are iteratively refined via joint multi-channel optimization. This integrates the SCEA co…

**Figure 3.** Qualitative RGB reconstruction comparison on three Replica scenes [25] (Room-0, Room-1, Office-0). We compare renderings from NICE-SLAM [6], SplaTAM [8], SGS-SLAM [9], Hier-SLAM [21], and VCS-SLAM (Ours) against ground truth. Colored boxes mark thin structures and high-frequency regions where methods differ most. All panels, including baselines, are produced by our own runs under a common evaluation protoc…

**Figure 4.** Qualitative ablation study. Image (a) shows the input RGB view; (b) our base 3DGS-SLAM model without VCSU, SCEA, and CAUW; (c) the full VCS-SLAM.

E. Mechanism Analysis. Beyond global metrics, we conducted targeted evaluations to validate specific operational claims of our core modules. Tab. VII summarizes these quantitative results. SGS-SLAM [9] is re-evaluated under our implementation using the same target…

Original abstract

Visual SLAM performance often deteriorates in complex real-world applications. Semantic 3D Gaussian SLAM commonly fuses 2D semantic priors into a persistent 3D map using uniform optimization weights. However, such priors are not equally reliable in online mapping: occlusions, unsupported semantic boundaries, and ambiguous ray geometry can introduce persistent semantic artifacts into the global Gaussian map. We propose VCS-SLAM, a geometry-validated semantic evidence fusion framework for RGB-D 3D Gaussian SLAM. Instead of treating all semantic observations as uniformly valid supervision, VCS-SLAM evaluates their geometric reliability through visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty. The resulting reliability-aware objective suppresses occluded semantic updates, reduces unsupported semantic bleeding, and delays premature label assignment in ambiguous regions. Experiments on Replica demonstrate improved semantic consistency, boundary preservation, and reconstruction quality. Results on ScanNet further show that VCS-SLAM maintains competitive tracking performance under real RGB-D inputs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VCS-SLAM adds three geometric reliability checks to down-weight bad semantic observations in 3D Gaussian SLAM, but the abstract supplies no numbers or ablations to show the checks actually improve results.

The paper's main move is to stop treating every 2D semantic label as equally good supervision when building a persistent 3D Gaussian map. It scores each observation with visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty, then folds those scores into a reliability-aware objective that skips occluded updates, limits boundary bleeding, and holds off on labels in uncertain rays.

This targets a genuine pain point: semantic priors from 2D networks are noisy in real RGB-D streams, and uniform fusion lets the errors stick around. The three checks line up with common failure modes, so the idea is practical rather than decorative.

The soft spot is the evidence. The abstract claims better semantic consistency and reconstruction on Replica plus competitive tracking on ScanNet, yet it contains no tables, no baseline comparisons, no per-metric ablations, and no correlation between the reliability scores and actual semantic error against ground truth. Without those, we cannot tell whether the filters reduce artifacts more than they introduce new ones or simply add overhead.

The work is aimed at people already running Gaussian SLAM who want to add semantic filtering without retraining the whole pipeline. A reader in that niche could extract the three validation rules and test them, but the paper is not ready for citation until the quantitative results are visible.

I would send it to peer review. The problem is real, the proposed checks are specific, and a referee can check whether the full experiments close the gap the abstract leaves open.

Referee Report

2 major / 0 minor

Summary. The paper proposes VCS-SLAM, a geometry-validated semantic evidence fusion framework for RGB-D 3D Gaussian SLAM. It replaces uniform weighting of 2D semantic priors with a reliability-aware objective that evaluates each observation via three geometric metrics—visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty—to suppress occluded updates, reduce semantic bleeding, and delay premature label assignment. Experiments are claimed to show improved semantic consistency, boundary preservation, and reconstruction quality on Replica, plus competitive tracking on ScanNet.

Significance. If the three reliability metrics are shown to correlate with semantic trustworthiness and the reliability-aware objective produces net gains without introducing new artifacts, the work would address a recognized weakness in semantic 3D Gaussian SLAM by making fusion geometry-aware rather than uniform. This could improve map consistency in real-world scenes with occlusions and ambiguous geometry.

major comments (2)

[Abstract] The central claim that VCS-SLAM 'demonstrate[s] improved semantic consistency, boundary preservation, and reconstruction quality' on Replica (and competitive tracking on ScanNet) is unsupported by any quantitative metrics, baselines, ablation tables, or error analysis in the manuscript. Without these, the assertion that the three geometric reliability metrics produce net improvement cannot be evaluated.
[Abstract] The manuscript provides no validation (e.g., per-ray correlation with ground-truth semantic error, or ablation removing each metric) that visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty actually identify trustworthy observations. This leaves the weakest assumption—that applying the reliability-aware objective yields improvement without new failure modes—unexamined.
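The per-ray validation requested in the second comment could be as simple as correlating each ray's reliability score with whether its label was wrong against ground truth. The sketch below uses a hand-written Pearson coefficient to stay dependency-free; the function name `pearson_r` and the toy values are placeholders, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy placeholders: reliability per ray, and 1 if the ray's label was wrong.
reliability = [0.9, 0.8, 0.2, 0.1]
is_error = [0, 0, 1, 1]
print(round(pearson_r(reliability, is_error), 3))
```

A strongly negative coefficient would indicate that low-reliability rays are indeed the ones carrying label errors; a value near zero would suggest the scores are not tracking semantic trustworthiness. A rank-based statistic such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.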

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying these gaps in how the abstract presents our experimental claims. We agree that the current manuscript version requires revision to provide explicit quantitative support and validation for the reliability metrics. We address each point below and will incorporate the necessary changes.

Referee: [Abstract] The central claim that VCS-SLAM 'demonstrate[s] improved semantic consistency, boundary preservation, and reconstruction quality' on Replica (and competitive tracking on ScanNet) is unsupported by any quantitative metrics, baselines, ablation tables, or error analysis in the manuscript. Without these, the assertion that the three geometric reliability metrics produce net improvement cannot be evaluated.

Authors: The referee is correct that the abstract states an improvement claim without accompanying numerical evidence, baselines, or ablation results. The manuscript text provided contains only the high-level description of the experiments. We will revise the abstract to either qualify the claim or reference specific quantitative outcomes (e.g., mIoU gains, boundary F-score improvements) from the experiments section, and we will ensure the full paper includes the supporting tables and error analysis. This change will be made in the next version.
revision: yes
Referee: [Abstract] The manuscript provides no validation (e.g., per-ray correlation with ground-truth semantic error, or ablation removing each metric) that visibility consistency, surface-supported boundary evidence, and ray-level conflict uncertainty actually identify trustworthy observations. This leaves the weakest assumption—that applying the reliability-aware objective yields improvement without new failure modes—unexamined.

Authors: We agree that the manuscript does not currently include direct validation such as per-ray correlation against ground-truth semantic error or component-wise ablations that isolate each metric's contribution. This leaves the effectiveness of the three geometric criteria insufficiently demonstrated. In revision we will add these analyses (correlation plots and metric-ablated results) to confirm that the reliability-aware objective improves trustworthiness without introducing new artifacts. The revision will address this directly.
revision: yes

Circularity Check

0 steps flagged

No circularity: method proposal with external validation

The paper introduces VCS-SLAM as a new framework that computes three geometric reliability metrics (visibility consistency, surface-supported boundary evidence, ray-level conflict uncertainty) and applies them in a reliability-aware objective. No equations, derivations, or predictions are presented that reduce to fitted parameters or self-referential definitions. The central claims rest on the proposed metrics and objective, with reported improvements shown via experiments on Replica and ScanNet rather than by construction from inputs. No self-citations or uniqueness theorems are invoked in the provided text. This is a standard algorithmic contribution without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5696 in / 1027 out tokens · 40579 ms · 2026-06-30T07:11:18.241837+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 4 canonical work pages · 1 internal anchor

[1] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in 2011 10th IEEE International Symposium on Mixed and Augmented Reality, 2011, pp. 127–136.

[2] R. Jha, A. Lenka, M. Ramanagopal, A. Sankaranarayanan, and K. Mitra, “Rt-x net: Rgb-thermal cross attention network for low-light image enhancement,” in 2025 IEEE International Conference on Image Processing (ICIP), 2025, pp. 1492–1497.

[3] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.

[4] B. Kerbl, G. Kopanas, T. Leimkühler, G. Drettakis et al., “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023.

[5] E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “imap: Implicit mapping and positioning in real-time,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 6229–6238.

[6] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-slam: Neural implicit scalable encoding for slam,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 12786–12796.

[7] H. Matsuki, R. Murai, P. H. Kelly, and A. J. Davison, “Gaussian splatting slam,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 18039–18048.

[8] N. Keetha, J. Karhade, K. M. Jatavallabhula, G. Yang, S. Scherer, D. Ramanan, and J. Luiten, “Splatam: Splat track & map 3d gaussians for dense rgb-d slam,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 21357–21366.

[9] M. Li, S. Liu, H. Zhou, G. Zhu, N. Cheng, T. Deng, and H. Wang, “Sgs-slam: Semantic gaussian splatting for neural dense slam,” in European Conference on Computer Vision. Springer, 2024, pp. 163–179.

[10] S. Zhu, R. Qin, G. Wang, J. Liu, and H. Wang, “Semgauss-slam: Dense semantic gaussian splatting slam,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 21174–21181.

[11] D. Yang, Y. Gao, X. Wang, Y. Yue, Y. Yang, and M. Fu, “Opengs-slam: Open-set dense semantic slam with 3d gaussian splatting for object-level scene understanding,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8486–8492.

[12] G. Narita, T. Seno, T. Ishikawa, and Y. Kaji, “Panopticfusion: Online volumetric semantic mapping at the level of stuff and things,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 4205–4212.

[13] A. Hermans, G. Floros, and B. Leibe, “Dense 3d semantic mapping of indoor scenes from rgb-d images,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 2631–2638.

[14] R. Jha, Y. Zhou, and G. Loianno, “Adaptive keyframe selection for scalable 3d scene reconstruction in dynamic environments,” arXiv preprint arXiv:2510.23928, 2025.

[15] S. Zhu, G. Wang, H. Blum, J. Liu, L. Song, M. Pollefeys, and H. Wang, “Sni-slam: Semantic neural implicit slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21167–21177.

[16] Y. Haghighi, S. Kumar, J.-P. Thiran, and L. Van Gool, “Neural implicit dense semantic slam,” arXiv preprint arXiv:2304.14560, 2023.

[17] K. Li, M. Niemeyer, N. Navab, and F. Tombari, “Dns-slam: Dense neural semantic-informed slam,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 7839–7846.

[18] F. Tosi, Y. Zhang, Z. Gong, S. Mattoccia, M. R. Oswald, E. Sandstrom, and M. Poggi, “How nerfs and 3d gaussian splatting are reshaping slam: a survey,” IEEE Transactions on Robotics, 2026.

[19] C. Yan, D. Qu, D. Xu, B. Zhao, Z. Wang, D. Wang, and X. Li, “Gs-slam: Dense visual slam with 3d gaussian splatting,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 19595–19604.

[20] V. Yugay, Y. Li, T. Gevers, and M. R. Oswald, “Gaussian-slam: Photo-realistic dense slam with gaussian splatting,” arXiv preprint arXiv:2312.10070, 2023.

[21] B. Li, Z. Cai, Y.-F. Li, I. Reid, and H. Rezatofighi, “Hier-slam: Scaling-up semantics in slam with a hierarchically categorical gaussian splatting,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 9748–9754.

[22] Y. Ji, Y. Liu, G. Xie, B. Ma, Z. Xie, and H. Liu, “Neds-slam: A neural explicit dense semantic slam framework using 3d gaussian splatting,” IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8778–8785, 2024.

[23] B. Dou, T. Zhang, Z. Wang, Y. Ma, Z. Yuan, and N. Zheng, “Learning segmented 3d gaussians via efficient feature unprojection for zero-shot neural scene segmentation,” in International Conference on Neural Information Processing. Springer, 2024, pp. 398–412.

[24] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9650–9660.

[25] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma et al., “The replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.

[26] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, p. 1, 2017.

[27] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.

[28] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 586–595.

[29] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2012, pp. 573–580.

[30] H. Wang, J. Wang, and L. Agapito, “Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 13293–13302.

[31] M. M. Johari, C. Carta, and F. Fleuret, “Eslam: Efficient dense slam system based on hybrid representation of signed distance fields,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17408–17419.

[32] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5828–5839.

[33] X. Yang, H. Li, H. Zhai, Y. Ming, Y. Liu, and G. Zhang, “Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation,” in 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2022, pp. 499–507.

[34] E. Sandström, Y. Li, L. Van Gool, and M. R. Oswald, “Point-slam: Dense neural point cloud-based slam,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 18433–18444.
