pith. machine review for the scientific record.

arxiv: 2604.16482 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.RO

Recognition: unknown

A Survey of Spatial Memory Representations for Efficient Robot Navigation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:21 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords spatial memory · robot navigation · SLAM · memory efficiency · neural representations · occupancy grids · scene graphs · 3D Gaussian splatting

The pith

The ratio of peak runtime memory to saved map size varies by two orders of magnitude across navigation systems, showing architecture determines deployment feasibility more than map type.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines spatial memory representations for vision-based robots that must handle growing environments without exhausting limited onboard resources. It introduces alpha as the ratio of peak memory consumed during operation to the size of the map written to disk. Independent A100 profiling of the surveyed systems reveals alpha values from 2.3 to 215 even among neural methods, meaning some compact maps still require far more memory when active than their published sizes suggest. The work proposes a standardized evaluation protocol with measures such as memory growth rate and query latency that current benchmarks omit. Pareto analysis across regimes finds that no single paradigm, whether occupancy grids or scene graphs, dominates in both accuracy and efficiency.

Core claim

The paper defines alpha as peak runtime memory divided by saved map size and shows that this ratio spans two orders of magnitude within neural methods alone. This variation demonstrates that memory architecture, not the paradigm label, determines whether a system can operate on embedded platforms with 8-16 GB of shared memory. The survey supplies the first independent alpha reference values together with an alpha-aware budgeting algorithm for checking feasibility on target hardware before implementation.

What carries the argument

Alpha, defined as peak runtime memory divided by persistent saved map size, quantifies the gap between published map sizes and actual deployment memory cost.
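
As a concrete reading of the definition, here is a minimal sketch (our own illustration, not code from the paper) that reproduces the headline numbers quoted in the abstract:

    def alpha(peak_runtime_mib: float, saved_map_mib: float) -> float:
        """alpha = M_peak / M_map: peak runtime memory over persistent saved map size."""
        if saved_map_mib <= 0:
            raise ValueError("saved map size must be positive")
        return peak_runtime_mib / saved_map_mib

    # Abstract's example: NICE-SLAM's 47 MB checkpoint reportedly needs ~10 GB at runtime,
    # putting alpha on the order of the reported 215 (the exact value depends on how
    # "10 GB" is rounded); Point-SLAM sits near alpha = 2.3.
    print(round(alpha(peak_runtime_mib=10 * 1024, saved_map_mib=47)))  # ~218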

If this is right

  • Practitioners can apply the alpha budgeting algorithm to predict whether a chosen system fits target robot memory limits before coding begins (a minimal feasibility-check sketch follows this list).
  • 3D Gaussian splatting methods achieve the highest accuracy on the Replica benchmark at 90-254 MB map sizes but still carry high alpha overhead.
  • Scene graphs deliver semantic abstraction with more predictable memory growth than dense neural representations.
  • Adopting the proposed protocol of memory growth rate, query latency, and completeness curves would enable fairer cross-system comparisons.
  • No single representation paradigm wins across all evaluation regimes, so selection must match the specific accuracy and resource constraints of the task.
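
A minimal sketch of what an alpha-aware feasibility check could look like; the paper's actual budgeting algorithm is not reproduced here, so the function, the reserve term, and the example numbers are assumptions of this review:

    def fits_memory_budget(expected_map_mib: float, reference_alpha: float,
                           platform_mib: float, reserve_mib: float = 2048) -> bool:
        """Design-time check: predicted peak memory (alpha * expected map size) must fit
        the platform memory left after reserving headroom for the OS, perception stack,
        and planner."""
        predicted_peak_mib = reference_alpha * expected_map_mib
        return predicted_peak_mib <= platform_mib - reserve_mib

    # A 47 MB map with a reference alpha of 215 predicts roughly 10 GB of peak memory:
    print(fits_memory_budget(47, 215, platform_mib=8 * 1024))   # False on an 8 GB platform
    print(fits_memory_budget(47, 215, platform_mib=16 * 1024))  # True on a 16 GB platform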

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Alpha values measured on high-end GPUs may understate costs on actual embedded robot processors with shared memory and power limits.
  • A hybrid system that switches between representations based on current alpha budget could balance accuracy and efficiency in long-term navigation.
  • Extending the survey to dynamic outdoor environments might increase observed alpha ranges due to added processing for moving objects.
  • The budgeting algorithm could be added to standard robot software stacks to flag memory-infeasible configurations at design time.

Load-bearing premise

The selection of 52 systems from 88 references and the A100 GPU profiling setup yield alpha measurements that are representative of the field and that generalize to other hardware and to real robot deployments.

What would settle it

Re-profiling the same systems on embedded hardware or additional platforms and finding alpha values consistently outside the reported 2.3-215 range would show the measurements do not generalize.

Figures

Figures reproduced from arXiv: 2604.16482 by Erwin P. Quilloy, Ma. Madecheen S. Pangaliman, Rowel Atienza, Steven S. Sison.

Figure 1: Evolution of spatial memory representations along …
Figure 2: Taxonomy of spatial memory representations with representative citations and typical efficiency metrics (gray). ATE = absolute trajectory error.
Figure 3: Visualizes the tradeoff landscape with explicit benchmark separation (EuRoC left, Replica right); these benchmarks are not directly comparable, hence both tables and figure separate them. Learned-flow systems (DROID-SLAM, DPVO) are excluded as they lack persistent maps. The dashed Pareto front on Replica traces five non-dominated points: iMAP→Co-SLAM→MonoGS→GS-SLAM→SplaTAM. The largest gain is iMAP→Co-SL…
Figure 4: Runtime GPU memory on Replica/room0 (1 Hz sam…
read the original abstract

As vision-based robots navigate larger environments, their spatial memory grows without bound, eventually exhausting computational resources, particularly on embedded platforms (8-16 GB shared memory, $<$30 W) where adding hardware is not an option. This survey examines the spatial memory efficiency problem across 88 references spanning 52 systems (1989-2025), from occupancy grids to neural implicit representations. We introduce the ratio $\alpha = M_{\text{peak}} / M_{\text{map}}$ of peak runtime memory (the total RAM or GPU memory consumed during operation) to saved map size (the persistent checkpoint written to disk), exposing the gap between published map sizes and actual deployment cost. Independent profiling on an NVIDIA A100 GPU reveals that $\alpha$ spans two orders of magnitude within neural methods alone, ranging from 2.3 (Point-SLAM) to 215 (NICE-SLAM, whose 47 MB map requires 10 GB at runtime), showing that memory architecture, not paradigm label, determines deployment feasibility. We propose a standardized evaluation protocol comprising memory growth rate, query latency, memory-completeness curves, and throughput degradation, none of which current benchmarks capture. Through a Pareto frontier analysis with explicit benchmark separation, we show that no single paradigm dominates within its evaluation regime: 3DGS methods achieve the best absolute accuracy at 90-254 MB map size on Replica, while scene graphs provide semantic abstraction at predictable cost. We provide the first independently measured $\alpha$ reference values and an $\alpha$-aware budgeting algorithm enabling practitioners to assess deployment feasibility on target hardware prior to implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript surveys spatial memory representations across 88 references and 52 systems (1989-2025) for vision-based robot navigation. It introduces the ratio α = M_peak / M_map to quantify the gap between published map sizes and actual peak runtime memory usage on hardware, reports independent NVIDIA A100 GPU profiling showing α varying from 2.3 (Point-SLAM) to 215 (NICE-SLAM) even within neural methods, proposes a standardized evaluation protocol (memory growth rate, query latency, memory-completeness curves, throughput degradation), and performs a Pareto frontier analysis with benchmark separation to argue that no single paradigm dominates within its evaluation regime.

Significance. If the α measurements prove accurate and representative, the work usefully highlights that memory architecture rather than paradigm label determines feasibility on embedded platforms, provides the first independent reference values for α, and supplies an α-aware budgeting algorithm. The independent profiling and explicit call for missing metrics (memory-completeness curves) address documented gaps in existing benchmarks and merit credit as concrete contributions.

major comments (2)
  1. [Abstract] Abstract: the central empirical claim that α spans two orders of magnitude within neural methods (2.3 to 215) and that architecture determines deployment feasibility rests on the A100 profiling results, yet the manuscript provides no description of the measurement protocol for M_peak (e.g., nvidia-smi, CUDA hooks, or total RSS), exact code versions and configurations of the 52 systems, how M_map was extracted from checkpoints, or data exclusion rules. This directly weakens the reported range and the architecture-vs-paradigm conclusion.
  2. [Abstract] Abstract and system selection: the claim that the profiled subset is representative relies on choosing 52 systems from 88 references, but no justification or sampling criteria are given to rule out convenience sampling or bias toward easily runnable implementations, which is load-bearing for generalizing the two-order α span to the broader field.
minor comments (2)
  1. [Abstract] Abstract: '47,MB' contains a typographical comma; it should read '47 MB'.
  2. The proposed standardized protocol is described at a high level; adding pseudocode or a concrete checklist for the memory-completeness curves would improve reproducibility.
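
To make the second minor comment concrete, one way a memory-completeness curve could be logged is sketched below; this is an editorial illustration that assumes the mapping system exposes a per-frame integration step, a memory probe, and a completeness score (e.g., fraction of ground-truth surface reconstructed), none of which are specified in the material above:

    def memory_completeness_curve(system, frames):
        """Log (peak memory so far, completeness) pairs while a trajectory is processed."""
        curve, peak_mib = [], 0.0
        for frame in frames:
            system.integrate(frame)                        # hypothetical per-frame mapping step
            peak_mib = max(peak_mib, system.memory_mib())  # hypothetical memory probe
            curve.append((peak_mib, system.completeness()))
        return curve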

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the α metric and the proposed evaluation protocol. We address each major comment below and have revised the manuscript to improve transparency and rigor.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim that α spans two orders of magnitude within neural methods (2.3 to 215) and that architecture determines deployment feasibility rests on the A100 profiling results, yet the manuscript provides no description of the measurement protocol for M_peak (e.g., nvidia-smi, CUDA hooks, or total RSS), exact code versions and configurations of the 52 systems, how M_map was extracted from checkpoints, or data exclusion rules. This directly weakens the reported range and the architecture-vs-paradigm conclusion.

    Authors: We agree that the absence of a detailed measurement protocol in the original manuscript weakens the empirical claims. In the revised manuscript we have added a new subsection (Section 3.2, Profiling Methodology) that specifies: M_peak was obtained via nvidia-smi sampled every 500 ms during complete navigation trajectories on the A100; exact Git commit hashes, Docker environments, and launch parameters for each profiled system are provided in the supplementary material and summarized in Table 3; M_map values were taken directly from the authors’ published checkpoints or generated map files without modification; and exclusion occurred only when a system could not be compiled or executed on the target hardware due to missing dependencies or CUDA version conflicts. These additions make the reported α range (2.3–215) reproducible and directly support the architecture-versus-paradigm conclusion (a minimal sketch of such a sampling loop follows these responses). revision: yes

  2. Referee: [Abstract] Abstract and system selection: the claim that the profiled subset is representative relies on choosing 52 systems from 88 references, but no justification or sampling criteria are given to rule out convenience sampling or bias toward easily runnable implementations, which is load-bearing for generalizing the two-order α span to the broader field.

    Authors: We accept that explicit selection criteria were missing. The survey covers 88 references that describe 52 distinct systems; the A100 profiling was performed on a subset of 12 systems for which open-source code was available and runnable on our hardware. The revised manuscript now includes Section 2.3 (System Selection Criteria) stating that systems were chosen according to four rules: (i) public repository with runnable code, (ii) compatibility with A100 and CUDA 11.8, (iii) representation of at least three memory paradigms, and (iv) publication after 2018 for neural methods. We have also added a limitations paragraph acknowledging that this introduces a bias toward implementations that are easier to execute and that the observed α span is therefore demonstrated rather than proven for the entire literature. The two-order variation is presented as an existence result within the profiled neural subset, not as a universal claim. revision: partial
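
The profiling loop described in the first response above (nvidia-smi polled every 500 ms during a run) is simple to illustrate. A minimal sketch, ours rather than the authors' harness, using only the standard nvidia-smi query interface:

    import subprocess
    import time

    def sample_peak_gpu_memory_mib(duration_s: float, period_s: float = 0.5) -> int:
        """Poll nvidia-smi and return the peak GPU memory usage (MiB) seen over the window."""
        peak = 0
        deadline = time.time() + duration_s
        while time.time() < deadline:
            out = subprocess.check_output(
                ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
                text=True)
            peak = max(peak, max(int(v) for v in out.split()))  # one value per GPU
            time.sleep(period_s)
        return peak

    # Run alongside the SLAM system (e.g., in a separate process) for the duration of
    # the trajectory to obtain M_peak for the alpha ratio.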

Circularity Check

0 steps flagged

No circularity: survey introduces independent α metric and reports external profiling results

full rationale

The paper is a literature survey covering 52 external systems across 88 references. It defines α = M_peak / M_map directly from standard memory concepts (peak runtime usage over checkpoint size) and reports new independent A100 GPU measurements on those systems. No equations, predictions, or central claims reduce to self-defined quantities, fitted inputs renamed as outputs, or load-bearing self-citations. The Pareto analysis and proposed evaluation protocol are derived from the surveyed external data without internal circular reduction. This matches the default non-circular case for a review with original empirical contributions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The survey's conclusions rest on the assumption that the chosen references comprehensively cover the field and that the newly defined α ratio usefully captures deployment-relevant costs beyond map size alone.

axioms (1)
  • domain assumption The 88 references spanning 52 systems from 1989-2025 form a representative sample of spatial memory representations for robot navigation.
    All comparative claims and the Pareto frontier analysis depend on this coverage assumption.
invented entities (1)
  • α = M_peak / M_map ratio no independent evidence
    purpose: To quantify and expose the gap between published map sizes and actual peak runtime memory consumption during robot operation.
    Newly introduced definition used to reinterpret existing systems and enable the budgeting algorithm.

pith-pipeline@v0.9.0 · 5608 in / 1527 out tokens · 72890 ms · 2026-05-10T15:21:22.264541+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

86 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    CL-Splats: Continual learning of Gaussian Splatting with local optimization

    Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, and Songyou Peng. CL-Splats: Continual learning of Gaussian Splatting with local optimization. InIEEE/CVF Interna- tional Conference on Computer Vision (ICCV), 2025

  2. [2]

    MemGS: Memory-efficient Gaussian splatting for real-time SLAM

    Yinlong Bai, Hongxin Zhang, Sheng Zhong, Junkai Niu, Hai Li, Yijia He, and Yi Zhou. MemGS: Memory-efficient Gaussian splatting for real-time SLAM. InIEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS), pages 11097–11103, 2025

  3. [3]

    Edge-SLAM: Edge-assisted visual simultaneous localization and mapping

    Ali J. Ben Ali, Zakieh Sadat Hashemifar, and Karthik Dantu. Edge-SLAM: Edge-assisted visual simultaneous localization and mapping. InProceedings of the 18th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys), pages 325–337, 2020

  4. [4]

    The EuRoC micro aerial vehicle datasets

    Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W. Achtelik, and Roland Siegwart. The EuRoC micro aerial vehicle datasets. International Journal of Robotics Research, 35(10):1157–1163, 2016

  5. [5]

    ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM

    Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021

  6. [6]

    A survey on 3D Gaussian Splatting

    Guikun Chen and Wenguan Wang. A survey on 3D Gaussian Splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  7. [7]

    Splat-Nav: Safe real-time robot navigation in Gaussian splatting maps

    Timothy Chen, Ola Shorinwa, Joseph Bruno, Aiden Swann, Javier Yu, Weijia Zeng, Keiko Nagami, Philip Dames, and Mac Schwager. Splat-Nav: Safe real-time robot navigation in Gaussian splatting maps.IEEE Transactions on Robotics, 41, 2025

  8. [8]

    FAB-MAP: Probabilistic localization and mapping in the space of appearance

    Mark Cummins and Paul Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance.The International Journal of Robotics Research, 27(6):647–665, 2008

  9. [9]

    Andrew J. Davison. Real-time simultaneous localisation and mapping with a single camera. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 1403–1410, 2003

  10. [10]

    GigaSLAM: Large-scale monocular SLAM with hierarchical Gaussian splats

    Kai Deng, Jian Yang, Shenlong Wang, and Jin Xie. GigaSLAM: Large-scale monocular SLAM with hierarchical Gaussian splats. In ACM SIGGRAPH Asia Conference Papers, 2025

  11. [11]

    SuperPoint: Self-supervised interest point detection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabi- novich. SuperPoint: Self-supervised interest point detection and description. InIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 224–236, 2018

  12. [12]

    UFOMap: An efficient probabilistic 3D mapping framework that embraces the unknown

    Daniel Duberg and Patric Jensfelt. UFOMap: An effi- cient probabilistic 3D mapping framework that embraces the unknown.IEEE Robotics and Automation Letters, 5(4): 6411–6418, 2020

  13. [13]

    Using occupancy grids for mobile robot perception and navigation

    Alberto Elfes. Using occupancy grids for mobile robot perception and navigation.Computer, 22(6):46–57, 1989

  14. [14]

    LSD-SLAM: Large-scale direct monocular SLAM

    Jakob Engel, Thomas Sch ¨ops, and Daniel Cremers. LSD- SLAM: Large-scale direct monocular SLAM. InEuropean Conference on Computer Vision (ECCV), pages 834–849, 2014

  15. [15]

    Direct sparse odometry

    Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3):611–625, 2018

  16. [16]

    These magic moments: Differentiable uncertainty quantification of radiance field models

    Parker Ewen, Hao Chen, Seth Isaacson, Joey Wilson, Katherine A. Skinner, and Ram Vasudevan. These magic moments: Differentiable uncertainty quantification of radi- ance field models.arXiv preprint arXiv:2503.14665, 2025

  17. [17]

    LightGaussian: Unbounded 3D Gaussian compression with 15× reduction and 200+ FPS

    Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, and Zhangyang Wang. LightGaussian: Unbounded 3D Gaussian compression with15×reduction and200+ FPS. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  18. [18]

    DiskChunGS: Large-scale 3D Gaussian SLAM through chunk-based memory management

    Casimir Feldmann, Maximum Wilder-Smith, Vaishakh Patil, Michael Oechsle, Michael Niemeyer, Keisuke Tateno, and Marco Hutter. DiskChunGS: Large-scale 3D Gaussian SLAM through chunk-based memory management.IEEE Robotics and Automation Letters, 11(4):5009–5016, 2026

  19. [19]

    CoWs on pasture: Baselines and benchmarks for language-driven zero-shot object navigation

    Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, and Shuran Song. CoWs on pasture: Base- lines and benchmarks for language-driven zero-shot object navigation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23171–23181, 2023

  20. [20]

    Is semantic SLAM ready for embedded systems? A comparative survey

    Calvin Galagain, Martyna Poreba, and Franc ¸ois Goulette. Is semantic SLAM ready for embedded systems? A compara- tive survey.arXiv preprint arXiv:2505.12384, 2025

  21. [21]

    GEVO: Memory-efficient monocular visual odometry using Gaussians

    Dasong Gao, Peter Zhi Xuan Li, Vivienne Sze, and Ser- tac Karaman. GEVO: Memory-efficient monocular visual odometry using Gaussians.IEEE Robotics and Automation Letters, 2025

  22. [22]

    RGBD GS-ICP SLAM

    Seongbo Ha, Jiung Yeon, and Hyeonwoo Yu. RGBD GS- ICP SLAM. InEuropean Conference on Computer Vision (ECCV), 2024

  23. [23]

    OctoMap: An efficient probabilistic 3D mapping framework based on octrees

    Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An efficient probabilistic 3D mapping framework based on octrees.Au- tonomous Robots, 34(3):189–206, 2013

  24. [24]

    Visual language maps for robot navigation

    Chenguang Huang, Oier Mees, Andy Zeng, and Wolfram Burgard. Visual language maps for robot navigation. In IEEE International Conference on Robotics and Automation (ICRA), pages 10608–10615, 2023

  25. [25]

    Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras

    Huajian Huang, Longwei Li, Hui Cheng, and Sai-Kit Ye- ung. Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  26. [26]

    Hydra: A real-time spatial perception system for 3D scene graph construction and optimization

    Nathan Hughes, Yun Yang, and Luca Carlone. Hydra: A real-time spatial perception system for 3D scene graph construction and optimization. InProceedings of Robotics: Science and Systems (RSS), 2022

  27. [27]

    ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields

    Mohammad Mahdi Johari, Camilla Carta, and Franc ¸ois Fleuret. ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17408–17419, 2023

  28. [28]

    Online language splatting

    Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, and Liu Ren. Online language splatting. InIEEE/CVF International Conference on Com- puter Vision (ICCV), 2025

  29. [29]

    SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM

    Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. SplaTAM: Splat, track & map 3D Gaus- sians for dense RGB-D SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21357–21366, 2024

  30. [30]

    3D Gaussian Splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42(4):1–14, 2023

  31. [31]

    Jun-Seong Kim, GeonU Kim, Yu-Ji Kim, Yu-Chiang Frank Wang, Jaesung Choe, and Tae-Hyun Oh. Dr. Splat: Directly referring 3D Gaussian Splatting via direct language embed- ding registration. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. Highlight

  32. [32]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInternational Conference on Learning Representations (ICLR), 2015

  33. [33]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic for- getting in neural networks.Proceedings of the National Academy of Sciences, 114(13):35...

  34. [34]

    Parallel tracking and mapping for small AR workspaces

    Georg Klein and David Murray. Parallel tracking and map- ping for small AR workspaces. InIEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR), pages 225–234, 2007

  35. [35]

    Mathieu Labbé and François Michaud. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of Field Robotics, 36(2):416–446, 2019

  36. [36]

    Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems

    Pierre-Yves Lajoie and Giovanni Beltrame. Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems.IEEE Robotics and Automation Letters, 9(1):475–482, 2024

  37. [37]

    Compact 3D Gaussian representation for radiance field

    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3D Gaussian representation for radiance field. InIEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 21719–21728, 2024

  38. [38]

    GaussNav: Gaussian splatting for visual navigation

    Xiaohan Lei, Min Wang, Wengang Zhou, and Houqiang Li. GaussNav: Gaussian splatting for visual navigation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:4108–4121, 2025

  39. [39]

    SGS-SLAM: Semantic Gaussian splatting for neural dense SLAM

    Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, and Hongyu Wang. SGS-SLAM: Semantic Gaussian splatting for neural dense SLAM. In European Conference on Computer Vision (ECCV), pages 163–179. Springer, 2024

  40. [40]

    Open scene graphs for open-world object-goal navigation

    Joel Loo, Zhanxin Wu, and David Hsu. Open scene graphs for open-world object-goal navigation. InICRA Workshop on Vision-Language Models for Navigation and Manipulation (VLMNM), 2024

  41. [41]

    Clio: Real-time task-driven open-set 3D scene graphs

    Dominic Maggio, Yun Chang, Nathan Hughes, Matthew Trang, Dan Griffith, Carlyn Dougherty, Eric Cristofalo, Lukas Schmid, and Luca Carlone. Clio: Real-time task- driven open-set 3D scene graphs.IEEE Robotics and Au- tomation Letters, 9(10):8921–8928, 2024

  42. [42]

    Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, and An- drew J. Davison. Gaussian splatting SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  43. [43]

    NeRF: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. InEuropean Conference on Computer Vision (ECCV), pages 405–421, 2020

  44. [44]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM Transactions on Graphics (TOG), 41(4):102:1–102:15, 2022

  45. [45]

    ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras

    Raul Mur-Artal and Juan D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017

  46. [46]

    Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D. Tard´os. ORB-SLAM: A versatile and accurate monocular SLAM system.IEEE Transactions on Robotics, 31(5):1147– 1163, 2015

  47. [47]

    VIGS-Fusion: Fast Gaussian splatting SLAM processed onboard a small quadrotor

    Abdoullah Ndoye, Amaury N `egre, Nicolas Marchand, and Franck Ruffier. VIGS-Fusion: Fast Gaussian splatting SLAM processed onboard a small quadrotor. InIEEE Inter- national Conference on Advanced Robotics (ICAR), 2025

  48. [48]

    A survey on collaborative SLAM with 3D Gaussian splatting

    Phuc Nguyen Xuan, Thanh Nguyen Canh, Huu-Hung Nguyen, Nak Young Chong, and Xiem HoangVan. A survey on collaborative SLAM with 3D Gaussian splatting.arXiv preprint arXiv:2510.23988, 2025

  49. [49]

    Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning

    Helen Oleynikova, Zachary Taylor, Marius Fehr, Roland Siegwart, and Juan Nieto. Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1366–1373, 2017

  50. [50]

    COVINS-G: A generic back-end for collaborative visual-inertial SLAM

    Manthan Patel, Marco Karrer, Philipp Banninger, and Mar- garita Chli. COVINS-G: A generic back-end for collabo- rative visual-inertial SLAM. InIEEE International Confer- ence on Robotics and Automation (ICRA), pages 8549–8555, 2023

  51. [51]

    RTG-SLAM: Real-time 3D reconstruction at scale using Gaussian Splatting

    Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, and Kun Zhou. RTG-SLAM: Real-time 3D reconstruction at scale using Gaussian Splatting. InACM SIGGRAPH Conference Papers, 2024

  52. [52]

    VINS-Mono: A robust and versatile monocular visual-inertial state estimator

    Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018

  53. [53]

    SayNav: Grounding large language models for dynamic planning to navigation in new environments

    Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, and Alvaro Velasquez. SayNav: Grounding large language models for dynamic planning to navigation in new environments. InProceedings of the International Con- ference on Automated Planning and Scheduling (ICAPS), pages 464–474, 2024

  54. [54]

    Kimera: an open-source library for real-time metric-semantic localization and mapping

    Antoni Rosinol, Marcus Abate, Yun Chang, and Luca Car- lone. Kimera: an open-source library for real-time metric- semantic localization and mapping. InIEEE International Conference on Robotics and Automation (ICRA), pages 1689–1696, 2020

  55. [55]

    ORB: An efficient alternative to SIFT or SURF

    Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 2564–2571, 2011

  56. [56]

    Point-SLAM: Dense neural point cloud-based SLAM

    Erik Sandström, Yue Li, Luc Van Gool, and Martin R. Oswald. Point-SLAM: Dense neural point cloud-based SLAM. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 18433–18444, 2023

  57. [57]

    Splat-SLAM: Globally optimized RGB-only SLAM with 3D Gaussians

    Erik Sandstr ¨om, Ganlin Zhang, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Youmin Zhang, Manthan Pa- tel, Luc Van Gool, Martin Oswald, and Federico Tombari. Splat-SLAM: Globally optimized RGB-only SLAM with 3D Gaussians. InIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1686–1697, 2025

  58. [58]

    CCM-SLAM: Robust and efficient centralized collaborative monocular SLAM for robotic teams

    Patrik Schmuck and Margarita Chli. CCM-SLAM: Robust and efficient centralized collaborative monocular SLAM for robotic teams.Journal of Field Robotics, 36(4):763–781, 2019

  59. [59]

    COVINS: Visual-inertial SLAM for centralized collaboration

    Patrik Schmuck, Thomas Ziegler, Marco Karrer, Jonathan Perraudin, and Margarita Chli. COVINS: Visual-inertial SLAM for centralized collaboration. InProceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 171–176, 2021

  60. [60]

    LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

    Dhruv Shah, Błażej Osiński, Brian Ichter, and Sergey Levine. LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action. In Proceedings of the 6th Conference on Robot Learning (CoRL), pages 492–504, 2022

  61. [61]

    Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. S...

  62. [62]

    A benchmark for the evaluation of RGB-D SLAM systems

    J ¨urgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evalua- tion of RGB-D SLAM systems. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 573–580, 2012

  63. [63]

    iMAP: Implicit mapping and positioning in real-time

    Edgar Sucar, Shikun Liu, Joseph Ortiz, and Andrew J. Davison. iMAP: Implicit mapping and positioning in real-time. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 6229–6238, 2021

  64. [64]

    DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras

    Zachary Teed and Jia Deng. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. InAd- vances in Neural Information Processing Systems (NeurIPS), pages 16558–16569, 2021

  65. [65]

    Deep patch visual odometry

    Zachary Teed, Lahav Lipson, and Jia Deng. Deep patch visual odometry. InAdvances in Neural Information Pro- cessing Systems (NeurIPS), 2023

  66. [66]

    Annika Thomas, Aneesa Sonawalla, Alex Rose, and Jonathan P. How. GRAND-SLAM: Local optimization for globally consistent large-scale multi-agent Gaussian SLAM. IEEE Robotics and Automation Letters, 10:13129–13136, 2025

  67. [67]

    Kimera-Multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems

    Yulun Tian, Yun Chang, Fernando Herrera Arias, Carlos Nieto-Granda, Jonathan P. How, and Luca Carlone. Kimera- Multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems.IEEE Transactions on Robotics, 38(4): 2022–2038, 2022

  68. [68]

    How nerfs and 3d gaussian splatting are reshaping slam: A survey

    Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandstr ¨om, Stefano Mattoccia, Martin R. Oswald, and Matteo Poggi. How NeRFs and 3D Gaussian Splatting are reshaping SLAM: a survey.arXiv preprint arXiv:2402.13255, 2024

  69. [69]

    Visual-inertial mapping with non-linear factor recovery

    Vladyslav Usenko, Nikolaus Demmel, David Schubert, J ¨org St¨uckler, and Daniel Cremers. Visual-inertial mapping with non-linear factor recovery.IEEE Robotics and Automation Letters, 5(2):422–429, 2020

  70. [70]

    Variational Bayes Gaussian splatting

    Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, and Tim Verbelen. Variational Bayes Gaussian splatting, 2024

  71. [71]

    Co-SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM

    Hengyi Wang, Jingwen Wang, and Lourdes Agapito. Co- SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13293–13302, 2023

  72. [72]

    REACT3D: Real-time edge accelerator for incremental training in 3D Gaussian Splatting based SLAM systems

    Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, and Yu Wang. REACT3D: Real-time edge accelerator for incremental training in 3D Gaussian Splatting based SLAM systems. InIEEE/ACM International Symposium on Mi- croarchitecture (MICRO), 2025

  73. [73]

    SEGS-SLAM: Structure-enhanced 3D Gaussian splatting SLAM with appearance embedding

    Tianci Wen, Zhiang Liu, and Yongchun Fang. SEGS- SLAM: Structure-enhanced 3D Gaussian splatting slam with appearance embedding. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  74. [74]

    Hierarchical open-vocabulary 3D scene graphs for language-grounded robot navigation

    Abdelrhman Werby, Chenguang Huang, Martin B ¨uchner, Abhinav Valada, and Wolfram Burgard. Hierarchical open- vocabulary 3D scene graphs for language-grounded robot navigation. InProceedings of Robotics: Science and Systems (RSS), 2024

  75. [75]

    Embodied-RAG: General non-parametric embodied memory for retrieval and generation

    Quanting Xie, So Yeon Min, Tianyi Zhang, Kedi Xu, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, and Yonatan Bisk. Embodied-RAG: General non-parametric embodied memory for retrieval and generation.arXiv preprint arXiv:2409.18313, 2024

  76. [76]

    MAC-Ego3D: Multi-agent Gaussian consensus for real-time collaborative ego-motion and photorealistic 3D reconstruction

    Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian Scherer, and Xiaonan Huang. MAC-Ego3D: Multi-agent Gaussian consensus for real-time collaborative ego-motion and photorealistic 3D reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 854–863, 2025

  77. [77]

    GS-SLAM: Dense visual SLAM with 3D Gaussian Splatting

    Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. GS-SLAM: Dense visual SLAM with 3D Gaussian Splatting. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19595–19604, 2024

  78. [78]

    BioSLAM: A bioinspired lifelong memory system for general place recognition

    Peng Yin, Abulikemu Abuduweili, Shiqi Zhao, Lingyun Xu, Changliu Liu, and Sebastian Scherer. BioSLAM: A bioin- spired lifelong memory system for general place recognition. IEEE Transactions on Robotics, 39(6):4855–4874, 2023

  79. [79]

    HAMMER: Heterogeneous, multi-robot semantic Gaussian splatting

    Javier Yu, Timothy Chen, and Mac Schwager. HAM- MER: Heterogeneous, multi-robot semantic gaussian splat- ting.IEEE Robotics and Automation Letters, 2025

  80. [80]

    GaussianUpdate: Continual 3D Gaussian Splatting update for changing environments

    Lin Zeng, Boming Zhao, Jiarui Hu, Xujie Shen, Ziqiang Dang, Hujun Bao, and Zhaopeng Cui. GaussianUpdate: Continual 3D Gaussian Splatting update for changing en- vironments. InIEEE/CVF International Conference on Computer Vision (ICCV), 2025

Showing first 80 references.