pith. machine review for the scientific record.

arxiv: 2604.16482 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.RO

Recognition: unknown

A Survey of Spatial Memory Representations for Efficient Robot Navigation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:21 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords spatial memory · robot navigation · SLAM · memory efficiency · neural representations · occupancy grids · scene graphs · 3D Gaussian splatting

The pith

The ratio of peak runtime memory to saved map size varies by two orders of magnitude across navigation systems, showing architecture determines deployment feasibility more than map type.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines spatial memory representations for vision-based robots that must handle growing environments without exhausting limited onboard resources. It introduces alpha as the ratio of peak memory consumed during operation to the size of the map written to disk. Independent A100 profiling of the surveyed systems reveals alpha values from 2.3 to 215 even among neural methods, meaning some compact maps still require far more memory when active than their published sizes suggest. The work proposes a standardized evaluation protocol with measures such as memory growth rate and query latency that current benchmarks omit. Pareto analysis across regimes finds that no single paradigm, whether occupancy grids or scene graphs, dominates in both accuracy and efficiency.

Core claim

The paper defines alpha as peak runtime memory divided by saved map size and shows that this ratio spans two orders of magnitude within neural methods alone. This variation demonstrates that memory architecture, not the paradigm label, determines whether a system can operate on embedded platforms with 8-16 GB of shared memory. The survey supplies the first independent alpha reference values together with an alpha-aware budgeting algorithm for checking feasibility on target hardware before implementation.

What carries the argument

Alpha, defined as peak runtime memory divided by persistent saved map size, quantifies the gap between published map sizes and actual deployment memory cost.
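
As a concrete reading of the definition, here is a minimal sketch (our own illustration, not code from the paper) that reproduces the headline numbers quoted in the abstract:

    def alpha(peak_runtime_mib: float, saved_map_mib: float) -> float:
        """alpha = M_peak / M_map: peak runtime memory over persistent saved map size."""
        if saved_map_mib <= 0:
            raise ValueError("saved map size must be positive")
        return peak_runtime_mib / saved_map_mib

    # Abstract's example: NICE-SLAM's 47 MB checkpoint reportedly needs ~10 GB at runtime,
    # putting alpha on the order of the reported 215 (the exact value depends on how
    # "10 GB" is rounded); Point-SLAM sits near alpha = 2.3.
    print(round(alpha(peak_runtime_mib=10 * 1024, saved_map_mib=47)))  # ~218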

If this is right

  • Practitioners can apply the alpha budgeting algorithm to predict whether a chosen system fits target robot memory limits before coding begins (a minimal feasibility-check sketch follows this list).
  • 3D Gaussian splatting methods achieve the highest accuracy on the Replica benchmark at 90-254 MB map sizes but still carry high alpha overhead.
  • Scene graphs deliver semantic abstraction with more predictable memory growth than dense neural representations.
  • Adopting the proposed protocol of memory growth rate, query latency, and completeness curves would enable fairer cross-system comparisons.
  • No single representation paradigm wins across all evaluation regimes, so selection must match the specific accuracy and resource constraints of the task.
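
A minimal sketch of what an alpha-aware feasibility check could look like; the paper's actual budgeting algorithm is not reproduced here, so the function, the reserve term, and the example numbers are assumptions of this review:

    def fits_memory_budget(expected_map_mib: float, reference_alpha: float,
                           platform_mib: float, reserve_mib: float = 2048) -> bool:
        """Design-time check: predicted peak memory (alpha * expected map size) must fit
        the platform memory left after reserving headroom for the OS, perception stack,
        and planner."""
        predicted_peak_mib = reference_alpha * expected_map_mib
        return predicted_peak_mib <= platform_mib - reserve_mib

    # A 47 MB map with a reference alpha of 215 predicts roughly 10 GB of peak memory:
    print(fits_memory_budget(47, 215, platform_mib=8 * 1024))   # False on an 8 GB platform
    print(fits_memory_budget(47, 215, platform_mib=16 * 1024))  # True on a 16 GB platform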

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Alpha values measured on high-end GPUs may understate costs on actual embedded robot processors with shared memory and power limits.
  • A hybrid system that switches between representations based on current alpha budget could balance accuracy and efficiency in long-term navigation.
  • Extending the survey to dynamic outdoor environments might increase observed alpha ranges due to added processing for moving objects.
  • The budgeting algorithm could be added to standard robot software stacks to flag memory-infeasible configurations at design time.

Load-bearing premise

The selection of 52 systems from 88 references and the A100 GPU profiling setup yield alpha measurements that are representative of the field and that generalize to other hardware and to real robot deployments.

What would settle it

Re-profiling the same systems on embedded hardware or additional platforms and finding alpha values consistently outside the reported 2.3-215 range would show the measurements do not generalize.

Figures

Figures reproduced from arXiv: 2604.16482 by Erwin P. Quilloy, Ma. Madecheen S. Pangaliman, Rowel Atienza, Steven S. Sison.

Figure 1: Evolution of spatial memory representations along …
Figure 2: Taxonomy of spatial memory representations with representative citations and typical efficiency metrics (gray). ATE = absolute trajectory error.
Figure 3: Visualizes the tradeoff landscape with explicit benchmark separation (EuRoC left, Replica right); these benchmarks are not directly comparable, hence both tables and figure separate them. Learned-flow systems (DROID-SLAM, DPVO) are excluded as they lack persistent maps. The dashed Pareto front on Replica traces five non-dominated points: iMAP→Co-SLAM→MonoGS→GS-SLAM→SplaTAM. The largest gain is iMAP→Co-SL…
Figure 4: Runtime GPU memory on Replica/room0 (1 Hz sam…
read the original abstract

As vision-based robots navigate larger environments, their spatial memory grows without bound, eventually exhausting computational resources, particularly on embedded platforms (8-16 GB shared memory, $<$30 W) where adding hardware is not an option. This survey examines the spatial memory efficiency problem across 88 references spanning 52 systems (1989-2025), from occupancy grids to neural implicit representations. We introduce the ratio $\alpha = M_{\text{peak}} / M_{\text{map}}$ of peak runtime memory (the total RAM or GPU memory consumed during operation) to saved map size (the persistent checkpoint written to disk), exposing the gap between published map sizes and actual deployment cost. Independent profiling on an NVIDIA A100 GPU reveals that $\alpha$ spans two orders of magnitude within neural methods alone, ranging from 2.3 (Point-SLAM) to 215 (NICE-SLAM, whose 47 MB map requires 10 GB at runtime), showing that memory architecture, not paradigm label, determines deployment feasibility. We propose a standardized evaluation protocol comprising memory growth rate, query latency, memory-completeness curves, and throughput degradation, none of which current benchmarks capture. Through a Pareto frontier analysis with explicit benchmark separation, we show that no single paradigm dominates within its evaluation regime: 3DGS methods achieve the best absolute accuracy at 90-254 MB map size on Replica, while scene graphs provide semantic abstraction at predictable cost. We provide the first independently measured $\alpha$ reference values and an $\alpha$-aware budgeting algorithm enabling practitioners to assess deployment feasibility on target hardware prior to implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript surveys spatial memory representations across 88 references and 52 systems (1989-2025) for vision-based robot navigation. It introduces the ratio α = M_peak / M_map to quantify the gap between published map sizes and actual peak runtime memory usage on hardware, reports independent NVIDIA A100 GPU profiling showing α varying from 2.3 (Point-SLAM) to 215 (NICE-SLAM) even within neural methods, proposes a standardized evaluation protocol (memory growth rate, query latency, memory-completeness curves, throughput degradation), and performs a Pareto frontier analysis with benchmark separation to argue that no single paradigm dominates within its evaluation regime.

Significance. If the α measurements prove accurate and representative, the work usefully highlights that memory architecture rather than paradigm label determines feasibility on embedded platforms, provides the first independent reference values for α, and supplies an α-aware budgeting algorithm. The independent profiling and explicit call for missing metrics (memory-completeness curves) address documented gaps in existing benchmarks and merit credit as concrete contributions.

major comments (2)
  1. [Abstract] Abstract: the central empirical claim that α spans two orders of magnitude within neural methods (2.3 to 215) and that architecture determines deployment feasibility rests on the A100 profiling results, yet the manuscript provides no description of the measurement protocol for M_peak (e.g., nvidia-smi, CUDA hooks, or total RSS), exact code versions and configurations of the 52 systems, how M_map was extracted from checkpoints, or data exclusion rules. This directly weakens the reported range and the architecture-vs-paradigm conclusion.
  2. [Abstract] Abstract and system selection: the claim that the profiled subset is representative relies on choosing 52 systems from 88 references, but no justification or sampling criteria are given to rule out convenience sampling or bias toward easily runnable implementations, which is load-bearing for generalizing the two-order α span to the broader field.
minor comments (2)
  1. [Abstract] Abstract: '47,MB' contains a typographical comma; it should read '47 MB'.
  2. The proposed standardized protocol is described at a high level; adding pseudocode or a concrete checklist for the memory-completeness curves would improve reproducibility.
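
To make the second minor comment concrete, one way a memory-completeness curve could be logged is sketched below; this is an editorial illustration that assumes the mapping system exposes a per-frame integration step, a memory probe, and a completeness score (e.g., fraction of ground-truth surface reconstructed), none of which are specified in the material above:

    def memory_completeness_curve(system, frames):
        """Log (peak memory so far, completeness) pairs while a trajectory is processed."""
        curve, peak_mib = [], 0.0
        for frame in frames:
            system.integrate(frame)                        # hypothetical per-frame mapping step
            peak_mib = max(peak_mib, system.memory_mib())  # hypothetical memory probe
            curve.append((peak_mib, system.completeness()))
        return curve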

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the α metric and the proposed evaluation protocol. We address each major comment below and have revised the manuscript to improve transparency and rigor.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim that α spans two orders of magnitude within neural methods (2.3 to 215) and that architecture determines deployment feasibility rests on the A100 profiling results, yet the manuscript provides no description of the measurement protocol for M_peak (e.g., nvidia-smi, CUDA hooks, or total RSS), exact code versions and configurations of the 52 systems, how M_map was extracted from checkpoints, or data exclusion rules. This directly weakens the reported range and the architecture-vs-paradigm conclusion.

    Authors: We agree that the absence of a detailed measurement protocol in the original manuscript weakens the empirical claims. In the revised manuscript we have added a new subsection (Section 3.2, Profiling Methodology) that specifies: M_peak was obtained via nvidia-smi sampled every 500 ms during complete navigation trajectories on the A100; exact Git commit hashes, Docker environments, and launch parameters for each profiled system are provided in the supplementary material and summarized in Table 3; M_map values were taken directly from the authors’ published checkpoints or generated map files without modification; and exclusion occurred only when a system could not be compiled or executed on the target hardware due to missing dependencies or CUDA version conflicts. These additions make the reported α range (2.3–215) reproducible and directly support the architecture-versus-paradigm conclusion (a minimal sketch of such a sampling loop follows these responses). revision: yes

  2. Referee: [Abstract] Abstract and system selection: the claim that the profiled subset is representative relies on choosing 52 systems from 88 references, but no justification or sampling criteria are given to rule out convenience sampling or bias toward easily runnable implementations, which is load-bearing for generalizing the two-order α span to the broader field.

    Authors: We accept that explicit selection criteria were missing. The survey covers 88 references that describe 52 distinct systems; the A100 profiling was performed on a subset of 12 systems for which open-source code was available and runnable on our hardware. The revised manuscript now includes Section 2.3 (System Selection Criteria) stating that systems were chosen according to four rules: (i) public repository with runnable code, (ii) compatibility with A100 and CUDA 11.8, (iii) representation of at least three memory paradigms, and (iv) publication after 2018 for neural methods. We have also added a limitations paragraph acknowledging that this introduces a bias toward implementations that are easier to execute and that the observed α span is therefore demonstrated rather than proven for the entire literature. The two-order variation is presented as an existence result within the profiled neural subset, not as a universal claim. revision: partial
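
The profiling loop described in the first response above (nvidia-smi polled every 500 ms during a run) is simple to illustrate. A minimal sketch, ours rather than the authors' harness, using only the standard nvidia-smi query interface:

    import subprocess
    import time

    def sample_peak_gpu_memory_mib(duration_s: float, period_s: float = 0.5) -> int:
        """Poll nvidia-smi and return the peak GPU memory usage (MiB) seen over the window."""
        peak = 0
        deadline = time.time() + duration_s
        while time.time() < deadline:
            out = subprocess.check_output(
                ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
                text=True)
            peak = max(peak, max(int(v) for v in out.split()))  # one value per GPU
            time.sleep(period_s)
        return peak

    # Run alongside the SLAM system (e.g., in a separate process) for the duration of
    # the trajectory to obtain M_peak for the alpha ratio.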

Circularity Check

0 steps flagged

No circularity: survey introduces independent α metric and reports external profiling results

full rationale

The paper is a literature survey covering 52 external systems across 88 references. It defines α = M_peak / M_map directly from standard memory concepts (peak runtime usage over checkpoint size) and reports new independent A100 GPU measurements on those systems. No equations, predictions, or central claims reduce to self-defined quantities, fitted inputs renamed as outputs, or load-bearing self-citations. The Pareto analysis and proposed evaluation protocol are derived from the surveyed external data without internal circular reduction. This matches the default non-circular case for a review with original empirical contributions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The survey's conclusions rest on the assumption that the chosen references comprehensively cover the field and that the newly defined α ratio usefully captures deployment-relevant costs beyond map size alone.

axioms (1)
  • domain assumption The 88 references spanning 52 systems from 1989-2025 form a representative sample of spatial memory representations for robot navigation.
    All comparative claims and the Pareto frontier analysis depend on this coverage assumption.
invented entities (1)
  • α = M_peak / M_map ratio no independent evidence
    purpose: To quantify and expose the gap between published map sizes and actual peak runtime memory consumption during robot operation.
    Newly introduced definition used to reinterpret existing systems and enable the budgeting algorithm.

pith-pipeline@v0.9.0 · 5608 in / 1527 out tokens · 72890 ms · 2026-05-10T15:21:22.264541+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

86 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    CL-Splats: Continual learning of Gaussian Splatting with local optimization

    Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, and Songyou Peng. CL-Splats: Continual learning of Gaussian Splatting with local optimization. InIEEE/CVF Interna- tional Conference on Computer Vision (ICCV), 2025

  2. [2]

    MemGS: Memory-efficient Gaussian splatting for real-time SLAM

    Yinlong Bai, Hongxin Zhang, Sheng Zhong, Junkai Niu, Hai Li, Yijia He, and Yi Zhou. MemGS: Memory-efficient Gaussian splatting for real-time SLAM. InIEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS), pages 11097–11103, 2025

  3. [3]

    Edge-SLAM: Edge-assisted visual simultaneous localization and mapping

    Ali J. Ben Ali, Zakieh Sadat Hashemifar, and Karthik Dantu. Edge-SLAM: Edge-assisted visual simultaneous localization and mapping. InProceedings of the 18th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys), pages 325–337, 2020

  4. [4]

    The EuRoC micro aerial vehicle datasets

    Michael Burri, Janosch Nikolic, Pascal Gohl, Thomas Schneider, Joern Rehder, Sammy Omari, Markus W. Achtelik, and Roland Siegwart. The EuRoC micro aerial vehicle datasets. International Journal of Robotics Research, 35(10):1157–1163, 2016

  5. [5]

    ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM

    Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, and Juan D. Tardós. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021

  6. [6]

    A survey on 3D Gaussian Splatting

    Guikun Chen and Wenguan Wang. A survey on 3D Gaussian Splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  7. [7]

    Splat-Nav: Safe real-time robot navigation in Gaussian splatting maps

    Timothy Chen, Ola Shorinwa, Joseph Bruno, Aiden Swann, Javier Yu, Weijia Zeng, Keiko Nagami, Philip Dames, and Mac Schwager. Splat-Nav: Safe real-time robot navigation in Gaussian splatting maps.IEEE Transactions on Robotics, 41, 2025

  8. [8]

    FAB-MAP: Probabilistic localization and mapping in the space of appearance

    Mark Cummins and Paul Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance.The International Journal of Robotics Research, 27(6):647–665, 2008

  9. [9]

    Andrew J. Davison. Real-time simultaneous localisation and mapping with a single camera. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 1403–1410, 2003

  10. [10]

    GigaSLAM: Large-scale monocular SLAM with hierarchical Gaussian splats

    Kai Deng, Jian Yang, Shenlong Wang, and Jin Xie. GigaSLAM: Large-scale monocular SLAM with hierarchical Gaussian splats. In ACM SIGGRAPH Asia Conference Papers, 2025

  11. [11]

    SuperPoint: Self-supervised interest point detection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabi- novich. SuperPoint: Self-supervised interest point detection and description. InIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 224–236, 2018

  12. [12]

    UFOMap: An efficient probabilistic 3D mapping framework that embraces the unknown

    Daniel Duberg and Patric Jensfelt. UFOMap: An effi- cient probabilistic 3D mapping framework that embraces the unknown.IEEE Robotics and Automation Letters, 5(4): 6411–6418, 2020

  13. [13]

    Using occupancy grids for mobile robot perception and navigation

    Alberto Elfes. Using occupancy grids for mobile robot perception and navigation.Computer, 22(6):46–57, 1989

  14. [14]

    LSD-SLAM: Large-scale direct monocular SLAM

    Jakob Engel, Thomas Sch ¨ops, and Daniel Cremers. LSD- SLAM: Large-scale direct monocular SLAM. InEuropean Conference on Computer Vision (ECCV), pages 834–849, 2014

  15. [15]

    Direct sparse odometry

    Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3):611–625, 2018

  16. [16]

    These magic moments: Differentiable uncertainty quantification of radiance field models

    Parker Ewen, Hao Chen, Seth Isaacson, Joey Wilson, Katherine A. Skinner, and Ram Vasudevan. These magic moments: Differentiable uncertainty quantification of radi- ance field models.arXiv preprint arXiv:2503.14665, 2025

  17. [17]

    LightGaussian: Unbounded 3D Gaussian compression with 15× reduction and 200+ FPS

    Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, and Zhangyang Wang. LightGaussian: Unbounded 3D Gaussian compression with15×reduction and200+ FPS. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  18. [18]

    DiskChunGS: Large-scale 3D Gaussian SLAM through chunk-based memory management

    Casimir Feldmann, Maximum Wilder-Smith, Vaishakh Patil, Michael Oechsle, Michael Niemeyer, Keisuke Tateno, and Marco Hutter. DiskChunGS: Large-scale 3D Gaussian SLAM through chunk-based memory management.IEEE Robotics and Automation Letters, 11(4):5009–5016, 2026

  19. [19]

    CoWs on pasture: Baselines and benchmarks for language-driven zero-shot object navigation

    Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, and Shuran Song. CoWs on pasture: Base- lines and benchmarks for language-driven zero-shot object navigation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23171–23181, 2023

  20. [20]

    Is semantic SLAM ready for embedded systems? A comparative survey

    Calvin Galagain, Martyna Poreba, and Franc ¸ois Goulette. Is semantic SLAM ready for embedded systems? A compara- tive survey.arXiv preprint arXiv:2505.12384, 2025

  21. [21]

    GEVO: Memory-efficient monocular visual odometry using Gaussians

    Dasong Gao, Peter Zhi Xuan Li, Vivienne Sze, and Ser- tac Karaman. GEVO: Memory-efficient monocular visual odometry using Gaussians.IEEE Robotics and Automation Letters, 2025

  22. [22]

    RGBD GS-ICP SLAM

    Seongbo Ha, Jiung Yeon, and Hyeonwoo Yu. RGBD GS- ICP SLAM. InEuropean Conference on Computer Vision (ECCV), 2024

  23. [23]

    OctoMap: An efficient probabilistic 3D mapping framework based on octrees

    Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An efficient probabilistic 3D mapping framework based on octrees.Au- tonomous Robots, 34(3):189–206, 2013

  24. [24]

    Visual language maps for robot navigation

    Chenguang Huang, Oier Mees, Andy Zeng, and Wolfram Burgard. Visual language maps for robot navigation. In IEEE International Conference on Robotics and Automation (ICRA), pages 10608–10615, 2023

  25. [25]

    Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras

    Huajian Huang, Longwei Li, Hui Cheng, and Sai-Kit Ye- ung. Photo-SLAM: Real-time simultaneous localization and photorealistic mapping for monocular, stereo, and RGB-D cameras. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  26. [26]

    Hydra: A real-time spatial perception system for 3D scene graph construction and optimization

    Nathan Hughes, Yun Yang, and Luca Carlone. Hydra: A real-time spatial perception system for 3D scene graph construction and optimization. InProceedings of Robotics: Science and Systems (RSS), 2022

  27. [27]

    ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields

    Mohammad Mahdi Johari, Camilla Carta, and Franc ¸ois Fleuret. ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17408–17419, 2023

  28. [28]

    Online language splatting

    Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, and Liu Ren. Online language splatting. InIEEE/CVF International Conference on Com- puter Vision (ICCV), 2025

  29. [29]

    SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM

    Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. SplaTAM: Splat, track & map 3D Gaus- sians for dense RGB-D SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21357–21366, 2024

  30. [30]

    3D Gaussian Splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering.ACM Transactions on Graphics (TOG), 42(4):1–14, 2023

  31. [31]

    Jun-Seong Kim, GeonU Kim, Yu-Ji Kim, Yu-Chiang Frank Wang, Jaesung Choe, and Tae-Hyun Oh. Dr. Splat: Directly referring 3D Gaussian Splatting via direct language embed- ding registration. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. Highlight

  32. [32]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInternational Conference on Learning Representations (ICLR), 2015

  33. [33]

    Overcoming catastrophic forgetting in neural networks

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic for- getting in neural networks.Proceedings of the National Academy of Sciences, 114(13):35...

  34. [34]

    Parallel tracking and mapping for small AR workspaces

    Georg Klein and David Murray. Parallel tracking and map- ping for small AR workspaces. InIEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR), pages 225–234, 2007

  35. [35]

    Mathieu Labbé and François Michaud. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of Field Robotics, 36(2):416–446, 2019

  36. [36]

    Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems

    Pierre-Yves Lajoie and Giovanni Beltrame. Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems.IEEE Robotics and Automation Letters, 9(1):475–482, 2024

  37. [37]

    Compact 3D Gaussian representation for radiance field

    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3D Gaussian representation for radiance field. InIEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 21719–21728, 2024

  38. [38]

    GaussNav: Gaussian splatting for visual navigation

    Xiaohan Lei, Min Wang, Wengang Zhou, and Houqiang Li. GaussNav: Gaussian splatting for visual navigation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:4108–4121, 2025

  39. [39]

    SGS-SLAM: Semantic Gaussian splatting for neural dense SLAM

    Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, and Hongyu Wang. SGS-SLAM: Semantic Gaussian splatting for neural dense SLAM. In European Conference on Computer Vision (ECCV), pages 163–179. Springer, 2024

  40. [40]

    Open scene graphs for open-world object-goal navigation

    Joel Loo, Zhanxin Wu, and David Hsu. Open scene graphs for open-world object-goal navigation. InICRA Workshop on Vision-Language Models for Navigation and Manipulation (VLMNM), 2024

  41. [41]

    Clio: Real-time task-driven open-set 3D scene graphs

    Dominic Maggio, Yun Chang, Nathan Hughes, Matthew Trang, Dan Griffith, Carlyn Dougherty, Eric Cristofalo, Lukas Schmid, and Luca Carlone. Clio: Real-time task- driven open-set 3D scene graphs.IEEE Robotics and Au- tomation Letters, 9(10):8921–8928, 2024

  42. [42]

    Hidenobu Matsuki, Riku Murai, Paul H. J. Kelly, and An- drew J. Davison. Gaussian splatting SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  43. [43]

    NeRF: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. InEuropean Conference on Computer Vision (ECCV), pages 405–421, 2020

  44. [44]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM Transactions on Graphics (TOG), 41(4):102:1–102:15, 2022

  45. [45]

    ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras

    Raul Mur-Artal and Juan D. Tardós. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017

  46. [46]

    Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D. Tard´os. ORB-SLAM: A versatile and accurate monocular SLAM system.IEEE Transactions on Robotics, 31(5):1147– 1163, 2015

  47. [47]

    VIGS-Fusion: Fast Gaussian splatting SLAM processed onboard a small quadrotor

    Abdoullah Ndoye, Amaury N `egre, Nicolas Marchand, and Franck Ruffier. VIGS-Fusion: Fast Gaussian splatting SLAM processed onboard a small quadrotor. InIEEE Inter- national Conference on Advanced Robotics (ICAR), 2025

  48. [48]

    A survey on collaborative SLAM with 3D Gaussian splatting

    Phuc Nguyen Xuan, Thanh Nguyen Canh, Huu-Hung Nguyen, Nak Young Chong, and Xiem HoangVan. A survey on collaborative SLAM with 3D Gaussian splatting.arXiv preprint arXiv:2510.23988, 2025

  49. [49]

    Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning

    Helen Oleynikova, Zachary Taylor, Marius Fehr, Roland Siegwart, and Juan Nieto. Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1366–1373, 2017

  50. [50]

    COVINS-G: A generic back-end for collaborative visual-inertial SLAM

    Manthan Patel, Marco Karrer, Philipp Banninger, and Mar- garita Chli. COVINS-G: A generic back-end for collabo- rative visual-inertial SLAM. InIEEE International Confer- ence on Robotics and Automation (ICRA), pages 8549–8555, 2023

  51. [51]

    RTG-SLAM: Real-time 3D reconstruction at scale using Gaussian Splatting

    Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, and Kun Zhou. RTG-SLAM: Real-time 3D reconstruction at scale using Gaussian Splatting. InACM SIGGRAPH Conference Papers, 2024

  52. [52]

    VINS-Mono: A robust and versatile monocular visual-inertial state estimator

    Tong Qin, Peiliang Li, and Shaojie Shen. VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4):1004–1020, 2018

  53. [53]

    SayNav: Grounding large language models for dynamic planning to navigation in new environments

    Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, and Alvaro Velasquez. SayNav: Grounding large language models for dynamic planning to navigation in new environments. InProceedings of the International Con- ference on Automated Planning and Scheduling (ICAPS), pages 464–474, 2024

  54. [54]

    Kimera: an open-source library for real-time metric-semantic localization and mapping

    Antoni Rosinol, Marcus Abate, Yun Chang, and Luca Car- lone. Kimera: an open-source library for real-time metric- semantic localization and mapping. InIEEE International Conference on Robotics and Automation (ICRA), pages 1689–1696, 2020

  55. [55]

    ORB: An efficient alternative to SIFT or SURF

    Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. InIEEE/CVF International Conference on Computer Vision (ICCV), pages 2564–2571, 2011

  56. [56]

    Point-SLAM: Dense neural point cloud-based SLAM

    Erik Sandström, Yue Li, Luc Van Gool, and Martin R. Oswald. Point-SLAM: Dense neural point cloud-based SLAM. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 18433–18444, 2023

  57. [57]

    Splat-SLAM: Globally optimized RGB-only SLAM with 3D Gaussians

    Erik Sandstr ¨om, Ganlin Zhang, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Youmin Zhang, Manthan Pa- tel, Luc Van Gool, Martin Oswald, and Federico Tombari. Splat-SLAM: Globally optimized RGB-only SLAM with 3D Gaussians. InIEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1686–1697, 2025

  58. [58]

    CCM-SLAM: Robust and efficient centralized collaborative monocular SLAM for robotic teams

    Patrik Schmuck and Margarita Chli. CCM-SLAM: Robust and efficient centralized collaborative monocular SLAM for robotic teams.Journal of Field Robotics, 36(4):763–781, 2019

  59. [59]

    COVINS: Visual-inertial SLAM for centralized collaboration

    Patrik Schmuck, Thomas Ziegler, Marco Karrer, Jonathan Perraudin, and Margarita Chli. COVINS: Visual-inertial SLAM for centralized collaboration. InProceedings of the IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 171–176, 2021

  60. [60]

    LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

    Dhruv Shah, Błażej Osiński, Brian Ichter, and Sergey Levine. LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action. In Proceedings of the 6th Conference on Robot Learning (CoRL), pages 492–504, 2022

  61. [61]

    Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. S...

  62. [62]

    A benchmark for the evaluation of RGB-D SLAM systems

    J ¨urgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evalua- tion of RGB-D SLAM systems. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 573–580, 2012

  63. [63]

    iMAP: Implicit mapping and positioning in real-time

    Edgar Sucar, Shikun Liu, Joseph Ortiz, and Andrew J. Davison. iMAP: Implicit mapping and positioning in real-time. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 6229–6238, 2021

  64. [64]

    DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras

    Zachary Teed and Jia Deng. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. InAd- vances in Neural Information Processing Systems (NeurIPS), pages 16558–16569, 2021

  65. [65]

    Deep patch visual odometry

    Zachary Teed, Lahav Lipson, and Jia Deng. Deep patch visual odometry. InAdvances in Neural Information Pro- cessing Systems (NeurIPS), 2023

  66. [66]

    Annika Thomas, Aneesa Sonawalla, Alex Rose, and Jonathan P. How. GRAND-SLAM: Local optimization for globally consistent large-scale multi-agent Gaussian SLAM. IEEE Robotics and Automation Letters, 10:13129–13136, 2025

  67. [67]

    Kimera-Multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems

    Yulun Tian, Yun Chang, Fernando Herrera Arias, Carlos Nieto-Granda, Jonathan P. How, and Luca Carlone. Kimera- Multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems.IEEE Transactions on Robotics, 38(4): 2022–2038, 2022

  68. [68]

    How nerfs and 3d gaussian splatting are reshaping slam: A survey

    Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandstr ¨om, Stefano Mattoccia, Martin R. Oswald, and Matteo Poggi. How NeRFs and 3D Gaussian Splatting are reshaping SLAM: a survey.arXiv preprint arXiv:2402.13255, 2024

  69. [69]

    Visual-inertial mapping with non-linear factor recovery

    Vladyslav Usenko, Nikolaus Demmel, David Schubert, J ¨org St¨uckler, and Daniel Cremers. Visual-inertial mapping with non-linear factor recovery.IEEE Robotics and Automation Letters, 5(2):422–429, 2020

  70. [70]

    Variational Bayes Gaussian splatting

    Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, and Tim Verbelen. Variational Bayes Gaussian splatting, 2024

  71. [71]

    Co-SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM

    Hengyi Wang, Jingwen Wang, and Lourdes Agapito. Co- SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13293–13302, 2023

  72. [72]

    REACT3D: Real-time edge accelerator for incremental training in 3D Gaussian Splatting based SLAM systems

    Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, and Yu Wang. REACT3D: Real-time edge accelerator for incremental training in 3D Gaussian Splatting based SLAM systems. InIEEE/ACM International Symposium on Mi- croarchitecture (MICRO), 2025

  73. [73]

    SEGS-SLAM: Structure-enhanced 3D Gaussian splatting SLAM with appearance embedding

    Tianci Wen, Zhiang Liu, and Yongchun Fang. SEGS- SLAM: Structure-enhanced 3D Gaussian splatting slam with appearance embedding. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  74. [74]

    Hierarchical open-vocabulary 3D scene graphs for language-grounded robot navigation

    Abdelrhman Werby, Chenguang Huang, Martin B ¨uchner, Abhinav Valada, and Wolfram Burgard. Hierarchical open- vocabulary 3D scene graphs for language-grounded robot navigation. InProceedings of Robotics: Science and Systems (RSS), 2024

  75. [75]

    Embodied-RAG: General non-parametric embodied memory for retrieval and generation

    Quanting Xie, So Yeon Min, Tianyi Zhang, Kedi Xu, Aarav Bajaj, Ruslan Salakhutdinov, Matthew Johnson-Roberson, and Yonatan Bisk. Embodied-RAG: General non-parametric embodied memory for retrieval and generation.arXiv preprint arXiv:2409.18313, 2024

  76. [76]

    MAC-Ego3D: Multi-agent Gaussian consensus for real-time collaborative ego-motion and photorealistic 3D reconstruction

    Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian Scherer, and Xiaonan Huang. MAC-Ego3D: Multi-agent Gaussian consensus for real-time collaborative ego-motion and photorealistic 3D reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 854–863, 2025

  77. [77]

    GS-SLAM: Dense visual SLAM with 3D Gaussian Splatting

    Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. GS-SLAM: Dense visual SLAM with 3D Gaussian Splatting. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19595–19604, 2024

  78. [78]

    BioSLAM: A bioinspired lifelong memory system for general place recognition

    Peng Yin, Abulikemu Abuduweili, Shiqi Zhao, Lingyun Xu, Changliu Liu, and Sebastian Scherer. BioSLAM: A bioin- spired lifelong memory system for general place recognition. IEEE Transactions on Robotics, 39(6):4855–4874, 2023

  79. [79]

    HAMMER: Heterogeneous, multi-robot semantic Gaussian splatting

    Javier Yu, Timothy Chen, and Mac Schwager. HAM- MER: Heterogeneous, multi-robot semantic gaussian splat- ting.IEEE Robotics and Automation Letters, 2025

  80. [80]

    GaussianUpdate: Continual 3D Gaussian Splatting update for changing environments

    Lin Zeng, Boming Zhao, Jiarui Hu, Xujie Shen, Ziqiang Dang, Hujun Bao, and Zhaopeng Cui. GaussianUpdate: Continual 3D Gaussian Splatting update for changing en- vironments. InIEEE/CVF International Conference on Computer Vision (ICCV), 2025

Showing first 80 references.