Change-Robust Online Spatial-Semantic Topological Mapping

Atharva Ghotavadekar; Diwen Liu; Harold Soh; Jiaming Wang; Jiaxuan Da; Jizhuo Chen; Linh Kästner

arxiv: 2605.02227 · v1 · submitted 2026-05-04 · 💻 cs.RO

Jiaming Wang, Jizhuo Chen, Diwen Liu, Atharva Ghotavadekar, Jiaxuan Da, Linh Kästner, Harold Soh

Pith reviewed 2026-05-08 18:56 UTC · model grok-4.3

classification 💻 cs.RO

keywords topological mapping · change-robust navigation · spatial-semantic reasoning · RGB-D keyframes · hypothesis testing · robot localization · perceptual aliasing · SLAM alternatives

The pith

Robots can navigate reliably amid lighting changes and rearranged furniture by using an online topological graph of RGB-D keyframes instead of a globally consistent metric map.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that spatial-semantic reasoning for robot navigation remains reliable when an online pose-aware topological graph of RGB-D keyframes replaces the usual SLAM-built metric map. Existing pipelines attach semantics to those metric maps, yet they break when appearance shifts or scene dynamics interfere with data association and relocalization. The proposed method instead reasons explicitly over perceptual ambiguity through sequential hypothesis testing in continuous SE(3), the space of 3D rigid-body poses, and keeps a bounded Gaussian-mixture belief over possible poses. This matters for autonomous robots because real environments change constantly through lighting, moved objects, and other dynamics that would otherwise lead to unsafe decisions or lost localization.

Core claim

The central claim is that an online, pose-aware topological graph of RGB-D keyframes, together with sequential hypothesis testing in continuous SE(3), supplies sufficient spatial-semantic information for navigation without requiring a globally consistent metric substrate. The estimator maintains a bounded Gaussian-mixture belief over poses, which supports principled loop-closure handling and recovery from kidnapped-robot events. Experiments with real-robot object-goal navigation under lighting shifts and furniture rearrangement show improved robustness over SLAM-based and standard topological baselines while preserving safety under perceptual aliasing.

What carries the argument

An online pose-aware topological graph of RGB-D keyframes combined with sequential hypothesis testing in continuous SE(3), which together supply the spatial-semantic information and bounded pose beliefs needed for navigation decisions.

If this is right

Object-goal navigation stays safe and accurate even when lighting conditions change or furniture is moved.
The system handles perceptual aliasing without catastrophic failure where multiple locations appear similar.
Loop closures and sudden robot displacements are managed through the bounded mixture belief over poses.
Navigation decisions remain more reliable than those from methods that depend on global metric consistency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounded pose belief could support incremental updates over very long periods without map rebuilding.
Sharing such topological graphs among multiple robots might avoid the alignment problems that metric maps create.
Pairing the graph with object-level semantic labels could enable planning that reasons directly about reachable places rather than coordinates.

Load-bearing premise

An online pose-aware topological graph of RGB-D keyframes plus sequential hypothesis testing in continuous SE(3) can supply enough spatial-semantic information for reliable navigation decisions without a globally consistent metric substrate.

What would settle it

A real-robot trial in which the topological-graph method produces unsafe paths or loses localization accuracy during combined lighting shifts and furniture rearrangement, performing no better than the SLAM or topological baselines under perceptual aliasing, would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.02227 by Atharva Ghotavadekar, Diwen Liu, Harold Soh, Jiaming Wang, Jiaxuan Da, Jizhuo Chen, Linh Kästner.

**Figure 1.** Our Change-Robust Online Spatial–Semantic (CROSS) representation enables robust language-goal navigation (A) under substantial appearance and scene changes, including lighting variation (C), object rearrangement, dynamic pedestrians (B,E), and unexpected sensor failures (D). CROSS constructs a pose-aware topological graph and explicitly reasons over ambiguity via sequential hypothesis testing in continuous…

**Figure 3.** Change-Robust Online Spatial-Semantic (CROSS) Topological System Overview. Given an RGB-D frame and odometry, the online tracking module (orange) performs sequential hypothesis testing in continuous SE(3). Motion updates are propagated via SE(3) pushforward, while measurement updates are constructed through VPR-based keyframe retrieval. Competing hypotheses are efficiently managed using Gaussian-mixture c…

**Figure 4.** Appearance change at the same physical locations for the two benchmarks. The top row shows Rover (Campus) across different months/times, while the bottom row shows OpenLORIS (Corridor) across different times of day.

**Figure 5.** Multi-session relocalization results on the Rover [32] Campus scene. Left: Relocalization outcomes across different locations. Each row corresponds to a mapping trajectory (indicated by different colors), while columns show relocalization attempts at the same physical locations captured at different times or months, as illustrated in the top image. Empty space indicates relocalization failed at that specif…

**Figure 6.** Example illustrations of the three evaluation settings for the real quadruped-robot experiment. Each image shows the environment before (top) and after (bottom) the change. Left: Lighting Change (LC). Middle: Object Rearrangement (OR). Right: Combined Change (LC+OR).

**Figure 7.** Fast-motion sequence across seven timestamps. Top row shows the online mapping trajectory of our system: yellow

**Figure 8.** Occlusion sequence across six timestamps. Top row shows the online mapping and belief evolution of our system:

**Figure 9.** Noisy-odometry experiment under different signal-to-noise ratio (SNR) settings. Each column shows the full trajectory

**Figure 10.** Runtime breakdown of the mapping pipeline per step, comparing relative pose estimation via PnP-RANSAC versus VGGT [39]. Bars report the total average step time and its main components: relative pose estimation (Rel. pose), visual place recognition retrieval (VPR), and sequential hypothesis testing. The log-scale y-axis highlights the large disparity in pose-estimation cost.

**Figure 11.** Factor-graph representation of our Gaussian mixture filtering model. $\phi^{\mathrm{mot}}_t$ and $\phi^{\mathrm{meas}}_t$ are the motion and measurement factors. $x_t$ is the current pose, $u_t$ the odometry input, and $z_t$ the RGB-D observation. $Y_t$ is a latent association variable identifying which keyframe explains $z_t$, and $G$ is the set of stored keyframes.

**Figure 12.** Multi-session relocalization results on the Rover [32] Campus scene.

**Figure 13.** Multi-session relocalization results on the OpenLORIS [33] Corridor scene.

Abstract

Autonomous robots require change-robust spatial-semantic reasoning: using spatial and semantic knowledge to decide where to go, how to get there, and where the robot is despite environmental change. Existing approaches typically attach semantics to SLAM-built metric maps, but these pipelines are brittle under appearance shifts and scene dynamics, where data association and relocalization degrade. We propose a Change-Robust Online Spatial-Semantic (CROSS) representation that replaces a globally consistent metric substrate with an online, pose-aware topological graph of RGB-D keyframes. The system explicitly reasons over perceptual ambiguity using sequential hypothesis testing in continuous SE(3). Our estimator maintains a bounded Gaussian-mixture belief over poses, enabling principled handling of loop closures and kidnapped-robot events. Experiments under severe appearance change, including real-robot object-goal navigation with lighting shifts and furniture rearrangement, demonstrate improved robustness over SLAM-based and topological baselines while remaining safe under perceptual aliasing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CROSS proposes a topological mapping system with pose-aware graphs, Gaussian-mixture beliefs, and SE(3) hypothesis testing to handle environmental change, but the evidence that this replaces metric consistency without losing reliability is still preliminary.

The main takeaway is that this paper pushes a topological alternative to metric SLAM for robots that must navigate despite lighting shifts and object rearrangements. It builds an online graph of RGB-D keyframes, tracks pose uncertainty with a bounded Gaussian mixture, and applies sequential hypothesis testing in continuous SE(3) to resolve aliasing and lost-robot cases without forcing a single global map.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Change-Robust Online Spatial-Semantic (CROSS) representation for autonomous robots, which replaces a globally consistent metric substrate with an online pose-aware topological graph of RGB-D keyframes. The system uses sequential hypothesis testing in continuous SE(3) and maintains a bounded Gaussian-mixture belief over poses to handle perceptual ambiguities, loop closures, and kidnapped-robot events. Experiments with real-robot object-goal navigation under severe appearance changes (lighting shifts and furniture rearrangement) are reported to show improved robustness over SLAM-based and topological baselines while remaining safe under perceptual aliasing.

Significance. If the central claims hold, the work would be significant for robot mapping and navigation in dynamic environments, as it provides a principled topological alternative to brittle metric SLAM pipelines under appearance and structural change. The explicit handling of ambiguity via SE(3) hypothesis testing and bounded mixture beliefs, combined with real-robot trials, offers a concrete advance over purely metric or purely topological baselines. Credit is due for focusing on safety under aliasing and for grounding the evaluation in object-goal navigation tasks.

major comments (2)

[the proposed CROSS representation and estimator] The load-bearing claim that relative pose estimates between keyframes plus the multi-hypothesis SE(3) belief suffice for reliable navigation decisions without global metric consistency is not accompanied by an explicit analysis of residual pose uncertainty (particularly when furniture rearrangement alters keyframe visibility). This assumption underpins the safety and robustness assertions but lacks a concrete bound or failure-mode characterization in the method description.
[Experiments] The abstract states that experiments demonstrate improved robustness, yet provides no quantitative metrics, error bars, or statistical comparison details. Without these, it is impossible to assess whether the topological approach actually outperforms baselines by a margin that justifies replacing metric substrates.

minor comments (2)

The abstract would be strengthened by including at least one key quantitative result (e.g., success rate or navigation time under change) to support the robustness claim.
Notation for the bounded Gaussian-mixture belief and the sequential hypothesis test should be introduced with explicit equations or pseudocode for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and have revised the manuscript to incorporate the suggested improvements where they strengthen the work.

Referee: [the proposed CROSS representation and estimator] The load-bearing claim that relative pose estimates between keyframes plus the multi-hypothesis SE(3) belief suffice for reliable navigation decisions without global metric consistency is not accompanied by an explicit analysis of residual pose uncertainty (particularly when furniture rearrangement alters keyframe visibility). This assumption underpins the safety and robustness assertions but lacks a concrete bound or failure-mode characterization in the method description.

Authors: We thank the referee for identifying this point. The CROSS estimator maintains a bounded Gaussian-mixture belief over SE(3) poses via sequential hypothesis testing precisely to represent residual uncertainty and perceptual ambiguity without relying on global metric consistency; navigation decisions are conditioned on the full belief support to preserve safety. We agree, however, that an explicit characterization of how this uncertainty evolves under furniture rearrangement (which can reduce keyframe visibility) would make the safety claims more concrete. We have added a dedicated paragraph in the method section providing a bound on residual pose uncertainty and discussing associated failure modes. revision: yes
Referee: [Experiments] The abstract states that experiments demonstrate improved robustness, yet provides no quantitative metrics, error bars, or statistical comparison details. Without these, it is impossible to assess whether the topological approach actually outperforms baselines by a margin that justifies replacing metric substrates.

Authors: We appreciate the referee's emphasis on quantitative rigor. The manuscript's experiments section already reports success rates, navigation times, and direct comparisons against SLAM-based and topological baselines under lighting shifts and furniture rearrangement. To address the concern about the abstract and presentation, we have revised the abstract to include key quantitative metrics with error bars and have added explicit statistical significance tests in the experiments section. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The paper introduces a CROSS representation based on an online pose-aware topological graph of RGB-D keyframes with sequential SE(3) hypothesis testing and a bounded Gaussian-mixture pose belief. No equations, predictions, or first-principles results are shown that reduce by construction to the inputs (e.g., no fitted parameters renamed as predictions, no self-definitional loops, and no load-bearing self-citations). The central claims rest on experimental validation under appearance change rather than tautological redefinitions or imported uniqueness theorems. This is a standard non-circular proposal of a new mapping method.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard robotics assumptions about keyframe-based representation and probabilistic pose estimation; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: A topological graph of RGB-D keyframes can sufficiently capture spatial-semantic knowledge for navigation despite environmental changes.
This underpins the replacement of metric maps and is invoked in the description of the CROSS representation.

pith-pipeline@v0.9.0 · 5479 in / 1227 out tokens · 48787 ms · 2026-05-08T18:56:54.231077+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

reality_from_one_distinction (spacetime emergence) · unclear

Relation: Classical SE(3) covariance transport; RS's emergent Lorentzian (1,3) signature is conceptually unrelated to estimator covariance propagation.

Paper passage: Σ⁻ = Ad_{ΔT⁻¹} Σ Ad^T_{ΔT⁻¹} + Q_t (first-order pushforward via the adjoint)
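
The covariance pushforward quoted in this entry can be illustrated with a short numerical sketch. It is not the authors' implementation: it assumes 4x4 homogeneous transforms, a twist ordering of (translation, rotation), and hypothetical values for `delta_T`, `Sigma`, and `Q`. The perturbation convention (left or right) used in the paper is not stated on this page, so the function below only mirrors the formula as written.

```python
import numpy as np


def skew(v):
    """Return [v]x, the 3x3 matrix with skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def adjoint(T):
    """6x6 adjoint of a 4x4 rigid transform; twists ordered (translation, rotation)."""
    R, t = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[:3, 3:] = skew(t) @ R
    Ad[3:, 3:] = R
    return Ad


def inverse(T):
    """Closed-form inverse of a 4x4 rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def push_forward(Sigma, delta_T, Q):
    """First-order covariance transport: Ad_{dT^-1} Sigma Ad_{dT^-1}^T + Q."""
    Ad = adjoint(inverse(delta_T))
    return Ad @ Sigma @ Ad.T + Q


# Hypothetical odometry increment: 90 degree yaw plus 1 m of translation along x.
delta_T = np.eye(4)
delta_T[:3, :3] = np.array([[0.0, -1.0, 0.0],
                            [1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0]])
delta_T[:3, 3] = [1.0, 0.0, 0.0]

Sigma = np.diag([0.01, 0.02, 0.03, 0.001, 0.002, 0.003])  # prior covariance (assumed)
Q = 1e-4 * np.eye(6)                                      # odometry noise (assumed)
Sigma_pred = push_forward(Sigma, delta_T, Q)
```

With an identity increment the function reduces to `Sigma + Q`, which is a quick sanity check on any adjoint convention.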

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[2] Amar Ali-Bey, Brahim Chaib-draa, and Philippe Giguere. Boq: A place is worth a bag of learnable queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17794–17803, 2024.
[3] D. L. Alspach and H. W. Sorenson. Nonlinear bayesian estimation using gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4):439–448, 1972. doi: 10.1109/TAC.1972.1100034.
[4] Adrien Angeli, David Filliat, Stéphane Doncieux, and Jean-Arcady Meyer. Fast and incremental method for loop-closure detection using bags of visual words. IEEE transactions on robotics, 24(5):1027–1037, 2008.
[5] Gabriele Berton and Carlo Masone. Megaloc: One retrieval to place them all. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2861–2867, 2025.
[6] Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE transactions on robotics, 37(6):1874–1890, 2021.
[7] Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, and Devendra Singh Chaplot. GOAT: GO to any thing. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi: 10.15607/RSS.2024.XX.073.
[8] Devendra Singh Chaplot, Dhiraj Prakashchand Gandhi, Abhinav Gupta, and Russ R Salakhutdinov. Object goal navigation using goal-oriented semantic exploration. Advances in Neural Information Processing Systems, 33:4247–4258, 2020.
[9] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.
[10] Mark Cummins and Paul Newman. Appearance-only slam at large scale with fab-map 2.0. The International Journal of Robotics Research, 30(9):1100–1123, 2011. doi: 10.1177/0278364910385483.
[11] Frank Dellaert and GTSAM Contributors. borglab/gtsam, May 2022. URL https://github.com/borglab/gtsam
[12] M.A.T. Figueiredo and A.K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381–396, 2002. doi: 10.1109/34.990138.
[13] Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Aditya Sen, Aditya Agarwal, Corban Rivera, William Knudson, Erik Sudderth, Oscar Beijbom, et al. ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2024.
[14] N. Hughes, Y. Chang, and L. Carlone. Hydra: A real-time spatial perception system for 3D scene graph construction and optimization. 2022.
[15] Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, et al. ConceptFusion: Open-set multimodal 3D mapping. In Proceedings of Robotics: Science and Systems (RSS), 2023.
[16] Mathieu Labbe and Francois Michaud. Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics, 29(3):734–745, 2013.
[17] Mathieu Labbé and François Michaud. Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of field robotics, 36(2):416–446, 2019.
[18] Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: An accurate O(n) solution to the PnP problem. International journal of computer vision, 81(2):155–166, 2009.
[19] Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, and Hongyu Wang. Sgs-slam: Semantic gaussian splatting for neural dense slam. In European Conference on Computer Vision, pages 163–
[20] Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023.
[21] Peiqi Liu, Yaswanth Orru, Jay Vakil, Chris Paxton, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. OK-Robot: What really matters in integrating open-knowledge models for robotics. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi: 10.15607/RSS.2024.XX.091.
[22] Andréa Macario Barros, Maugan Michel, Yoann Moline, Gwenolé Corre, and Frédérick Carrel. A comprehensive survey of visual slam algorithms. Robotics, 11(1):24, 2022.
[23] Will Maddern, Michael Milford, and Gordon Wyeth. CAT-SLAM: Probabilistic localisation and mapping using a continuous appearance-based trajectory. The International Journal of Robotics Research (IJRR), 31(4):429–451, 2012. doi: 10.1177/0278364912438273.
[24] Xiangyun Meng, Nathan Ratliff, Yu Xiang, and Dieter Fox. Scaling local control to large-scale topological navigation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 672–678. IEEE, 2020.
[25] Michael J Milford and Gordon F Wyeth. Mapping a suburb with a single camera using a biologically inspired slam system. IEEE Transactions on Robotics, 24(5):1038–1053, 2008.
[26] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. Orb-slam: A versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163, 2015.
[27] Riku Murai, Eric Dexheimer, and Andrew J Davison. Mast3r-slam: Real-time dense slam with 3d reconstruction priors. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16695–16705, 2025.
[28] Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, and Erickson R Nascimento. Xfeat: Accelerated features for lightweight image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2682–2691, 2024.
[29] Long Quan and Zhongdan Lan. Linear n-point camera pose determination. IEEE Transactions on pattern analysis and machine intelligence, 21(8):774–780, 1999.
[30] Branko Ristic, Sanjeev Arulampalam, and Neil Gordon. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House Radar Library. Artech House, Boston, London, 2004. ISBN 9781580536318.
[31] Nikolay Savinov, Alexey Dosovitskiy, and Vladlen Koltun. Semi-parametric topological memory for navigation. In International Conference on Learning Representations, 2018.
[32] Fabian Schmidt, Julian Daubermann, Marcel Mitschke, Constantin Blessing, Stephan Meyer, Markus Enzweiler, and Abhinav Valada. Rover: A multi-season dataset for visual slam. IEEE Transactions on Robotics, 2025.
[33] Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, and Qi She. Are we ready for service robots? the OpenLORIS-Scene datasets for lifelong SLAM. In 2020 International Conference on Robotic...
[34] Lauri Suomela, Jussi Kalliola, Harry Edelman, and Joni-Kristian Kämäräinen. Placenav: Topological navigation through place recognition. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5205–5213. IEEE, 2024.
[35] S Urban, J Leitloff, and S Hinz. Mlpnp–a real-time maximum likelihood solution to the perspective-n-point problem. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3:131–138, 2016.
[36] Jiaming Wang and Harold Soh. Probable object location (polo) score estimation for efficient object goal navigation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5221–5227. IEEE, 2024.
[37] Jiaming Wang, Diwen Liu, Jizhuo Chen, Jiaxuan Da, Nuowen Qian, Minh Man Tram, and Harold Soh. Genie: A generalizable navigation system for in-the-wild environments. IEEE Robotics and Automation Letters, 2025.
[38] Jiaming Wang, Diwen Liu, Jizhuo Chen, and Harold Soh. Topo-bench: An open-source topological mapping evaluation framework with quantifiable perceptual aliasing. arXiv preprint arXiv:2510.04100, 2025.
[39] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025.
[40] Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, and Wolfram Burgard. Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, July 2024. doi: 10.15607/RSS.2024.XX.077.
[41] Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, and Achuta Kadambi. Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21676–21685, 2024.
[42] Siting Zhu, Guangming Wang, Hermann Blum, Jiuming Liu, Liang Song, Marc Pollefeys, and Hesheng Wang. Sni-slam: Semantic neural implicit slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21167–21177, 2024.

APPENDIX A: EXPERIMENT DETAILS

A. Topological Localization Baselines

This appendix describes the topological...

Greedy Matching (GM): The greedy matching baseline localizes by selecting the node with the highest similarity score to the current observation. If the maximum similarity exceeds a fixed threshold τ, the corresponding node is selected as the localization result; otherwise, the localization estimate remains unchanged. This baseline reflects a common retrieva...

[44]

A candidate match between nodes(v i, vj)is accepted if the aggregated similarity over a window of size2h+1satisfies f sim(zvi−h, zvj −h),

Sequence Matching (SM):Instead of matching a single observation, sequence matching aggregates similarity scores over a short temporal window to improve robustness against perceptual aliasing and viewpoint variation. A candidate match between nodes(v i, vj)is accepted if the aggregated similarity over a window of size2h+1satisfies f sim(zvi−h, zvj −h), . ....

work page
[45]

Probabilistic Belief Update (PBU):The probabilistic belief update baseline maintains a discrete posterior belief bt(v) =P(v t =v|z 1:t)over the topological nodesv∈ Vat timet. Given the belief at the previous timestep, the state is first propagated via a motion modelP(v t |v t−1)that constrains allowable transitions based on the graph topology: P(v t |v t−...

work page
[46]

The GSF represents the filtering density as a finite mixture p(xt−1 |z 1:t−1, u1:t−2) = Kt−1X k=1 w(k) t−1 N xt−1;µ (k) t−1,Σ (k) t−1 , withw (k) t−1 ≥0and P k w(k) t−1 = 1

Preliminaries and Notation:Letx t ∈R n be the (locally Euclidean) state with motion and measurement models xt =f t(xt−1, ut−1) +q t, q t ∼ N(0, Q t), zt =h t(xt) +r t, r t ∼ N(0, R t), (10) whereQ t, Rt ≻0. The GSF represents the filtering density as a finite mixture p(xt−1 |z 1:t−1, u1:t−2) = Kt−1X k=1 w(k) t−1 N xt−1;µ (k) t−1,Σ (k) t−1 , withw (k) t−1 ...

work page
[47]

a) Prediction (per component).:For eachk= 1,

Exact Linear–Gaussian GSF:Assume linear–Gaussian models: xt =F txt−1 +B tut−1 +q t, z t =H txt +r t. a) Prediction (per component).:For eachk= 1, . . . , Kt−1, µ(k) t|t−1 =F tµ(k) t−1 +B tut−1, Σ(k) t|t−1 =F tΣ(k) t−1F ⊤ t +Q t, w(k) t|t−1 =w (k) t−1. (11) Thusp(x t |z 1:t−1, u1:t−1) =P k w(k) t|t−1N(x t;µ (k) t|t−1,Σ (k) t|t−1). b) Update (per component ...

work page
[48]

Nonlinear GSF via Local Gaussianization:For nonlin- ear (10), GSF applies a local Gaussian filter to each compo- nent. a) EKF-style (per component).:Linearize around the current component mean: ft(x, u)≈f t(µ(k) t−1, u) +F (k) t (x−µ (k) t−1), ht(x)≈h t(µ(k) t|t−1) +H (k) t (x−µ (k) t|t−1), whereF (k) t , H(k) t are Jacobians. Then apply (11)–(14) with (F...

work page
[49]

a) Product of Gaussians.: N1(x)N2(x) =N(m 1;m 2, S1+S2)N(x;m, S),(15) whereS= (S −1 1 +S −1 2 )−1 andm=S(S −1 1 m1 +S −1 2 m2)

Mixture Identities:LetN i(x) =N(x;m i, Si)fori∈ {1,2}. a) Product of Gaussians.: N1(x)N2(x) =N(m 1;m 2, S1+S2)N(x;m, S),(15) whereS= (S −1 1 +S −1 2 )−1 andm=S(S −1 1 m1 +S −1 2 m2). b) Innovation evidence.:For predicted(µ −,Σ −)and measurementz=Hx+r,r∼ N(0, R), the innovation y=z−Hµ − satisfiesy∼ N(0, S)withS=HΣ −H ⊤ +R, yielding the evidence term in (14)

work page
[50]

b) Reduction / merging.:Iteratively merge nearby com- ponents (e.g., using a KL-based criterion) untilK t ≤K max

Mixture Growth Control:To prevent unbounded mixture growth, GSF typically uses: a) Pruning.:Remove components withw (k) t < ε. b) Reduction / merging.:Iteratively merge nearby com- ponents (e.g., using a KL-based criterion) untilK t ≤K max. Merging two components with weightsa, bby moment match- ing gives µ= aµa +bµ b a+b ,(16) Σ = a Σa + (µa −µ)(µ a −µ) ...

work page
[51]

In practice, gating and sparsification reduce theK×C t expansion

Mixture–Mixture Update (Optional):If the measurement factor is approximated by a mixture Qt(x) = PCt c=1 π(c) t N(x;ν (c) t ,Λ (c) t ), then the update is a mixture–mixture product: p(xt |z 1:t)∝p −(xt)Q t(xt), with p−(xt) = X k w(k) t|t−1N xt;µ (k) t|t−1,Σ (k) t|t−1 , and p(xt |z 1:t) = Kt−1X k=1 CtX c=1 ˜wk,c N(x t;m k,c, Sk,c).(17) Here(m k,c, Sk,c)fol...

work page
[52]

During prediction, covariances are transported through group composition using the ap- propriate adjoint (first-order), yielding the manifold GSF expressions used in the main text

Manifold Adaptation (Lie Groups):On a Lie groupX (e.g.,SE(3)), represent each mixture component as a Gaussian in a consistent tangent chartϕ(·), apply the Euclidean GSF updates to ξt =ϕ(x t), and reconstruct means viaexp(·). During prediction, covariances are transported through group composition using the ap- propriate adjoint (first-order), yielding the...

work page
[53]

2)Update:apply (12)–(14) (or (17) for mixture likelihoods)

One-Step GSF Summary:Given{w (k) t−1, µ(k) t−1,Σ (k) t−1} Kt−1 k=1 : leftmargin=1.2em,itemsep=2pt 1)Predict:propagate each component (linear (11), or EKF/UKF/CKF per component). 2)Update:apply (12)–(14) (or (17) for mixture likelihoods). 3)Control:prune/reduce (and optionally split) to enforceK t ≤ Kmax. On manifolds, perform all steps in the chosen chart...

work page
[54]

The (right-invariant) stochastic motion model is Xt =X t−1 ∆Tt exp(νt), ν t ∼ N(0, Q t)⊂se(3), withν t independent ofε

Setup:LetX t−1 ∈SE(3)be distributed asX t−1 =µexp(ε) withε∼ N(0,Σ)⊂se(3). The (right-invariant) stochastic motion model is Xt =X t−1 ∆Tt exp(νt), ν t ∼ N(0, Q t)⊂se(3), withν t independent ofε

work page
[55]

Prediction Kernel:For a single mixand, the predicted density is ¯p(xt) = Z p(xt |x t−1)N se(3) log(µ−1xt−1); 0,Σ dxt−1

work page
[56]

Using the group adjoint and BCH, µexp(ε) ∆Tt =µ∆T t exp Ad∆T −1 t ε+O(∥ε∥ 2)

First-Order Pushforward:WriteX t−1 =µexp(ε). Using the group adjoint and BCH, µexp(ε) ∆Tt =µ∆T t exp Ad∆T −1 t ε+O(∥ε∥ 2) . Post-multiplying byexp(ν t)and applying BCH again yields µ∆Tt exp Ad∆T −1 t ε exp(νt) =µ∆T t exp Ad∆T −1 t ε +ν t +O(∥ε∥ 2 +∥ν t∥2) . Neglecting higher-order terms, the updated error in the right- invariant chart at the predicted mea...

work page
[57]

Mixtures and Weights:Since R p(xt |x t−1)dx t = 1, prediction preserves mixture weights: ifp(x) = P k wkpk(x)then ¯p(x) =P k wk ¯pk(x)

work page
[58]

Small-Increment Approximation:If∆T t = exp(ξ t)with ∥ξt∥ ≪1, then Ad∆T −1 t =I−ad(ξ t) +O(∥ξ t∥2), and the transported covariance expands as Ad∆T −1 t ΣAd⊤ ∆T −1 t = Σ−ad(ξ t)Σ−Σ ad(ξ t)⊤ +O(∥ξ t∥2∥Σ∥). At high update rates (small∥ξ t∥) and when covariances are main- tained in the updated right-invariant chart, a common conservative approximation is Σ− ≈Σ...

work page

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

[2] Amar Ali-Bey, Brahim Chaib-draa, and Philippe Giguere. Boq: A place is worth a bag of learnable queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17794–17803, 2024.

[3] D. L. Alspach and H. W. Sorenson. Nonlinear bayesian estimation using gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4):439–448, 1972. doi: 10.1109/TAC.1972.1100034.

[4] Adrien Angeli, David Filliat, Stéphane Doncieux, and Jean-Arcady Meyer. Fast and incremental method for loop-closure detection using bags of visual words. IEEE transactions on robotics, 24(5):1027–1037, 2008.

[5] Gabriele Berton and Carlo Masone. Megaloc: One retrieval to place them all. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2861–2867, 2025.

[6] Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE transactions on robotics, 37(6):1874–1890, 2021.

[7] Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, and Devendra Singh Chaplot. GOAT: GO to any thing. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi: 10.15607/RSS.2024.XX.073.

[8] Devendra Singh Chaplot, Dhiraj Prakashchand Gandhi, Abhinav Gupta, and Russ R Salakhutdinov. Object goal navigation using goal-oriented semantic exploration. Advances in Neural Information Processing Systems, 33:4247–4258, 2020.

[9] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.

[10] Mark Cummins and Paul Newman. Appearance-only slam at large scale with fab-map 2.0. The International Journal of Robotics Research, 30(9):1100–1123, 2011. doi: 10.1177/0278364910385483.

[11] Frank Dellaert and GTSAM Contributors. borglab/gtsam, May 2022. URL https://github.com/borglab/gtsam

[12] M.A.T. Figueiredo and A.K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381–396, 2002. doi: 10.1109/34.990138.

[13] Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Aditya Sen, Aditya Agarwal, Corban Rivera, William Knudson, Erik Sudderth, Oscar Beijbom, et al. ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2024.

[14] N. Hughes, Y. Chang, and L. Carlone. Hydra: A real-time spatial perception system for 3D scene graph construction and optimization. 2022.

[15] Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, et al. ConceptFusion: Open-set multimodal 3D mapping. In Proceedings of Robotics: Science and Systems (RSS), 2023.

[16] Mathieu Labbe and Francois Michaud. Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics, 29(3):734–745, 2013.

[17] Mathieu Labbé and François Michaud. Rtab-map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. Journal of field robotics, 36(2):416–446, 2019.

[18] Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. EPnP: An accurate O(n) solution to the PnP problem. International journal of computer vision, 81(2):155–166, 2009.

[19] Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, and Hongyu Wang. Sgs-slam: Semantic gaussian splatting for neural dense slam. In European Conference on Computer Vision, pages 163–

[20] Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023.

[21] Peiqi Liu, Yaswanth Orru, Jay Vakil, Chris Paxton, Nur Muhammad Mahi Shafiullah, and Lerrel Pinto. OK-Robot: What really matters in integrating open-knowledge models for robotics. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi: 10.15607/RSS.2024.XX.091.

[22] Andréa Macario Barros, Maugan Michel, Yoann Moline, Gwenolé Corre, and Frédérick Carrel. A comprehensive survey of visual slam algorithms. Robotics, 11(1):24, 2022.

[23] Will Maddern, Michael Milford, and Gordon Wyeth. CAT-SLAM: Probabilistic localisation and mapping using a continuous appearance-based trajectory. The International Journal of Robotics Research (IJRR), 31(4):429–451, 2012. doi: 10.1177/0278364912438273.

[24] Xiangyun Meng, Nathan Ratliff, Yu Xiang, and Dieter Fox. Scaling local control to large-scale topological navigation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 672–678. IEEE, 2020.

[25] Michael J Milford and Gordon F Wyeth. Mapping a suburb with a single camera using a biologically inspired slam system. IEEE Transactions on Robotics, 24(5):1038–1053, 2008.

[26] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D Tardos. Orb-slam: A versatile and accurate monocular slam system. IEEE transactions on robotics, 31(5):1147–1163, 2015.

[27] Riku Murai, Eric Dexheimer, and Andrew J Davison. Mast3r-slam: Real-time dense slam with 3d reconstruction priors. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16695–16705, 2025.

[28] Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, and Erickson R Nascimento. Xfeat: Accelerated features for lightweight image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2682–2691, 2024.

[29] Long Quan and Zhongdan Lan. Linear n-point camera pose determination. IEEE Transactions on pattern analysis and machine intelligence, 21(8):774–780, 1999.

[30] Branko Ristic, Sanjeev Arulampalam, and Neil Gordon. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House Radar Library. Artech House, Boston, London, 2004. ISBN 9781580536318.

[31] Nikolay Savinov, Alexey Dosovitskiy, and Vladlen Koltun. Semi-parametric topological memory for navigation. In International Conference on Learning Representations, 2018.

[32] Fabian Schmidt, Julian Daubermann, Marcel Mitschke, Constantin Blessing, Stephan Meyer, Markus Enzweiler, and Abhinav Valada. Rover: A multi-season dataset for visual slam. IEEE Transactions on Robotics, 2025.

[33] Xuesong Shi, Dongjiang Li, Pengpeng Zhao, Qinbin Tian, Yuxin Tian, Qiwei Long, Chunhao Zhu, Jingwei Song, Fei Qiao, Le Song, Yangquan Guo, Zhigang Wang, Yimin Zhang, Baoxing Qin, Wei Yang, Fangshi Wang, Rosa H. M. Chan, and Qi She. Are we ready for service robots? the OpenLORIS-Scene datasets for lifelong SLAM. In 2020 International Conference on Robotic...

[34] Lauri Suomela, Jussi Kalliola, Harry Edelman, and Joni-Kristian Kämäräinen. Placenav: Topological navigation through place recognition. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5205–5213. IEEE, 2024.

[35] S Urban, J Leitloff, and S Hinz. Mlpnp–a real-time maximum likelihood solution to the perspective-n-point problem. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3:131–138, 2016.

[36] Jiaming Wang and Harold Soh. Probable object location (polo) score estimation for efficient object goal navigation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 5221–5227. IEEE, 2024.

[37] Jiaming Wang, Diwen Liu, Jizhuo Chen, Jiaxuan Da, Nuowen Qian, Minh Man Tram, and Harold Soh. Genie: A generalizable navigation system for in-the-wild environments. IEEE Robotics and Automation Letters, 2025.

[38] Jiaming Wang, Diwen Liu, Jizhuo Chen, and Harold Soh. Topo-bench: An open-source topological mapping evaluation framework with quantifiable perceptual aliasing. arXiv preprint arXiv:2510.04100, 2025.

[39] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025.

[40] Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, and Wolfram Burgard. Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, July 2024. doi: 10.15607/RSS.2024.XX.077.

[41] Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, and Achuta Kadambi. Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21676–21685, 2024.

[42] Siting Zhu, Guangming Wang, Hermann Blum, Jiuming Liu, Liang Song, Marc Pollefeys, and Hesheng Wang. Sni-slam: Semantic neural implicit slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21167–21177, 2024.

APPENDIX A
EXPERIMENT DETAILS

A. Topological Localization Baselines

This appendix describes the topological...

Greedy Matching (GM): The greedy matching baseline localizes by selecting the node with the highest similarity score to the current observation. If the maximum similarity exceeds a fixed threshold $\tau$, the corresponding node is selected as the localization result; otherwise, the localization estimate remains unchanged. This baseline reflects a common retrieva...
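The greedy matching rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `greedy_match` and the convention that similarities arrive as a flat array indexed by node are assumptions made for the example.

```python
import numpy as np


def greedy_match(similarities, tau, previous_node):
    """Greedy matching baseline (illustrative sketch).

    similarities: one similarity score per topological node (hypothetical input).
    tau: fixed acceptance threshold.
    previous_node: current localization estimate, kept when no match is accepted.
    """
    similarities = np.asarray(similarities, dtype=float)
    best = int(np.argmax(similarities))
    # Accept the best node only if its similarity exceeds the threshold.
    if similarities[best] > tau:
        return best
    # Otherwise the localization estimate remains unchanged.
    return previous_node
```

Because the rule looks at a single observation and a single score, two visually similar places can swap the estimate in one step, which is the weakness the sequence-based and probabilistic baselines try to address.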

Sequence Matching (SM): Instead of matching a single observation, sequence matching aggregates similarity scores over a short temporal window to improve robustness against perceptual aliasing and viewpoint variation. A candidate match between nodes $(v_i, v_j)$ is accepted if the aggregated similarity over a window of size $2h+1$ satisfies $f_{\mathrm{sim}}(z_{v_{i-h}}, z_{v_{j-h}}),$ ...

Probabilistic Belief Update (PBU): The probabilistic belief update baseline maintains a discrete posterior belief $b_t(v) = P(v_t = v \mid z_{1:t})$ over the topological nodes $v \in \mathcal{V}$ at time $t$. Given the belief at the previous timestep, the state is first propagated via a motion model $P(v_t \mid v_{t-1})$ that constrains allowable transitions based on the graph topology: $P(v_t \mid v_{t-1})$ ...
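The predict/update cycle of such a discrete belief filter can be sketched as follows. The exact transition model is cut off in this excerpt, so `topology_transition` below is a hypothetical stand-in (stay with probability `stay_prob`, otherwise move uniformly to a graph neighbour); only the general structure of propagating with $P(v_t \mid v_{t-1})$ and reweighting by an observation likelihood is taken from the text.

```python
import numpy as np


def topology_transition(adjacency, stay_prob=0.5):
    """Hypothetical motion model: stay put or move to a neighbour uniformly."""
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(A[i])
        nbrs = nbrs[nbrs != i]  # ignore self-loops in the adjacency
        if len(nbrs) == 0:
            T[i, i] = 1.0
        else:
            T[i, i] = stay_prob
            T[i, nbrs] = (1.0 - stay_prob) / len(nbrs)
    return T  # T[i, j] = P(v_t = j | v_{t-1} = i)


def pbu_step(belief, transition, likelihood):
    """One discrete Bayes-filter step over graph nodes."""
    # Prediction: push the previous belief through the motion model.
    predicted = transition.T @ belief
    # Update: reweight by the (assumed given) observation likelihood P(z_t | v).
    posterior = likelihood * predicted
    total = posterior.sum()
    if total <= 0.0:
        return predicted  # no support from the observation; keep the prediction
    return posterior / total
```

On a three-node chain with the robot known to be at node 0, a strong observation for node 2 cannot move the belief there in one step, because node 2 is not reachable from node 0 under the transition model.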

Preliminaries and Notation: Let $x_t \in \mathbb{R}^n$ be the (locally Euclidean) state with motion and measurement models

$$x_t = f_t(x_{t-1}, u_{t-1}) + q_t,\quad q_t \sim \mathcal{N}(0, Q_t),\qquad z_t = h_t(x_t) + r_t,\quad r_t \sim \mathcal{N}(0, R_t), \tag{10}$$

where $Q_t, R_t \succ 0$. The GSF represents the filtering density as a finite mixture

$$p(x_{t-1} \mid z_{1:t-1}, u_{1:t-2}) = \sum_{k=1}^{K_{t-1}} w_{t-1}^{(k)}\, \mathcal{N}\big(x_{t-1}; \mu_{t-1}^{(k)}, \Sigma_{t-1}^{(k)}\big),$$

with $w_{t-1}^{(k)} \ge 0$ and $\sum_k w_{t-1}^{(k)} = 1$ ...

Exact Linear–Gaussian GSF: Assume linear–Gaussian models: $x_t = F_t x_{t-1} + B_t u_{t-1} + q_t$, $z_t = H_t x_t + r_t$.

a) Prediction (per component): For each $k = 1, \ldots, K_{t-1}$,

$$\mu_{t|t-1}^{(k)} = F_t \mu_{t-1}^{(k)} + B_t u_{t-1},\qquad \Sigma_{t|t-1}^{(k)} = F_t \Sigma_{t-1}^{(k)} F_t^{\top} + Q_t,\qquad w_{t|t-1}^{(k)} = w_{t-1}^{(k)}. \tag{11}$$

Thus $p(x_t \mid z_{1:t-1}, u_{1:t-1}) = \sum_k w_{t|t-1}^{(k)}\, \mathcal{N}\big(x_t; \mu_{t|t-1}^{(k)}, \Sigma_{t|t-1}^{(k)}\big)$.

b) Update (per component ...
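A compact sketch of one linear–Gaussian GSF step follows. The prediction mirrors (11). The update equations (12)–(14) are not visible in this excerpt, so the sketch assumes the standard per-component Kalman update with weights rescaled by the Gaussian innovation evidence; the function names are illustrative.

```python
import numpy as np


def gsf_predict(weights, means, covs, F, B, u, Q):
    """Per-component prediction as in (11); weights are carried over unchanged."""
    new_means = [F @ m + B @ u for m in means]
    new_covs = [F @ P @ F.T + Q for P in covs]
    return np.asarray(weights, dtype=float), new_means, new_covs


def gsf_update(weights, means, covs, z, H, R):
    """Assumed standard update: Kalman step per component, evidence reweighting."""
    new_w, new_m, new_P = [], [], []
    for w, m, P in zip(weights, means, covs):
        y = z - H @ m                      # innovation
        S = H @ P @ H.T + R                # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        new_m.append(m + K @ y)
        new_P.append((np.eye(len(m)) - K @ H) @ P)
        # Evidence N(y; 0, S) scales the component weight.
        d = len(y)
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))
        new_w.append(w * np.exp(-0.5 * y @ np.linalg.solve(S, y)) / norm)
    new_w = np.asarray(new_w)
    return new_w / new_w.sum(), new_m, new_P
```

Components whose predicted measurement disagrees with $z_t$ lose weight rather than being discarded outright, which is what lets the mixture keep several pose hypotheses alive until the evidence separates them.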

Nonlinear GSF via Local Gaussianization: For nonlinear (10), GSF applies a local Gaussian filter to each component.

a) EKF-style (per component): Linearize around the current component mean:

$$f_t(x, u) \approx f_t(\mu_{t-1}^{(k)}, u) + F_t^{(k)}\big(x - \mu_{t-1}^{(k)}\big),\qquad h_t(x) \approx h_t(\mu_{t|t-1}^{(k)}) + H_t^{(k)}\big(x - \mu_{t|t-1}^{(k)}\big),$$

where $F_t^{(k)}, H_t^{(k)}$ are Jacobians. Then apply (11)–(14) with (F...

Mixture Identities: Let $\mathcal{N}_i(x) = \mathcal{N}(x; m_i, S_i)$ for $i \in \{1, 2\}$.

a) Product of Gaussians:

$$\mathcal{N}_1(x)\,\mathcal{N}_2(x) = \mathcal{N}(m_1; m_2, S_1 + S_2)\,\mathcal{N}(x; m, S), \tag{15}$$

where $S = (S_1^{-1} + S_2^{-1})^{-1}$ and $m = S(S_1^{-1} m_1 + S_2^{-1} m_2)$.

b) Innovation evidence: For predicted $(\mu^-, \Sigma^-)$ and measurement $z = Hx + r$, $r \sim \mathcal{N}(0, R)$, the innovation $y = z - H\mu^-$ satisfies $y \sim \mathcal{N}(0, S)$ with $S = H\Sigma^- H^{\top} + R$, yielding the evidence term in (14).
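Identity (15) is easy to check numerically. The sketch below evaluates both sides at an arbitrary point; `gauss_pdf` and `gaussian_product` are illustrative helper names, not functions from the paper.

```python
import numpy as np


def gauss_pdf(x, m, S):
    """Multivariate normal density N(x; m, S)."""
    d = len(m)
    diff = x - m
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / norm)


def gaussian_product(m1, S1, m2, S2):
    """Return (scale, m, S) such that N1(x) N2(x) = scale * N(x; m, S), as in (15)."""
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(S1_inv + S2_inv)
    m = S @ (S1_inv @ m1 + S2_inv @ m2)
    scale = gauss_pdf(m1, m2, S1 + S2)  # the x-independent factor
    return scale, m, S
```

The scale factor does not depend on $x$, which is why it can be absorbed into a mixture weight when a component is multiplied by a Gaussian likelihood.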

Mixture Growth Control: To prevent unbounded mixture growth, GSF typically uses:

a) Pruning: Remove components with $w_t^{(k)} < \varepsilon$.

b) Reduction / merging: Iteratively merge nearby components (e.g., using a KL-based criterion) until $K_t \le K_{\max}$. Merging two components with weights $a, b$ by moment matching gives

$$\mu = \frac{a\mu_a + b\mu_b}{a + b}, \tag{16}$$

$\Sigma = a\big[\Sigma_a + (\mu_a - \mu)(\mu_a - \mu)$ ...
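Pruning and pairwise merging can be sketched as below. The merged mean follows (16). The covariance expression is truncated in this excerpt, so the sketch assumes the standard moment-matched form, in which each component contributes its own covariance plus the spread of its mean around the merged mean. The function names are illustrative.

```python
import numpy as np


def prune(weights, means, covs, eps):
    """Drop components with weight below eps and renormalize.

    Assumes at least one component survives the threshold.
    """
    keep = [k for k, w in enumerate(weights) if w >= eps]
    w = np.array([weights[k] for k in keep], dtype=float)
    return w / w.sum(), [means[k] for k in keep], [covs[k] for k in keep]


def merge_two(a, mu_a, Sigma_a, b, mu_b, Sigma_b):
    """Moment-matched merge of two weighted Gaussian components."""
    w = a + b
    mu = (a * mu_a + b * mu_b) / w  # merged mean, as in (16)
    da, db = mu_a - mu, mu_b - mu
    # Assumed standard form: within-component covariance plus between-mean spread.
    Sigma = (a * (Sigma_a + np.outer(da, da)) + b * (Sigma_b + np.outer(db, db))) / w
    return w, mu, Sigma
```

Merging two unit-variance components centred at -1 and +1 with equal weight gives a single component at 0 with variance 2, so the merged Gaussian is wider than either parent and preserves the first two moments of the pair.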

Mixture–Mixture Update (Optional): If the measurement factor is approximated by a mixture $Q_t(x) = \sum_{c=1}^{C_t} \pi_t^{(c)}\, \mathcal{N}\big(x; \nu_t^{(c)}, \Lambda_t^{(c)}\big)$, then the update is a mixture–mixture product: $p(x_t \mid z_{1:t}) \propto p^-(x_t)\, Q_t(x_t)$, with $p^-(x_t) = \sum_k w_{t|t-1}^{(k)}\, \mathcal{N}\big(x_t; \mu_{t|t-1}^{(k)}, \Sigma_{t|t-1}^{(k)}\big)$, and

$$p(x_t \mid z_{1:t}) = \sum_{k=1}^{K_{t-1}} \sum_{c=1}^{C_t} \tilde{w}_{k,c}\, \mathcal{N}(x_t; m_{k,c}, S_{k,c}). \tag{17}$$

Here $(m_{k,c}, S_{k,c})$ fol... In practice, gating and sparsification reduce the $K \times C_t$ expansion.

Manifold Adaptation (Lie Groups): On a Lie group $\mathcal{X}$ (e.g., $SE(3)$), represent each mixture component as a Gaussian in a consistent tangent chart $\phi(\cdot)$, apply the Euclidean GSF updates to $\xi_t = \phi(x_t)$, and reconstruct means via $\exp(\cdot)$. During prediction, covariances are transported through group composition using the appropriate adjoint (first-order), yielding the manifold GSF expressions used in the main text.

One-Step GSF Summary: Given $\{w_{t-1}^{(k)}, \mu_{t-1}^{(k)}, \Sigma_{t-1}^{(k)}\}_{k=1}^{K_{t-1}}$:

1) Predict: propagate each component (linear (11), or EKF/UKF/CKF per component).
2) Update: apply (12)–(14) (or (17) for mixture likelihoods).
3) Control: prune/reduce (and optionally split) to enforce $K_t \le K_{\max}$.

On manifolds, perform all steps in the chosen chart...

Setup: Let $X_{t-1} \in SE(3)$ be distributed as $X_{t-1} = \mu \exp(\varepsilon)$ with $\varepsilon \sim \mathcal{N}(0, \Sigma) \subset \mathfrak{se}(3)$. The (right-invariant) stochastic motion model is

$$X_t = X_{t-1}\, \Delta T_t\, \exp(\nu_t),\qquad \nu_t \sim \mathcal{N}(0, Q_t) \subset \mathfrak{se}(3),$$

with $\nu_t$ independent of $\varepsilon$.

Prediction Kernel: For a single mixand, the predicted density is

$$\bar{p}(x_t) = \int p(x_t \mid x_{t-1})\, \mathcal{N}_{\mathfrak{se}(3)}\big(\log(\mu^{-1} x_{t-1});\, 0, \Sigma\big)\, dx_{t-1}.$$

First-Order Pushforward: Write $X_{t-1} = \mu \exp(\varepsilon)$. Using the group adjoint and BCH,

$$\mu \exp(\varepsilon)\, \Delta T_t = \mu\, \Delta T_t \exp\!\big(\mathrm{Ad}_{\Delta T_t^{-1}}\, \varepsilon + O(\|\varepsilon\|^2)\big).$$

Post-multiplying by $\exp(\nu_t)$ and applying BCH again yields

$$\mu\, \Delta T_t \exp\!\big(\mathrm{Ad}_{\Delta T_t^{-1}}\, \varepsilon\big) \exp(\nu_t) = \mu\, \Delta T_t \exp\!\big(\mathrm{Ad}_{\Delta T_t^{-1}}\, \varepsilon + \nu_t + O(\|\varepsilon\|^2 + \|\nu_t\|^2)\big).$$

Neglecting higher-order terms, the updated error in the right-invariant chart at the predicted mea...
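The adjoint transport in this derivation can be checked numerically with a small NumPy sketch. Several things here are assumptions made for illustration: twists are ordered as translation part first and rotation part second, the matrix exponential is a plain truncated power series (adequate only for small-norm inputs), and `predict_component` uses the first-order covariance $\mathrm{Ad}_{\Delta T_t^{-1}} \Sigma\, \mathrm{Ad}_{\Delta T_t^{-1}}^{\top} + Q_t$ implied by the error term $\mathrm{Ad}_{\Delta T_t^{-1}} \varepsilon + \nu_t$ with independent $\varepsilon$ and $\nu_t$.

```python
import numpy as np


def hat3(phi):
    """Skew-symmetric matrix of a 3-vector."""
    x, y, z = phi
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def hat_se3(xi):
    """Map a twist xi = [rho, phi] (translation, rotation) to a 4x4 matrix."""
    M = np.zeros((4, 4))
    M[:3, :3] = hat3(xi[3:])
    M[:3, 3] = xi[:3]
    return M


def expm_series(M, terms=30):
    """Truncated power-series matrix exponential (small-norm inputs only)."""
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out


def adjoint(T):
    """6x6 adjoint of T in SE(3) for the [rho, phi] twist ordering."""
    R, t = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3] = R
    Ad[:3, 3:] = hat3(t) @ R
    Ad[3:, 3:] = R
    return Ad


def predict_component(mu, Sigma, delta_T, Q):
    """First-order prediction of one mixand: compose the mean, transport the covariance."""
    Ad = adjoint(np.linalg.inv(delta_T))
    return mu @ delta_T, Ad @ Sigma @ Ad.T + Q
```

For matrix Lie groups the conjugation step $\exp(\varepsilon)\, \Delta T_t = \Delta T_t \exp(\mathrm{Ad}_{\Delta T_t^{-1}} \varepsilon)$ holds to machine precision in this check, so the approximation error in the prediction comes from combining the transported error with $\nu_t$ through BCH.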

Mixtures and Weights: Since $\int p(x_t \mid x_{t-1})\, dx_t = 1$, prediction preserves mixture weights: if $p(x) = \sum_k w_k\, p_k(x)$ then $\bar{p}(x) = \sum_k w_k\, \bar{p}_k(x)$.

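Small-Increment Approximation: If $\Delta T_t = \exp(\xi_t)$ with $\|\xi_t\| \ll 1$, then $\mathrm{Ad}_{\Delta T_t^{-1}} = I - \mathrm{ad}(\xi_t) + O(\|\xi_t\|^2)$, and the transported covariance expands as

$$\mathrm{Ad}_{\Delta T_t^{-1}}\, \Sigma\, \mathrm{Ad}_{\Delta T_t^{-1}}^{\top} = \Sigma - \mathrm{ad}(\xi_t)\, \Sigma - \Sigma\, \mathrm{ad}(\xi_t)^{\top} + O(\|\xi_t\|^2 \|\Sigma\|).$$

At high update rates (small $\|\xi_t\|$) and when covariances are maintained in the updated right-invariant chart, a common conservative approximation is $\Sigma^{-} \approx \Sigma$...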