arxiv: 2604.24707 · v1 · submitted 2026-04-27 · 💻 cs.RO

Passage-Aware Structural Mapping for RGB-D Visual SLAM

Pith reviewed 2026-05-08 02:41 UTC · model grok-4.3

classification 💻 cs.RO
keywords visual SLAM · RGB-D · door detection · structural mapping · scene graph · indoor navigation · passage detection · traversable openings

The pith

A method detects doors and traversable openings in RGB-D visual SLAM by fusing geometric, semantic, and topological cues before adding them to scene graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to improve visual SLAM for indoor robots by adding explicit detection of doors and passages, elements that current systems treat as ordinary obstacles or ignore. It models doors as planar surfaces inside walls and classifies them as traversable or blocked according to whether they lie in the same plane as the wall. Passages are then located by combining two signals: repeated camera-wall contact across frames and gaps in the reconstructed wall surface. The detections are inserted into the vS-Graphs scene graph so that rooms are connected through explicit passage nodes rather than inferred walls. A reader would care because robots that know where doors actually exist can plan shorter, safer routes and avoid treating every opening as a potential collision zone.
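The coplanarity test described above can be illustrated with a minimal sketch. It assumes planes are stored as a normal and an offset, that a door lying in the wall plane is closed and therefore non-traversable, and that a door swung out of the wall plane is traversable. The function names and both thresholds are hypothetical and are not taken from the paper or from vS-Graphs.

```python
import math

def normalize_plane(normal, d):
    """Scale a plane (normal . x = d) so that its normal has unit length."""
    norm = math.sqrt(sum(c * c for c in normal))
    return tuple(c / norm for c in normal), d / norm

def is_coplanar(wall, door, max_angle_deg=10.0, max_offset_m=0.10):
    """True if the door plane lies (almost) in the wall plane.

    Both thresholds are illustrative guesses, not values from the paper.
    """
    n_w, d_w = normalize_plane(*wall)
    n_d, d_d = normalize_plane(*door)
    cos_angle = sum(a * b for a, b in zip(n_w, n_d))
    if cos_angle < 0:  # the same plane may be stored with a flipped normal
        cos_angle, d_d = -cos_angle, -d_d
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return angle <= max_angle_deg and abs(d_w - d_d) <= max_offset_m

def classify_door(wall, door):
    # Assumption: a door lying in the wall plane is closed, hence non-traversable.
    return "non-traversable" if is_coplanar(wall, door) else "traversable"

wall = ((1.0, 0.0, 0.0), 2.0)           # wall plane x = 2
closed_door = ((1.0, 0.0, 0.0), 2.03)   # 3 cm in front of the wall, same normal
open_door = ((math.cos(math.radians(80)), math.sin(math.radians(80)), 0.0), 1.0)

print(classify_door(wall, closed_door))  # non-traversable
print(classify_door(wall, open_door))    # traversable
```

A real system would also have to check that the door actually overlaps the wall segment; the sketch only compares the two infinite planes.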

Core claim

Doors are modeled as planar entities embedded within walls and classified as traversable or non-traversable according to coplanarity with the supporting wall; passages are obtained by accumulating traversal evidence from camera-wall interactions across keyframes together with geometric opening validation from discontinuities in the mapped wall geometry. These detections are produced by jointly fusing geometric, semantic, and topological cues and are inserted into vS-Graphs, enriching its scene graph with passage-level abstractions and thereby improving the representation of room connectivity, as shown by qualitative results on indoor office sequences.
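One plausible reading of "traversal evidence from camera-wall interactions" is that the camera moves from one side of a wall plane to the other between consecutive keyframes. The sketch below assumes that reading. It takes a unit wall normal and a list of keyframe camera positions, and returns the points where the trajectory pierces the plane. The names `signed_distance` and `wall_crossings` are hypothetical.

```python
def signed_distance(plane, point):
    """Signed distance of a point to a plane (unit_normal, d) with normal . x = d."""
    normal, d = plane
    return sum(a * b for a, b in zip(normal, point)) - d

def wall_crossings(plane, keyframe_positions):
    """Points where the camera path between consecutive keyframes crosses the plane."""
    crossings = []
    for p, q in zip(keyframe_positions, keyframe_positions[1:]):
        s_p, s_q = signed_distance(plane, p), signed_distance(plane, q)
        if s_p * s_q < 0:  # sign change: the camera switched sides of the wall
            t = s_p / (s_p - s_q)  # linear interpolation factor along p -> q
            crossings.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return crossings

wall = ((1.0, 0.0, 0.0), 2.0)  # wall plane x = 2
trajectory = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (3.0, 1.5, 0.0), (4.0, 2.0, 0.0)]
print(wall_crossings(wall, trajectory))  # [(2.0, 1.0, 0.0)]
```

Accumulating such crossing points over many keyframes, and keeping only those that fall inside the mapped extent of the wall, would give a cluster that marks a candidate passage.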

What carries the argument

The passage detection module that fuses cues to model doors as planar wall-embedded entities classified by coplanarity and to validate openings from interaction evidence plus geometric discontinuities.
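The geometric half of the module, validating openings from discontinuities in the wall geometry, can be sketched in one dimension. The example assumes the mapped wall points have already been projected onto the wall's horizontal axis, and it treats any gap whose width falls in a door-sized range as a candidate opening. The width limits and the function name `find_openings` are assumptions made for illustration.

```python
def find_openings(coords, min_width=0.7, max_width=1.5):
    """Return (start, end) gaps between consecutive wall samples.

    coords: positions of wall points along the wall axis, in metres.
    min_width / max_width: assumed door-sized range, not taken from the paper.
    """
    xs = sorted(coords)
    return [(a, b) for a, b in zip(xs, xs[1:]) if min_width <= b - a <= max_width]

# Wall sampled every 10 cm from 0 m to 1 m and from 1.9 m to 3 m.
samples = [i / 10 for i in range(0, 11)] + [i / 10 for i in range(19, 31)]
print(find_openings(samples))  # [(1.0, 1.9)]
```

The upper width limit rejects large unmapped regions, which are more likely to be unobserved wall than a doorway.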

If this is right

  • Room connectivity is represented explicitly through passage nodes in the scene graph rather than inferred from walls alone.
  • Qualitative tests on office sequences confirm reliable doorway detection under the fused-cue approach.
  • The enriched graph supplies a concrete foundation for later BIM-informed extensions of VSLAM.
  • Structural mapping gains passage-level abstractions that directly support indoor robot navigation tasks.
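To make the first point concrete, here is a toy scene graph in which rooms are linked only through explicit passage nodes. It is a hypothetical data structure written for this note and does not mirror the vS-Graphs implementation; the class and method names are assumptions.

```python
from collections import defaultdict

class SceneGraph:
    """Toy room/passage graph; not the vS-Graphs data model."""

    def __init__(self):
        self.kind = {}                  # node id -> "room" or "passage"
        self.attrs = {}                 # node id -> attribute dict
        self.edges = defaultdict(set)   # undirected adjacency

    def add_node(self, node_id, kind, **attrs):
        self.kind[node_id] = kind
        self.attrs[node_id] = attrs

    def add_passage(self, passage_id, room_a, room_b, traversable=True):
        """Insert a passage node and link it to the two rooms it joins."""
        self.add_node(passage_id, "passage", traversable=traversable)
        for room in (room_a, room_b):
            self.edges[passage_id].add(room)
            self.edges[room].add(passage_id)

    def rooms_reachable_from(self, room):
        """Rooms one traversable passage away from the given room."""
        reachable = set()
        for node in self.edges[room]:
            if self.kind.get(node) == "passage" and self.attrs[node]["traversable"]:
                reachable |= {r for r in self.edges[node] if r != room}
        return reachable

graph = SceneGraph()
for name in ("office", "corridor", "lab"):
    graph.add_node(name, "room")
graph.add_passage("door_1", "office", "corridor", traversable=True)
graph.add_passage("door_2", "corridor", "lab", traversable=False)
print(graph.rooms_reachable_from("corridor"))  # {'office'}
```

Keeping traversability as an attribute of the passage node means a closed door still records that two rooms are adjacent, while queries about reachability can ignore it.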

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Robots using the enriched graph could treat detected passages as preferred routes during path planning, reducing unnecessary detours around walls.
  • The same passage nodes might serve as natural landmarks for loop closure in larger environments where rooms are revisited through the same openings.
  • Adding passage information could make the SLAM map more useful for downstream tasks such as semantic labeling of navigable space.
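The path-planning extension in the first point can be sketched as a breadth-first search over a room and passage adjacency map. This is an editorial illustration, not a planner from the paper: the adjacency dictionary, the node names, and the `blocked` parameter are all hypothetical.

```python
from collections import deque

def route(adjacency, start, goal, blocked=frozenset()):
    """Shortest node sequence from start to goal, skipping blocked passages.

    Returns None when no route exists.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in previous and nxt not in blocked:
                previous[nxt] = node
                queue.append(nxt)
    return None

adjacency = {
    "office": ["door_1"],
    "door_1": ["office", "corridor"],
    "corridor": ["door_1", "door_2"],
    "door_2": ["corridor", "lab"],
    "lab": ["door_2"],
}
print(route(adjacency, "office", "lab"))
# ['office', 'door_1', 'corridor', 'door_2', 'lab']
print(route(adjacency, "office", "lab", blocked={"door_2"}))  # None
```

Because passages are nodes in their own right, marking a door as non-traversable only requires adding its id to `blocked`; the room layout itself stays unchanged.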

Load-bearing premise

Doors are planar and their traversability can be determined from coplanarity with walls, while passages can be reliably inferred from camera-wall interactions and geometric discontinuities.

What would settle it

Indoor sequences containing known doors that the system either misses entirely or mislabels as non-traversable would show the fused-cue inference is not reliable.
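A test of this kind could be scored as follows. The sketch assumes each door is given as a 2D position plus a traversability label, matches detections to ground truth greedily by distance, and counts missed and mislabelled doors along with precision and recall. The matching radius and the function name `evaluate` are assumptions; the paper reports no such protocol.

```python
import math

def evaluate(detections, ground_truth, max_dist=0.5):
    """Score door detections against annotated doors.

    detections / ground_truth: lists of ((x, y), label).
    max_dist: assumed matching radius in metres.
    """
    unmatched = list(range(len(ground_truth)))
    matched = mislabelled = 0
    for position, label in detections:
        best = min(unmatched,
                   key=lambda i: math.dist(position, ground_truth[i][0]),
                   default=None)
        if best is not None and math.dist(position, ground_truth[best][0]) <= max_dist:
            unmatched.remove(best)
            matched += 1
            mislabelled += label != ground_truth[best][1]
    return {
        "precision": matched / len(detections) if detections else 0.0,
        "recall": matched / len(ground_truth) if ground_truth else 0.0,
        "missed": len(unmatched),
        "mislabelled": mislabelled,
    }

truth = [((0.0, 0.0), "traversable"),
         ((5.0, 0.0), "traversable"),
         ((10.0, 0.0), "non-traversable")]
found = [((0.1, 0.0), "traversable"),
         ((5.2, 0.1), "non-traversable"),   # right place, wrong label
         ((20.0, 0.0), "traversable")]      # spurious detection
print(evaluate(found, truth))
```

In this toy case two of three detections match, one known door is missed, and one matched door carries the wrong traversability label.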

Figures

Figures reproduced from arXiv: 2604.24707 by Ali Tourani, Asier Bikandi-Noya, David Pérez Saura, Holger Voos, Jose Luis Sanchez-Lopez, Miguel Fernandez-Cortizas, Saad Ejaz.

Figure 1: Overview of door and passage mapping within vS
Figure 2: Examples of geometric openings detected as gaps
Figure 3: System architecture showing the integration of passage
Figure 4: Qualitative results of the proposed passage detection
Figure 5: Potential integration of the proposed passage detection

Original abstract

Doorways and passages are critical structural elements for indoor robot navigation, yet they remain underexplored in modern Visual SLAM (VSLAM) frameworks. This paper presents a passage-aware structural mapping approach for RGB-D VSLAM that detects doors and traversable openings by jointly fusing geometric, semantic, and topological cues. Doors are modeled as planar entities embedded within walls and classified as traversable or non-traversable based on their coplanarity with the supporting wall. Passages are inferred through two complementary strategies: traversal evidence accumulated from camera-wall interactions across consecutive keyframes, and geometric opening validation based on discontinuities in the mapped wall geometry. The proposed method is integrated into vS-Graphs as a proof of concept, enriching its scene graph with passage-level abstractions and improving room connectivity modeling. Qualitative evaluations on indoor office sequences demonstrate reliable doorway detection, and the framework lays the foundation for exploiting these elements in BIM-informed VSLAM. The source code is publicly available at https://github.com/snt-arg/visual_sgraphs/tree/doorway_integration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents a passage-aware structural mapping approach for RGB-D Visual SLAM. Doors and traversable openings are detected by jointly fusing geometric, semantic, and topological cues. Doors are modeled as planar entities embedded in walls and classified as traversable or non-traversable based on coplanarity with the supporting wall plane. Passages are inferred via two strategies: accumulated traversal evidence from camera-wall interactions across keyframes, and geometric opening validation from discontinuities in the mapped wall geometry. The method is integrated into vS-Graphs as a proof of concept to enrich its scene graph with passage-level abstractions and improve room connectivity modeling. Qualitative evaluations on indoor office sequences are reported to demonstrate reliable doorway detection, with the source code made publicly available.

Significance. If validated, the approach could advance indoor VSLAM by incorporating structural abstractions such as doors and passages into scene graphs, aiding navigation and connectivity modeling in structured environments. The public release of the source code is a clear strength that supports reproducibility and extension toward BIM-informed systems. However, the current evaluation provides limited evidence for assessing real-world robustness.

major comments (2)
  1. [Abstract] Abstract and evaluation: The central claim of 'reliable doorway detection' and improved room connectivity rests on qualitative success on office sequences, but no quantitative metrics (e.g., precision/recall for door detection, trajectory error impact, or room connectivity accuracy), ablation studies, baseline comparisons, or failure-case analysis are provided. This leaves the robustness of coplanarity-based classification and the two passage-inference strategies untested against common RGB-D issues such as wall-plane estimation noise or partial door openings.
  2. [Method] Method description: The traversability classification assumes doors are reliably planar and that coplanarity with the wall plane directly determines traversability, while passages are inferred from camera-wall interactions and geometric discontinuities. No sensitivity analysis or validation against mapping inaccuracies, dynamic door states, or trajectories with limited wall proximity is shown, making these steps load-bearing assumptions for the integration claims.
minor comments (1)
  1. The abstract states that the framework 'lays the foundation for exploiting these elements in BIM-informed VSLAM,' but the manuscript provides no concrete discussion or examples of how the enriched scene graph would interface with BIM data.
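The sensitivity analysis requested in the second major comment could start from a simulation like the one below. It assumes a closed door that is truly coplanar with its wall, rotates the estimated door normal by zero-mean Gaussian angular noise, and reports how often an angle threshold would wrongly reject coplanarity. The noise model, the 10 degree threshold, and the function name are assumptions chosen for illustration and say nothing about the authors' actual parameters.

```python
import math
import random

def closed_door_error_rate(sigma_deg, max_angle_deg=10.0, trials=2000, seed=0):
    """Fraction of truly coplanar doors that fail the angle test under noise."""
    rng = random.Random(seed)
    wall_normal = (1.0, 0.0, 0.0)
    errors = 0
    for _ in range(trials):
        noise = math.radians(rng.gauss(0.0, sigma_deg))
        # Door normal estimated with an angular error about the vertical axis.
        door_normal = (math.cos(noise), math.sin(noise), 0.0)
        cos_angle = sum(a * b for a, b in zip(wall_normal, door_normal))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        errors += angle > max_angle_deg
    return errors / trials

for sigma in (0.0, 2.0, 5.0, 10.0, 15.0):
    rate = closed_door_error_rate(sigma)
    print(f"sigma = {sigma:4.1f} deg -> misclassified closed doors: {rate:.3f}")
```

Repeating the sweep with the plane-noise levels measured on a real RGB-D sensor would show whether the coplanarity rule has enough margin in practice.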

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point-by-point below, providing clarifications on the proof-of-concept scope while agreeing to strengthen the discussion of limitations and assumptions in a revised version.

  1. Referee: [Abstract] Abstract and evaluation: The central claim of 'reliable doorway detection' and improved room connectivity rests on qualitative success on office sequences, but no quantitative metrics (e.g., precision/recall for door detection, trajectory error impact, or room connectivity accuracy), ablation studies, baseline comparisons, or failure-case analysis are provided. This leaves the robustness of coplanarity-based classification and the two passage-inference strategies untested against common RGB-D issues such as wall-plane estimation noise or partial door openings.

    Authors: We acknowledge that the evaluation is primarily qualitative, consistent with the paper's positioning as a proof-of-concept for integrating passage abstractions into vS-Graphs rather than a full benchmark. The reported results across indoor office sequences illustrate the joint geometric-semantic-topological fusion and the two inference strategies in practice. We agree that quantitative metrics, ablations, and failure analysis would provide stronger validation of robustness to plane noise and partial openings. In the revised manuscript we will add an explicit limitations section discussing these aspects, the absence of such metrics, and directions for future quantitative evaluation, while retaining the current qualitative demonstrations and open-source code as the core contribution.

    revision: partial

  2. Referee: [Method] Method description: The traversability classification assumes doors are reliably planar and that coplanarity with the wall plane directly determines traversability, while passages are inferred from camera-wall interactions and geometric discontinuities. No sensitivity analysis or validation against mapping inaccuracies, dynamic door states, or trajectories with limited wall proximity is shown, making these steps load-bearing assumptions for the integration claims.

    Authors: The coplanarity criterion and dual inference strategies (traversal evidence and geometric discontinuities) are presented as complementary mechanisms suited to structured indoor settings where doors are typically planar and wall-aligned. The method description already notes the reliance on these cues, with the two strategies intended to cross-validate against individual failures such as noisy plane estimates. We agree that further elaboration on sensitivity to mapping inaccuracies, dynamic states, and limited wall proximity would clarify the assumptions. The revised manuscript will expand the method section with additional discussion of these considerations and qualitative examples of edge cases, without introducing new experiments beyond the original proof-of-concept scope.

    revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper describes an algorithmic extension to vS-Graphs for detecting doors and passages via fusion of geometric coplanarity checks, semantic cues, and topological evidence from camera-wall interactions. No equations, fitted parameters, or first-principles derivations are presented that reduce to the inputs by construction. The detection rules are stated as direct applications of observable scene properties rather than self-referential definitions or renamed empirical fits. Integration into an existing framework is presented as a proof-of-concept without invoking uniqueness theorems or load-bearing self-citations that would collapse the central claim.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on domain assumptions about planar walls and coplanarity for door classification plus observable traversal evidence; no free parameters or new invented entities are introduced in the abstract.

axioms (2)
  • domain assumption: Walls and doors can be modeled as planar entities
    Stated in the door modeling description; used to classify traversability via coplanarity.
  • domain assumption: Camera-wall interactions across keyframes provide reliable traversal evidence
    One of the two complementary strategies for inferring passages.

