Learning-Based Hierarchical Scene Graph Matching for Robot Localization Leveraging Prior Maps

Holger Voos; Jose Andres Millan-Romera; Jose Luis Sanchez-Lopez; Matteo Giorgi; Nimrod Millenium Ndulue

arxiv: 2604.27821 · v1 · submitted 2026-04-30 · 💻 cs.RO

Learning-Based Hierarchical Scene Graph Matching for Robot Localization Leveraging Prior Maps

Nimrod Millenium Ndulue , Jose Andres Millan-Romera , Matteo Giorgi , Holger Voos , Jose Luis Sanchez-Lopez This is my paper

Pith reviewed 2026-05-07 04:52 UTC · model grok-4.3

classification 💻 cs.RO

keywords scene graph matchingrobot localizationBIM priorshierarchical graphszero-shot generalizationSLAM drift correctionLiDAR mappingindoor navigation

0 comments

The pith

A learned hierarchical scene graph matcher trained only on floor plans outperforms combinatorial baselines on real LiDAR data while running an order of magnitude faster.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for matching hierarchical scene graphs constructed from robot LiDAR sensors against offline prior maps such as Building Information Models to support accurate indoor localization. Current combinatorial matching techniques scale poorly with environment size, while earlier learned approaches treat graphs as flat structures and overlook the natural room-to-surface hierarchy. The new pipeline augments both graphs with edge types that capture intra-level and inter-level semantic relationships, then trains an end-to-end differentiable model exclusively on floor-plan data. Successful matching would let robots correct SLAM drift by anchoring observations to known architectural structure without collecting large amounts of real-world labeled data.

Core claim

The paper establishes that augmenting both the online sensor graph and the prior map graph with semantically motivated edge types for intra- and inter-level relationships allows an end-to-end trained model to compute reliable node correspondences simultaneously across the hierarchy. When trained exclusively on floor plans, the resulting matcher achieves higher F1 scores than combinatorial baselines on real LiDAR environments and executes an order of magnitude faster, demonstrating zero-shot generalization for BIM-assisted robot localization.

What carries the argument

A learned end-to-end differentiable pipeline that augments scene graphs with semantically motivated edge types encoding intra-level and inter-level relationships, enabling hierarchical node matching from rooms down to surfaces.

If this is right

Hierarchical matching can be performed in one forward pass rather than separate stages for rooms and surfaces.
The model transfers from synthetic floor-plan training data to real sensor data without additional adaptation.
Runtime speed improves enough to support online use inside a robot's navigation loop.
Higher matching accuracy directly strengthens drift correction when SLAM is anchored to BIM priors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same edge-augmentation idea could be tested on other sensor modalities such as RGB-D or radar to broaden applicability.
Extending the hierarchy to include movable objects might support localization in partially dynamic environments.
Integration with multi-floor or multi-building priors could address scaling questions left open by the current indoor focus.
A controlled ablation that removes the inter-level edges would quantify how much the hierarchy itself contributes to the observed gains.

Load-bearing premise

That adding semantically motivated edge types for intra- and inter-level relationships and training solely on floor plans will yield reliable node correspondences on real LiDAR data without domain-specific fine-tuning or post-processing.

What would settle it

A head-to-head evaluation in which the learned matcher records a lower F1 score than the combinatorial baseline on a set of real LiDAR scene graphs would falsify the claim of viable zero-shot generalization.

Figures

Figures reproduced from arXiv: 2604.27821 by Holger Voos, Jose Andres Millan-Romera, Jose Luis Sanchez-Lopez, Matteo Giorgi, Nimrod Millenium Ndulue.

**Figure 1.** Figure 1: Overview of the proposed pipeline. A shared MLP improves the initial node features, after which a shared GATv2 encoder produces structure-aware embeddings for both the A-graph (derived from BIM) and the S-graph (built online from LiDAR SLAM). A dot-product affinity matrix is computed, normalized via Sinkhorn with dummy-column padding to handle partial observations, and decoded into a hard one-to-one corres… view at source ↗

**Figure 2.** Figure 2: Example graph used for evaluation: a floor plan from the MSD syn view at source ↗

read the original abstract

Accurate localization is a fundamental requirement for autonomous robots operating in indoor environments. Scene graphs encode the spatial structure of an environment as a hierarchy of semantic entities and their relationships, and can be constructed both online from robot sensor data and offline from architectural priors such as Building Information Models (BIM). Matching these two complementary representations enables drift correction in SLAM by grounding robot observations against a known structural prior. However, establishing reliable node-to-node correspondences between them remains an open challenge: existing combinatorial methods are prohibitively expensive at scale, and prior learned approaches address only flat graph matching, ignoring the multi-level semantic structure present in both representations. Here we present a learned, end-to-end differentiable pipeline that augments both graphs with semantically motivated edge types encoding intra- and inter- level relationships, explicitly exploiting this hierarchy to enable simultaneous matching from high-level room concepts down to low-level wall surfaces. Trained exclusively on floor plans, the proposed method outperforms the combinatorial baseline in F1 on real LiDAR environments while running an order of magnitude faster, demonstrating viable zero-shot generalization for BIM-assisted robot localization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces hierarchical scene graph matching with intra- and inter-level edges that trains on floor plans and claims zero-shot outperformance on real LiDAR data, but the robustness to sensor-induced graph distortions needs verification from the full experiments.

read the letter

The main thing to know is that this paper gives a learned way to match hierarchical scene graphs built from BIM floor plans against those extracted from robot LiDAR, with the model trained only on the clean plans and then applied directly to real sensor data for localization drift correction. It claims better F1 scores and much higher speed than a combinatorial baseline. That zero-shot transfer is the central result they are selling for indoor robot deployment. What is actually new is the explicit addition of edge types that capture both relationships inside a level and connections between levels, so the matching can happen simultaneously across scales from rooms down to wall surfaces. They make the whole pipeline end-to-end differentiable, which lets them train it as a single model instead of separate stages. That is a reasonable technical move beyond the flat-graph learned matchers that came before. The paper does well at framing a practical robotics problem and at showing that the hierarchical structure can be exploited without needing real LiDAR data for training. If the reported speed and accuracy numbers hold up in the full results, this would be useful for anyone trying to ground SLAM against architectural priors in structured buildings. The soft spots are mostly around the generalization claim. The stress-test note is right to flag that LiDAR graphs often have missing nodes, boundary errors, and label noise that clean floor-plan graphs do not. Without ablations that add controlled perturbations during testing or some form of domain randomization in training, it is hard to know whether the F1 advantage comes from the method itself or from the particular test environments chosen. The abstract gives no dataset sizes, no error bars, and no details on how the LiDAR graphs are built, so those sections will need close reading to judge reproducibility. This is aimed at robotics researchers working on semantic SLAM or prior-assisted localization. A reader who needs faster map matching in indoor settings with existing BIM data would find the approach and the speed claims worth looking at. It shows clear thinking on the hierarchy and the training setup, so it deserves a serious referee even if the zero-shot robustness requires more evidence in review. I would send it out for peer review rather than desk reject.

Referee Report

3 major / 2 minor

Summary. The paper proposes a learning-based hierarchical scene graph matching pipeline for aligning robot-observed LiDAR scene graphs with prior BIM-derived maps to enable drift correction in indoor SLAM. Graphs are augmented with semantically motivated intra- and inter-level edge types; an end-to-end differentiable model is trained exclusively on floor plans and evaluated for node correspondences from room-level concepts down to wall surfaces. The central empirical claim is that the method achieves higher F1 scores than a combinatorial baseline on real LiDAR data while running an order of magnitude faster, thereby demonstrating viable zero-shot generalization.

Significance. If the zero-shot generalization result holds, the work would offer a computationally efficient route to leveraging architectural priors for robust localization, addressing a practical bottleneck in combinatorial scene-graph matching. The hierarchical formulation that explicitly exploits multi-level semantics constitutes a clear advance over existing flat-graph matching techniques and could influence future BIM-assisted SLAM systems in structured indoor environments.

major comments (3)

[Abstract] Abstract: the claim that the method 'outperforms the combinatorial baseline in F1 on real LiDAR environments' is presented without any numerical F1 values, dataset sizes, error bars, statistical tests, or implementation details, rendering the magnitude and reliability of the reported advantage impossible to assess from the given text.
[Methods] Methods (training description): the end-to-end training is performed exclusively on clean floor-plan graphs with no domain randomization, noise injection, or explicit modeling of LiDAR-specific distortions (missing nodes, boundary inaccuracies, semantic extraction errors); this directly undermines the zero-shot transfer claim because the architecture description provides no mechanism that would guarantee invariance to the graph perturbations present in real sensor data.
[Results] Results: no ablation studies or sensitivity analyses are reported that measure performance degradation under controlled perturbations of the test graphs (e.g., random node deletion or label noise); without such controls it remains possible that the observed F1 advantage arises from favorable test-set selection rather than from the claimed learned robustness.

minor comments (2)

[Abstract] Abstract: the phrase 'semantically motivated edge types' is introduced without a concrete definition or illustrative example; adding one sentence of clarification would improve readability.
[Abstract] Abstract: the statement 'running an order of magnitude faster' should be supported by explicit wall-clock timings or complexity comparisons in the results section rather than left as a qualitative claim.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have prepared point-by-point responses to each major comment below. Where the comments identify opportunities for improvement, we will revise the manuscript accordingly to strengthen the presentation of our results on the learned hierarchical scene graph matching approach.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that the method 'outperforms the combinatorial baseline in F1 on real LiDAR environments' is presented without any numerical F1 values, dataset sizes, error bars, statistical tests, or implementation details, rendering the magnitude and reliability of the reported advantage impossible to assess from the given text.

Authors: We agree that the abstract would benefit from greater quantitative specificity. The full manuscript reports concrete F1 scores, dataset sizes (multiple real LiDAR environments), runtime comparisons (order of magnitude faster), and implementation details in the results section. In the revised version we will incorporate representative numerical F1 values, dataset scale, and a brief indication of the performance margin directly into the abstract while preserving its length constraints. revision: yes
Referee: [Methods] Methods (training description): the end-to-end training is performed exclusively on clean floor-plan graphs with no domain randomization, noise injection, or explicit modeling of LiDAR-specific distortions (missing nodes, boundary inaccuracies, semantic extraction errors); this directly undermines the zero-shot transfer claim because the architecture description provides no mechanism that would guarantee invariance to the graph perturbations present in real sensor data.

Authors: Training exclusively on clean floor-plan graphs is a deliberate design choice that exploits the availability of complete architectural priors. The zero-shot generalization to real LiDAR data is demonstrated empirically through successful node correspondence across room-to-surface levels despite sensor noise. The hierarchical edge augmentation and end-to-end differentiable matching learn correspondence patterns rather than exact structures, providing robustness. We will revise the methods section to more explicitly articulate these inductive biases and their role in handling the cited perturbations, supported by the observed transfer results. revision: partial
Referee: [Results] Results: no ablation studies or sensitivity analyses are reported that measure performance degradation under controlled perturbations of the test graphs (e.g., random node deletion or label noise); without such controls it remains possible that the observed F1 advantage arises from favorable test-set selection rather than from the claimed learned robustness.

Authors: We acknowledge that controlled ablation studies would provide additional evidence for robustness. The current evaluation focuses on real LiDAR data to reflect practical conditions, but we will add sensitivity analyses in the revised results section. These will include performance metrics under simulated perturbations such as random node deletion and label noise applied to the test graphs, quantifying F1 degradation to better substantiate the learned model's contribution to generalization. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical training-to-evaluation pipeline with independent test data

full rationale

The paper describes a learned end-to-end differentiable pipeline that augments scene graphs with intra- and inter-level edge types and performs hierarchical matching. It is trained exclusively on floor-plan graphs and evaluated for F1 and runtime on separate real LiDAR-derived graphs. No equations, derivations, or self-citations are shown that reduce the reported performance advantage to a fitted parameter, a self-definition, or a prior result by the same authors. The zero-shot generalization claim rests on held-out empirical comparison against a combinatorial baseline rather than on any tautological reduction of the test metric to the training inputs. The architecture choices (message passing over augmented edges) are presented as design decisions, not as predictions derived from the evaluation data itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Based solely on the abstract, the central claim rests on standard machine-learning assumptions plus two paper-specific modeling choices. No explicit numerical free parameters are named. The invented entities are the added edge types. Axioms are the usual ones for differentiable graph matching.

axioms (2)

domain assumption Scene graphs can be augmented with semantically motivated edge types that encode intra- and inter-level relationships without introducing inconsistencies
Invoked when the abstract states the pipeline augments both graphs with these edge types to enable simultaneous matching from high-level rooms to low-level surfaces.
standard math End-to-end differentiability of the matching pipeline is feasible and preserves the hierarchical structure
Stated as the core of the learned pipeline; standard assumption in modern graph neural network literature.

invented entities (1)

Semantically motivated edge types for intra- and inter-level relationships no independent evidence
purpose: To explicitly encode hierarchy so the model can match from room concepts down to wall surfaces
Introduced in the abstract as the key augmentation that distinguishes the method from flat-graph approaches; no independent evidence provided beyond the claimed performance gain.

pith-pipeline@v0.9.0 · 5504 in / 1611 out tokens · 40648 ms · 2026-05-07T04:52:11.818811+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Hydra: A real-time spatial perception system for 3D scene graph construction and optimization,

N. Hughes, Y . Chang, and L. Carlone, “Hydra: A real-time spatial perception system for 3D scene graph construction and optimization,” inRobotics: Science and Systems (RSS), 2022

work page 2022
[2]

Sit- uational graphs for robot navigation in structured indoor environments,

H. Bavle, J. L. Sanchez-Lopez, M. Shaheer, J. Civera, and H. V oos, “Sit- uational graphs for robot navigation in structured indoor environments,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9107–9114, 2022

work page 2022
[3]

Graph-based global robot localization informing situational graphs with architectural graphs,

M. Shaheer, J. A. Millan-Romera, H. Bavle, J. L. Sanchez-Lopez, J. Civera, and H. V oos, “Graph-based global robot localization informing situational graphs with architectural graphs,” inProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 9155–9162

work page 2023
[4]

Connecting semantic building information models and robotics: An application to 2D LiDAR-based localization,

R. W. Hendrikx, E. de Gelder, D. Habets, P. Pauwels, E. Torta, and J. P. van den Heuvel, “Connecting semantic building information models and robotics: An application to 2D LiDAR-based localization,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 11 654–11 660

work page 2021
[5]

A pose graph- based localization system for long-term navigation in CAD floor plans,

F. Boniardi, T. Caselitz, R. K ¨ummerle, and W. Burgard, “A pose graph- based localization system for long-term navigation in CAD floor plans,” Robotics and Autonomous Systems, vol. 112, pp. 84–97, 2019

work page 2019
[6]

Semantic localization on BIM-generated maps using a 3D LiDAR sensor,

H. Yin, J. M. Liew, W. L. Lee, M. H. Ang, K.-W. Yeoh, and J. Tian, “Semantic localization on BIM-generated maps using a 3D LiDAR sensor,”Automation in Construction, vol. 146, p. 104759, 2022

work page 2022
[7]

Neural graph matching network: Learning Lawler’s quadratic assignment problem with extension to hypergraph and multi-graph matching,

R. Wang, J. Yan, and X. Yang, “Neural graph matching network: Learning Lawler’s quadratic assignment problem with extension to hypergraph and multi-graph matching,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5261–5279, 2022

work page 2022
[8]

LIO-BIM – coupling lidar inertial odometry with building information modeling for robot localization and mapping,

J. St ¨uhrenberg and K. Smarsly, “LIO-BIM – coupling lidar inertial odometry with building information modeling for robot localization and mapping,”Advanced Engineering Informatics, vol. 66, p. 103477, 2025

work page 2025
[9]

Learning combinatorial embedding networks for deep graph matching,

R. Wang, J. Yan, and X. Yang, “Learning combinatorial embedding networks for deep graph matching,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 3056– 3065

work page 2019
[10]

Learning deep graph matching with channel-independent embedding and hungarian attention,

T. Yu, R. Wang, J. Yan, and B. Li, “Learning deep graph matching with channel-independent embedding and hungarian attention,” inInterna- tional Conference on Learning Representations (ICLR), 2020

work page 2020
[11]

Graph matching with bi-level noisy correspondence,

Y . Lin, M. Guo, P. Hu, C. Wang, and J. Lv, “Graph matching with bi-level noisy correspondence,” inProceedings of the IEEE/CVF Inter- national Conference on Computer Vision (ICCV), 2023

work page 2023
[12]

SuperGlue: Learning feature matching with graph neural networks,

P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning feature matching with graph neural networks,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion (CVPR), 2020, pp. 4938–4947

work page 2020
[13]

arXiv preprint arXiv:2403.19474 , year=

Y . Xie, A. Pagani, and D. Stricker, “SG-PGM: Partial graph matching network with semantic geometric fusion for 3D scene graph alignment and its downstream tasks,” inarXiv preprint arXiv:2403.19474, 2024

work page arXiv 2024
[14]

How attentive are graph attention networks?

S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” inInternational Conference on Learning Representations (ICLR), 2022

work page 2022
[15]

Graph neural network-based scene graph matching for robot localization,

M. Giorgi, “Graph neural network-based scene graph matching for robot localization,” Master’s thesis, University of Pisa, 2024

work page 2024
[16]

Concerning nonnegative matrices and doubly stochastic matrices,

R. Sinkhorn and P. Knopp, “Concerning nonnegative matrices and doubly stochastic matrices,”Pacific Journal of Mathematics, vol. 21, no. 2, pp. 343–348, 1967

work page 1967
[17]

The Hungarian method for the assignment problem,

H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955

work page 1955
[18]

Modified swiss dwellings: A 2D floor plan dataset for semantic scene understanding,

S. van Engelenburg, T. Lucassen, F. I. Karahanoglu, and M. A. Westen- berg, “Modified swiss dwellings: A 2D floor plan dataset for semantic scene understanding,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023

work page 2023
[19]

Optuna: A next- generation hyperparameter optimization framework,

T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next- generation hyperparameter optimization framework,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 2623–2631

work page 2019

[1] [1]

Hydra: A real-time spatial perception system for 3D scene graph construction and optimization,

N. Hughes, Y . Chang, and L. Carlone, “Hydra: A real-time spatial perception system for 3D scene graph construction and optimization,” inRobotics: Science and Systems (RSS), 2022

work page 2022

[2] [2]

Sit- uational graphs for robot navigation in structured indoor environments,

H. Bavle, J. L. Sanchez-Lopez, M. Shaheer, J. Civera, and H. V oos, “Sit- uational graphs for robot navigation in structured indoor environments,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9107–9114, 2022

work page 2022

[3] [3]

Graph-based global robot localization informing situational graphs with architectural graphs,

M. Shaheer, J. A. Millan-Romera, H. Bavle, J. L. Sanchez-Lopez, J. Civera, and H. V oos, “Graph-based global robot localization informing situational graphs with architectural graphs,” inProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 9155–9162

work page 2023

[4] [4]

Connecting semantic building information models and robotics: An application to 2D LiDAR-based localization,

R. W. Hendrikx, E. de Gelder, D. Habets, P. Pauwels, E. Torta, and J. P. van den Heuvel, “Connecting semantic building information models and robotics: An application to 2D LiDAR-based localization,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 11 654–11 660

work page 2021

[5] [5]

A pose graph- based localization system for long-term navigation in CAD floor plans,

F. Boniardi, T. Caselitz, R. K ¨ummerle, and W. Burgard, “A pose graph- based localization system for long-term navigation in CAD floor plans,” Robotics and Autonomous Systems, vol. 112, pp. 84–97, 2019

work page 2019

[6] [6]

Semantic localization on BIM-generated maps using a 3D LiDAR sensor,

H. Yin, J. M. Liew, W. L. Lee, M. H. Ang, K.-W. Yeoh, and J. Tian, “Semantic localization on BIM-generated maps using a 3D LiDAR sensor,”Automation in Construction, vol. 146, p. 104759, 2022

work page 2022

[7] [7]

Neural graph matching network: Learning Lawler’s quadratic assignment problem with extension to hypergraph and multi-graph matching,

R. Wang, J. Yan, and X. Yang, “Neural graph matching network: Learning Lawler’s quadratic assignment problem with extension to hypergraph and multi-graph matching,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5261–5279, 2022

work page 2022

[8] [8]

LIO-BIM – coupling lidar inertial odometry with building information modeling for robot localization and mapping,

J. St ¨uhrenberg and K. Smarsly, “LIO-BIM – coupling lidar inertial odometry with building information modeling for robot localization and mapping,”Advanced Engineering Informatics, vol. 66, p. 103477, 2025

work page 2025

[9] [9]

Learning combinatorial embedding networks for deep graph matching,

R. Wang, J. Yan, and X. Yang, “Learning combinatorial embedding networks for deep graph matching,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 3056– 3065

work page 2019

[10] [10]

Learning deep graph matching with channel-independent embedding and hungarian attention,

T. Yu, R. Wang, J. Yan, and B. Li, “Learning deep graph matching with channel-independent embedding and hungarian attention,” inInterna- tional Conference on Learning Representations (ICLR), 2020

work page 2020

[11] [11]

Graph matching with bi-level noisy correspondence,

Y . Lin, M. Guo, P. Hu, C. Wang, and J. Lv, “Graph matching with bi-level noisy correspondence,” inProceedings of the IEEE/CVF Inter- national Conference on Computer Vision (ICCV), 2023

work page 2023

[12] [12]

SuperGlue: Learning feature matching with graph neural networks,

P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning feature matching with graph neural networks,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion (CVPR), 2020, pp. 4938–4947

work page 2020

[13] [13]

arXiv preprint arXiv:2403.19474 , year=

Y . Xie, A. Pagani, and D. Stricker, “SG-PGM: Partial graph matching network with semantic geometric fusion for 3D scene graph alignment and its downstream tasks,” inarXiv preprint arXiv:2403.19474, 2024

work page arXiv 2024

[14] [14]

How attentive are graph attention networks?

S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” inInternational Conference on Learning Representations (ICLR), 2022

work page 2022

[15] [15]

Graph neural network-based scene graph matching for robot localization,

M. Giorgi, “Graph neural network-based scene graph matching for robot localization,” Master’s thesis, University of Pisa, 2024

work page 2024

[16] [16]

Concerning nonnegative matrices and doubly stochastic matrices,

R. Sinkhorn and P. Knopp, “Concerning nonnegative matrices and doubly stochastic matrices,”Pacific Journal of Mathematics, vol. 21, no. 2, pp. 343–348, 1967

work page 1967

[17] [17]

The Hungarian method for the assignment problem,

H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955

work page 1955

[18] [18]

Modified swiss dwellings: A 2D floor plan dataset for semantic scene understanding,

S. van Engelenburg, T. Lucassen, F. I. Karahanoglu, and M. A. Westen- berg, “Modified swiss dwellings: A 2D floor plan dataset for semantic scene understanding,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023

work page 2023

[19] [19]

Optuna: A next- generation hyperparameter optimization framework,

T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next- generation hyperparameter optimization framework,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019, pp. 2623–2631

work page 2019