SceneAligner: 3D-Grounded Floorplan Localization in the Wild
Pith reviewed 2026-05-22 06:52 UTC · model grok-4.3
The pith
Reconstructing 3D scenes from images allows floorplan localization in large buildings using density map proxies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that floorplan localization in unconstrained environments can be performed by grounding the task in a 3D reconstruction: the scene is reconstructed and projected to a 2D density map proxy, then aligned to the rasterized floorplan using a similarity transform enabled by cross-modal matching from a fine-tuned foundation model. This allows operation without precise vectorized maps or small-scale assumptions.
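The projection step in this claim can be illustrated with a minimal sketch. It assumes the reconstruction is available as a list of (x, y, z) points with z as the gravity axis; the function name `density_map` and the `cell_size` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter

def density_map(points, cell_size=0.1):
    """Project gravity-aligned 3D points onto the ground plane and count hits per cell.

    Assumes z is the gravity (up) axis, so dropping z gives a top-down view.
    Returns (grid, origin): grid[row][col] is a point count, and origin is the
    (x, y) world coordinate of the corner of cell (0, 0).
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    cols = int((max(xs) - x0) / cell_size) + 1
    rows = int((max(ys) - y0) / cell_size) + 1
    # Bin every point by its (row, col) cell; the height coordinate is ignored.
    counts = Counter(
        (int((y - y0) / cell_size), int((x - x0) / cell_size))
        for x, y in zip(xs, ys)
    )
    grid = [[counts.get((r, c), 0) for c in range(cols)] for r in range(rows)]
    return grid, (x0, y0)

# Toy input: a vertical "wall" (five stacked points) plus one stray floor point.
wall = [(0.0, 0.0, 0.5 * k) for k in range(5)]
floor = [(1.5, 0.5, 0.0)]
grid, origin = density_map(wall + floor, cell_size=1.0)
```

Vertical structures such as walls stack many points into the same cell, which is why a top-down count map can resemble the wall layout of a floorplan.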
What carries the argument
The key mechanism is the projection of a gravity-aligned 3D scene reconstruction into a 2D density map that serves as a floorplan proxy, combined with adaptation of a 2D foundation model for cross-modal alignment.
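Once cross-modal matches between the density map and the floorplan are available, the alignment reduces to fitting a 2D similarity transform (scale, rotation, translation). The sketch below is a plain least-squares fit on already-matched point pairs, written with complex numbers; it is an illustration under that assumption, not the paper's estimator. The names `fit_similarity_2d` and `apply_similarity_2d` are hypothetical, and real matches would likely need an outlier-robust wrapper around a fit like this.

```python
import cmath

def fit_similarity_2d(src, dst):
    """Least-squares 2D similarity mapping src points onto dst points.

    Points are (x, y) tuples. Written as complex numbers, the model is
    dst = a * src + b, where |a| is the scale and arg(a) the rotation angle.
    Returns (scale, angle_in_radians, (tx, ty)).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Closed-form minimizer of sum |q_i - a * p_i - b|^2 over complex a and b.
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = q_mean - a * p_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)

def apply_similarity_2d(scale, angle, translation, point):
    """Map one (x, y) point through the fitted transform."""
    z = cmath.rect(scale, angle) * complex(*point) + complex(*translation)
    return (z.real, z.imag)

# Toy correspondences: a unit square scaled by 2, rotated 90 degrees, shifted by (3, 4).
proxy_pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
plan_pts = [(3.0, 4.0), (3.0, 6.0), (1.0, 6.0), (1.0, 4.0)]
scale, angle, translation = fit_similarity_2d(proxy_pts, plan_pts)
```

The complex-number form excludes reflections by construction, which matches the usual definition of a similarity transform between two top-down views.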
If this is right
- Substantial improvements in localization accuracy over previous approaches in large-scale settings.
- Effective performance even when only a single input image is available.
- Ability to use rasterized floorplans instead of requiring vectorized ones.
- The approach scales to real-world public buildings with unconstrained image collections.
Where Pith is reading between the lines
- The approach could plausibly be integrated into mobile navigation systems for museums or airports.
- Future work could test the method in dynamic environments where the floorplan changes over time.
- The density map proxy might be useful for other tasks, such as 3D-to-2D matching in robotics.
Load-bearing premise
The 3D reconstruction from the image collection must yield a gravity-aligned scene with a 2D density projection accurate and complete enough to proxy the floorplan reliably.
What would settle it
A test case in a large building where the image collection is too sparse to reconstruct a complete density map, leading to poor alignment accuracy with the floorplan.
Original abstract
Many public buildings provide floorplans with a "you are here" indicator to help visitors orient themselves. Floorplan localization seeks to computationally replicate this capability by determining where visual observations were captured within a floorplan. However, existing methods typically assume controlled small-scale environments and precise vectorized floorplans, limiting their ability to operate in large-scale buildings and rasterized floorplans. In this work, we present an approach for performing floorplan localization in the wild by grounding the task in a reconstructed 3D representation of the scene. Given an unconstrained image collection, our method reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy. Floorplan localization is then formulated as aligning this proxy with the input floorplan via a 2D similarity transform. To bridge the appearance gap between density maps and architectural floorplans, we adapt a 2D foundation model to learn cross-modal correspondences, introducing a fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency. Extensive experiments demonstrate substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image. Our code and data will be publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SceneAligner for floorplan localization in large-scale buildings using rasterized floorplans. Given an unconstrained image collection, the method reconstructs a gravity-aligned 3D scene from the images, projects it to a 2D density map serving as a floorplan proxy, and aligns the proxy to the input floorplan via a 2D similarity transform. A 2D foundation model is fine-tuned to bridge the appearance gap between density maps and architectural drawings, using a scheme that encourages semantically aligned matches while preserving structural consistency. The paper reports extensive experiments demonstrating substantial improvements over prior methods, including in extremely sparse settings with as little as a single input image.
Significance. If the central claims are supported by the full experimental evidence, this work would be significant for extending floorplan localization beyond small-scale controlled settings to practical large-scale public buildings with raster floorplans. The 3D-grounded proxy approach combined with foundation model adaptation offers a plausible path to handling unconstrained inputs, and the planned public release of code and data would support reproducibility and community follow-up.
major comments (2)
- [§3.1] §3.1 (3D reconstruction and density projection): The central claim of substantial gains even with single images rests on the assumption that the reconstructed 3D scene yields a sufficiently complete and accurate 2D density projection to serve as a reliable floorplan proxy. Standard SfM/monocular pipelines are known to produce gaps or misalignments in textureless/large interiors; the manuscript should add quantitative proxy-fidelity metrics (e.g., coverage ratio or structural similarity to ground-truth floorplans) specifically for the sparse and single-image regimes to verify this load-bearing step.
- [§4] §4 (Experiments, sparse-setting results): The reported improvements in extremely sparse cases are load-bearing for the main contribution. The evaluation should include per-scene error distributions, failure-case analysis, or reconstruction-quality ablations rather than aggregate metrics alone, so readers can assess whether gains persist when the density proxy is incomplete.
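The proxy-fidelity metrics requested in the first major comment could be computed along the following lines. This is a sketch under stated assumptions: the density map and a binary floorplan mask are already aligned on the same grid, and SSIM is evaluated once over the whole image rather than with the usual sliding window. The names `coverage_ratio` and `global_ssim` are hypothetical.

```python
def coverage_ratio(density, floor_mask):
    """Fraction of floorplan cells (mask is True) that hold at least one projected point."""
    floor_cells = 0
    covered = 0
    for d_row, m_row in zip(density, floor_mask):
        for d, m in zip(d_row, m_row):
            if m:
                floor_cells += 1
                covered += d > 0
    if floor_cells == 0:
        raise ValueError("floorplan mask is empty")
    return covered / floor_cells

def global_ssim(img_a, img_b, data_range=1.0):
    """SSIM over the whole image as a single window (a simplification of windowed SSIM)."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Standard SSIM stabilizing constants.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

# Toy grids: three floor cells, only one of which received projected points.
density = [[3, 0], [0, 1]]
floor_mask = [[True, True], [True, False]]
coverage = coverage_ratio(density, floor_mask)

plan = [[1.0, 0.0], [0.0, 1.0]]
self_similarity = global_ssim(plan, plan)
```

Reporting both numbers per regime (single-image, few-image, full-set) would show whether structural similarity holds up when coverage drops.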
minor comments (2)
- [Abstract] Abstract: The phrase 'large-scale buildings' would benefit from a brief quantitative characterization (e.g., typical floor area or number of rooms) to help readers gauge the operating regime.
- [§3] Notation: The distinction between the input raster floorplan and the projected density map could be made clearer with consistent symbols or a small diagram in the method overview.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to incorporate additional quantitative analysis and granular evaluations in sparse regimes.
- Referee: [§3.1] §3.1 (3D reconstruction and density projection): The central claim of substantial gains even with single images rests on the assumption that the reconstructed 3D scene yields a sufficiently complete and accurate 2D density projection to serve as a reliable floorplan proxy. Standard SfM/monocular pipelines are known to produce gaps or misalignments in textureless/large interiors; the manuscript should add quantitative proxy-fidelity metrics (e.g., coverage ratio or structural similarity to ground-truth floorplans) specifically for the sparse and single-image regimes to verify this load-bearing step.
  Authors: We agree that explicit quantification of proxy fidelity strengthens the central claims. In the revised manuscript we have added a new analysis subsection reporting coverage ratio (fraction of floorplan area covered by projected 3D points) and SSIM between the density map and ground-truth floorplan, computed separately for single-image, 5-image, and full-set regimes across all test scenes. These metrics confirm that structural similarity remains adequate for alignment even when coverage is low, directly supporting the reported localization gains.
  Revision: yes
- Referee: [§4] §4 (Experiments, sparse-setting results): The reported improvements in extremely sparse cases are load-bearing for the main contribution. The evaluation should include per-scene error distributions, failure-case analysis, or reconstruction-quality ablations rather than aggregate metrics alone, so readers can assess whether gains persist when the density proxy is incomplete.
  Authors: We acknowledge that aggregate numbers alone leave open questions about robustness. The revision now includes per-scene localization error box plots (supplementary material), a failure-case analysis subsection in the main text that examines scenes with incomplete reconstructions due to textureless walls, and an ablation that correlates point-cloud density with final alignment error. These additions show that our method continues to outperform baselines even under partial proxy coverage.
  Revision: yes
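The ablation described in the second response, relating point-cloud density to alignment error, could be summarized with a correlation coefficient. The sketch below uses Pearson's r as one possible choice; the rebuttal does not say which statistic is used. The function name `pearson_r` is hypothetical, and the numbers are synthetic toy values, not results from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Synthetic per-scene values for illustration only (NOT from the paper):
# denser reconstructions paired with smaller alignment errors.
point_density = [1.0, 2.0, 3.0, 4.0]
alignment_error = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(point_density, alignment_error)
```

A strongly negative r would indicate that alignment error shrinks as reconstructions become denser; a rank-based statistic may be preferable if the relationship is monotonic but not linear.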
Circularity Check
No significant circularity; derivation relies on external reconstruction and models
The paper's core pipeline reconstructs a gravity-aligned 3D scene from unconstrained images (including single-image cases), projects it to a 2D density map as floorplan proxy, and aligns via 2D similarity transform after fine-tuning a foundation model for cross-modal matching. No quoted equations, definitions, or steps in the abstract or described method reduce a claimed prediction or result to a fitted parameter or self-referential input by construction. The approach invokes standard external 3D reconstruction and foundation models rather than deriving the target alignment from quantities defined using the floorplan itself. Experiments claim improvements on benchmarks, but the derivation chain remains independent of the final localization output.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Unconstrained image collections yield sufficiently accurate gravity-aligned 3D reconstructions for large-scale indoor scenes.
- domain assumption: A 2D foundation model can be fine-tuned to produce semantically aligned matches between density maps and rasterized floorplans while preserving structural consistency.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  reconstructs a gravity-aligned 3D scene and projects it into a 2D density map that serves as a floorplan proxy... adapt a 2D foundation model... fine-tuning scheme that encourages semantically aligned matches while preserving structural consistency
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  $\mathcal{L} = \lambda_{\text{feat}}\mathcal{L}_{\text{feat}} + \lambda_{\text{regr}}\mathcal{L}_{\text{regr}} + \lambda_{\text{topo}}\mathcal{L}_{\text{topo}} + \lambda_{\text{geo}}\mathcal{L}_{\text{geo}}$... symmetric InfoNCE loss... topology preservation loss $\mathcal{L}_{\text{topo}}$ and a geometry consistency loss $\mathcal{L}_{\text{geo}}$
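The quoted passage mentions a symmetric InfoNCE loss without showing its definition. For orientation, the sketch below implements the generic symmetric InfoNCE form on paired feature vectors, where pair i in one set matches pair i in the other. The function name `symmetric_info_nce` and the default temperature of 0.07 are assumptions, and the paper's exact formulation may differ.

```python
import math

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(feats_a, feats_b, temperature=0.07):
    """Average of the a-to-b and b-to-a InfoNCE losses for index-aligned pairs.

    feats_a[i] and feats_b[i] form a positive pair; every other pairing is a negative.
    Features are assumed to be L2-normalized already (temperature is an assumed default).
    """
    if len(feats_a) != len(feats_b) or not feats_a:
        raise ValueError("need two non-empty feature lists of equal length")
    logits = [
        [sum(x * y for x, y in zip(fa, fb)) / temperature for fb in feats_b]
        for fa in feats_a
    ]
    logits_t = [list(col) for col in zip(*logits)]  # swap the roles of a and b
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))

# Two orthogonal unit features matched to themselves.
feats = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = symmetric_info_nce(feats, feats, temperature=1.0)

# All-zero features carry no information, so the loss equals log(batch size).
zeros = [[0.0, 0.0], [0.0, 0.0]]
loss_uninformative = symmetric_info_nce(zeros, zeros, temperature=1.0)
```

Averaging both directions means a density-map feature must pick out its floorplan counterpart and vice versa, which is what makes the loss symmetric.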
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.