GRLoc: Geometric Representation Regression for Visual Localization

Changyang Li; Lixiang Liu; Qingan Yan; Xuejian Ma; Yi Xu; Zhan Li

arxiv: 2511.13864 · v2 · pith:7N3XR7ERnew · submitted 2025-11-17 · 💻 cs.CV

GRLoc: Geometric Representation Regression for Visual Localization

Changyang Li , Xuejian Ma , Lixiang Liu , Zhan Li , Qingan Yan , Yi Xu This is my paper

Pith reviewed 2026-05-21 18:03 UTC · model grok-4.3

classification 💻 cs.CV

keywords geometricposecameraregressionrepresentationsabsolutedirectlydisentangled

0 comments

The pith

The paper reformulates absolute pose regression as regressing disentangled world-coordinate raymaps and pointmaps from images, then recovering pose via a differentiable solver, claiming SOTA results on 7-Scenes and Cambridge Landmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard absolute pose regression networks take a photo and directly output the camera's position and orientation. This can cause the model to simply remember the training photos instead of learning the actual 3D layout of the scene. The new method instead asks the network to output two separate geometric maps: one describing the directions of rays from the camera and another describing the 3D points those rays hit. These maps are expressed in the real-world coordinate system. A fixed mathematical solver then turns the two maps into the final camera pose. Because rotation and translation are predicted separately and the final step is not learned, the network is forced to produce geometrically consistent outputs rather than arbitrary numbers that happen to fit the training data.

Core claim

Modeling the inverse rendering process by explicitly regressing disentangled geometric representations (raymap directions for rotation and pointmap for translation) in world coordinates, followed by a differentiable deterministic solver, yields more robust and generalizable absolute pose estimation than direct black-box regression.

Load-bearing premise

The network's predicted raymap and pointmap are sufficiently accurate and consistent with each other that the deterministic solver can recover a correct pose without the prediction errors dominating the final result; this is stated in the abstract as the benefit of separating the visual-to-geometry mapping from the pose calculation.

Figures

Figures reproduced from arXiv: 2511.13864 by Changyang Li, Lixiang Liu, Qingan Yan, Xuejian Ma, Yi Xu, Zhan Li.

**Figure 1.** Figure 1: Localization errors on the 7-Scenes [48] and Cambridge [27] datasets. Our GRLoc outperforms prior APR (green) methods and demonstrates competitive performance against leading RPR (orange) and PPR (blue) approaches. Our refined model GRLocref achieves comparable results with SOTA PPR methods. banks that retrieve the poses of the most visually similar training images [45]. This tendency to “memorize” rathe… view at source ↗

**Figure 2.** Figure 2: Our Geometric Representation Regression (GRR) [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Geometric representations for an example 4 × 4 grid: (1) a ray bundle of world-space view directions and (2) a pointmap of 3D points. While the forward rendering process of NVS (casting rays to form pixels) is well-defined, its inverse is not. Our goal of “inverting” this process, which is to estimate 3D representations from a 2D image, is a learned perception task. It requires considering both local p… view at source ↗

**Figure 4.** Figure 4: An overview of our proposed architecture. Left: the query image is fed into a decoupled dual-branch network. Each branch’s [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Visualization of GRLoc’s predictions. †: Cambridge Landmarks [27]; ∗: 7-Scenes [48]. Λ indicates scene-specific upper-limit of translation error. Sequential error bars show rotation errors [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Visual comparisons on 7-Scenes dataset [ [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

read the original abstract

Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a query image, which can lead to memorizing training views rather than understanding 3D scene geometry. In this work, we propose a geometrically-grounded alternative. Inspired by novel view synthesis, which renders images from intermediate geometric representations, we reformulate APR as its inverse that regresses the underlying 3D representations directly from the image, and we name this paradigm Geometric Representation Regression (GRR). Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a raymap's directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final camera pose is then recovered from these geometric components using a differentiable deterministic solver. This disentangled approach, which separates the learned visual-to-geometry mapping from the final pose calculation, introduces a strong geometric prior into the network. We find that the explicit decoupling of rotation and translation predictions measurably boosts performance. We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets, validating that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper reframes absolute pose regression as predicting world-coordinate raymaps for rotation and pointmaps for translation before a deterministic solver recovers the pose, which adds a geometric prior but leaves the consistency of those two predictions unaddressed.

read the letter

The main thing to know is that the authors treat absolute pose regression as the inverse of novel view synthesis. Instead of regressing a 6-DoF pose directly, the network outputs a raymap whose directions give rotation and a pointmap whose positions give translation, both in world coordinates, and a fixed differentiable solver turns those into the final pose. This separation is the central design choice and is presented as a way to inject an explicit geometric prior rather than letting the network memorize views.

Referee Report

2 major / 2 minor

Summary. The paper proposes GRLoc for absolute pose regression by reformulating the task as geometric representation regression. From a query image it explicitly regresses two disentangled world-coordinate representations—a raymap whose directions encode camera rotation and a pointmap that encodes translation—then recovers the 6-DoF pose with a differentiable deterministic solver. The authors claim that separating the learned visual-to-geometry mapping from the final pose calculation injects a strong geometric prior, yielding better generalization than direct black-box 6-DoF regression, and report state-of-the-art results on the 7-Scenes and Cambridge Landmarks benchmarks.

Significance. If the central claim holds, the work is significant because it replaces end-to-end black-box regression with an explicit inverse-rendering formulation that keeps rotation and translation predictions disentangled and delegates pose recovery to a non-learned solver. The deterministic solver is a clear methodological strength: it removes the risk that the network simply memorizes pose-to-image mappings and supplies a concrete geometric prior that future APR methods can build upon.

major comments (2)

[§3.3] §3.3 (Loss function): the objective optimizes raymap direction error and pointmap position error separately; no term penalizes inconsistency between the rotation implied by the raymap and the translation implied by the pointmap. Because the solver is applied after the fact, any such inconsistency directly undermines the claim that the geometric prior prevents error accumulation.
[§4.2, Table 3] §4.2 and Table 3: the reported ablations compare full GRLoc against direct regression baselines but do not isolate the effect of removing the deterministic solver or of injecting controlled noise into the predicted raymap/pointmap. Without this test the robustness argument remains unverified.

minor comments (2)

[Figure 2] Figure 2: the diagram of the raymap and pointmap heads would be clearer if it explicitly showed the world-coordinate frame and the subsequent solver block.
[§4.1] §4.1: the description of the differentiable solver would benefit from a short derivation or pseudocode showing how the two representations are combined into a single pose.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§3.3] §3.3 (Loss function): the objective optimizes raymap direction error and pointmap position error separately; no term penalizes inconsistency between the rotation implied by the raymap and the translation implied by the pointmap. Because the solver is applied after the fact, any such inconsistency directly undermines the claim that the geometric prior prevents error accumulation.

Authors: We thank the referee for this observation. The loss is deliberately applied to the disentangled representations to encourage accurate geometric predictions in world coordinates. The differentiable solver then recovers a consistent pose by construction. We nevertheless agree that an explicit consistency term would provide an additional training signal. In the revised manuscript we will augment the objective in §3.3 with a term that penalizes the discrepancy between the rotation implied by the predicted raymap and the translation implied by the pointmap. revision: yes
Referee: [§4.2, Table 3] §4.2 and Table 3: the reported ablations compare full GRLoc against direct regression baselines but do not isolate the effect of removing the deterministic solver or of injecting controlled noise into the predicted raymap/pointmap. Without this test the robustness argument remains unverified.

Authors: We agree that isolating the solver’s contribution and testing sensitivity to noise would strengthen the robustness claims. In the revised §4.2 and Table 3 we will add two new ablations: (i) replacing the deterministic solver with a small learned head that regresses pose directly from the raymap and pointmap, and (ii) injecting controlled Gaussian noise into the predicted representations before the solver and measuring the resulting pose error. These experiments will directly quantify the benefit of the non-learned solver. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation introduces independent geometric components and solver

full rationale

The paper proposes regressing disentangled world-coordinate raymap directions and pointmap from images, then recovering pose via a deterministic differentiable solver. This explicitly separates visual-to-geometry mapping from pose calculation and is presented as an alternative to direct 6-DoF regression. No equations or steps in the abstract reduce the final pose or performance claims to the inputs by construction, self-definition, or fitted parameters renamed as predictions. The geometric prior is added by design rather than assumed from the target result. Claims are supported by empirical evaluation on standard external benchmarks (7-Scenes, Cambridge Landmarks), with no load-bearing self-citations or uniqueness theorems invoked in the provided text. The derivation remains self-contained against external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the assumption that accurate per-pixel ray directions and 3D points can be regressed from a single image and that these predictions remain consistent enough for the solver; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption A single RGB image contains sufficient information to regress accurate world-coordinate ray directions and point locations.
Invoked when the network is asked to output the two geometric representations directly from the query image.

pith-pipeline@v0.9.0 · 5768 in / 1371 out tokens · 68719 ms · 2026-05-21T18:03:34.806606+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a raymap's directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final camera pose is then recovered from these geometric components using a differentiable deterministic solver.
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

decoupling the predictions for rotation and translation is key to improving performance... rotation and translation are competing objectives

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 1 internal anchor

[1]

Distill- pose: Lightweight camera localization using auxiliary learn- ing

Yehya Abouelnaga, Mai Bui, and Slobodan Ilic. Distill- pose: Lightweight camera localization using auxiliary learn- ing. In2021 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), pages 7919–7924. IEEE,

work page
[2]

Map-free visual relocalization: Metric pose relative to a single image

Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Dani- yar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In European Conference on Computer Vision, pages 690–708. Springer, 2022. 6

work page 2022
[3]

Reloc- net: Continuous metric learning relocalisation using neural nets

Vassileios Balntas, Shuda Li, and Victor Prisacariu. Reloc- net: Continuous metric learning relocalisation using neural nets. InProceedings of the European conference on com- puter vision (ECCV), pages 751–767, 2018. 2

work page 2018
[4]

Visual camera re- localization from rgb and rgb-d images using dsac.IEEE transactions on pattern analysis and machine intelligence, 44(9):5847–5865, 2021

Eric Brachmann and Carsten Rother. Visual camera re- localization from rgb and rgb-d images using dsac.IEEE transactions on pattern analysis and machine intelligence, 44(9):5847–5865, 2021. 2, 8

work page 2021
[5]

Dsac-differentiable ransac for camera localization

Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac-differentiable ransac for camera localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 6684–6692, 2017. 2

work page 2017
[6]

On the limits of pseudo ground truth in vi- sual camera re-localisation

Eric Brachmann, Martin Humenberger, Carsten Rother, and Torsten Sattler. On the limits of pseudo ground truth in vi- sual camera re-localisation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6218– 6228, 2021. 6

work page 2021
[7]

Accelerated coordinate encoding: Learning to relocalize in minutes using rgb and poses

Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated coordinate encoding: Learning to relocalize in minutes using rgb and poses. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5044–5053, 2023. 2, 8

work page 2023
[8]

Geometry-aware learning of maps for cam- era localization

Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, and Jan Kautz. Geometry-aware learning of maps for cam- era localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2616–2625,

work page
[9]

Direct- posenet: Absolute pose regression with photometric consis- tency

Shuai Chen, Zirui Wang, and Victor Prisacariu. Direct- posenet: Absolute pose regression with photometric consis- tency. In2021 International Conference on 3D Vision (3DV), pages 1175–1185. IEEE, 2021

work page 2021
[10]

Dfnet: Enhance absolute pose regression with direct feature matching

Shuai Chen, Xinghui Li, Zirui Wang, and Victor A Prisacariu. Dfnet: Enhance absolute pose regression with direct feature matching. InEuropean Conference on Com- puter Vision, pages 1–17. Springer, 2022. 1, 2, 3, 5, 6, 8

work page 2022
[11]

Neural refine- ment for absolute pose regression with feature synthesis

Shuai Chen, Yash Bhalgat, Xinghui Li, Jia-Wang Bian, Kejie Li, Zirui Wang, and Victor Adrian Prisacariu. Neural refine- ment for absolute pose regression with feature synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20987–20996, 2024. 3, 8

work page 2024
[12]

Map-relative pose regression for visual re-localization

Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu, and Eric Brachmann. Map-relative pose regression for visual re-localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20665– 20674, 2024. 2, 6, 8

work page 2024
[13]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. 6

work page 2009
[14]

Camnet: Coarse-to-fine retrieval for camera re- localization

Mingyu Ding, Zhe Wang, Jiankai Sun, Jianping Shi, and Ping Luo. Camnet: Coarse-to-fine retrieval for camera re- localization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2871–2880, 2019. 2

work page 2019
[15]

Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization

Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 16739–16752, 2025. 2, 8

work page 2025
[16]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 4, 7

work page internal anchor Pith review Pith/arXiv arXiv 2010
[17]

D2- net: A trainable cnn for joint description and detection of local features

Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Polle- feys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2- net: A trainable cnn for joint description and detection of local features. InProceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 8092–8101,

work page
[18]

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981

Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981. 2

work page 1981
[19]

Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pas- cal Germain, Hugo Larochelle, Franc ¸ois Laviolette, Mario March, and Victor Lempitsky. Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016. 5, 6

work page 2016
[20]

Complete solution classification for the perspective-three-point problem.IEEE transactions on pattern analysis and machine intelligence, 25(8):930–943,

Xiao-Shan Gao, Xiao-Rong Hou, Jianliang Tang, and Hang-Fei Cheng. Complete solution classification for the perspective-three-point problem.IEEE transactions on pattern analysis and machine intelligence, 25(8):930–943,

work page
[21]

Feature query networks: Neural surface description for camera pose refinement

Hugo Germain, Daniel DeTone, Geoffrey Pascoe, Tan- ner Schmidt, David Novotny, Richard Newcombe, Chris Sweeney, Richard Szeliski, and Vasileios Balntas. Feature query networks: Neural surface description for camera pose refinement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5071– 5081, 2022. 3, 8

work page 2022
[22]

Learning to produce semi-dense correspondences for visual localization

Khang Truong Giang, Soohwan Song, and Sungho Jo. Learning to produce semi-dense correspondences for visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19468– 19478, 2024. 2

work page 2024
[23]

From sparse to dense: Camera relocaliza- tion with scene-specific detector from feature gaussian splat- ting

Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, and Guofeng Zhang. From sparse to dense: Camera relocaliza- tion with scene-specific detector from feature gaussian splat- ting. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 27059–27069, 2025. 3, 7, 8

work page 2025
[24]

Robust im- age retrieval-based visual localization using kapture.arXiv preprint arXiv:2007.13867, 2020

Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, J´erˆome Revaud, Philippe Rerole, No´e Pion, Cesar De Souza, and Gabriela Csurka. Robust im- age retrieval-based visual localization using kapture.arXiv preprint arXiv:2007.13867, 2020. 2

work page arXiv 2007
[25]

R-score: Revisiting scene coordinate regression for robust large-scale visual localization

Xudong Jiang, Fangjinhua Wang, Silvano Galliani, Christoph V ogel, and Marc Pollefeys. R-score: Revisiting scene coordinate regression for robust large-scale visual localization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 11536–11546, 2025. 2

work page 2025
[26]

A solution for the best rotation to relate two sets of vectors.Foundations of Crystallography, 32(5): 922–923, 1976

Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors.Foundations of Crystallography, 32(5): 922–923, 1976. 5

work page 1976
[27]

Posenet: A convolutional network for real-time 6-dof cam- era relocalization

Alex Kendall, Matthew Grimes, and Roberto Cipolla. Posenet: A convolutional network for real-time 6-dof cam- era relocalization. InProceedings of the IEEE international conference on computer vision, pages 2938–2946, 2015. 1, 2, 6, 7, 8

work page 2015
[28]

3d gaussian splatting for real-time radiance field rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,

work page
[29]

3d gaussian splat- ting as markov chain monte carlo

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Wei- wei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splat- ting as markov chain monte carlo. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. Spotlight Presentation. 6

work page 2024
[30]

Unleashing the power of data synthesis in visual localization.arXiv preprint arXiv:2412.00138, 2024

Sihang Li, Siqi Tan, Bowen Chang, Jing Zhang, Chen Feng, and Yiming Li. Unleashing the power of data synthesis in visual localization.arXiv preprint arXiv:2412.00138, 2024. 1, 2, 3, 5, 6, 7, 8

work page arXiv 2024
[31]

Learning neural volumetric pose features for camera localization

Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, and Jieping Ye. Learning neural volumetric pose features for camera localization. InEuropean Conference on Computer Vision, pages 198–214. Springer, 2024. 1, 2, 3, 5, 6, 8

work page 2024
[32]

Lightglue: Local feature matching at light speed

Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Polle- feys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023. 2

work page 2023
[33]

Gs-cpr: Efficient camera pose refinement via 3d gaussian splatting.arXiv preprint arXiv:2408.11085, 2024

Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Ming Cheng, Zirui Wang, Victor Adrian Prisacariu, and Tristan Braud. Gs-cpr: Efficient camera pose refinement via 3d gaussian splatting.arXiv preprint arXiv:2408.11085, 2024. 3, 6, 8

work page arXiv 2024
[34]

Hr-apr: Apr-agnostic framework with uncertainty estimation and hierarchical re- finement for camera relocalisation

Changkun Liu, Shuai Chen, Yukun Zhao, Huajian Huang, Victor Prisacariu, and Tristan Braud. Hr-apr: Apr-agnostic framework with uncertainty estimation and hierarchical re- finement for camera relocalisation. In2024 IEEE Inter- national Conference on Robotics and Automation (ICRA), pages 8544–8550. IEEE, 2024. 3, 8

work page 2024
[35]

Efficient global 2d-3d matching for camera localization in a large-scale 3d map

Liu Liu, Hongdong Li, and Yuchao Dai. Efficient global 2d-3d matching for camera localization in a large-scale 3d map. InProceedings of the IEEE International Conference on Computer Vision, pages 2372–2381, 2017. 2

work page 2017
[36]

Swin transformer v2: Scaling up capacity and resolution

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 12009–12019, 2022. 6

work page 2022
[37]

Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 1, 3

work page 2021
[38]

Lens: Localization enhanced by nerf synthesis

Arthur Moreau, Nathan Piasco, Dzmitry Tsishkou, Bogdan Stanciulescu, and Arnaud de La Fortelle. Lens: Localization enhanced by nerf synthesis. InConference on Robot Learn- ing, pages 1347–1356. PMLR, 2022. 1, 3, 5, 8

work page 2022
[39]

Crossfire: Camera relocalization on self-supervised features from an implicit representation

Arthur Moreau, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou, Bogdan Stanciulescu, and Arnaud de La Fortelle. Crossfire: Camera relocalization on self-supervised features from an implicit representation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 252–262, 2023. 3, 8

work page 2023
[40]

From coarse to fine: Robust hierarchical localization at large scale

Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12716–12725, 2019. 2

work page 2019
[41]

Superglue: Learning feature matching with graph neural networks

Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020

work page 2020
[42]

Back to the feature: Learning robust camera localization from pixels to pose

Paul-Edouard Sarlin, Ajaykumar Unagar, Mans Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, et al. Back to the feature: Learning robust camera localization from pixels to pose. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 3247–3257, 2021

work page 2021
[43]

Fast image- based localization using direct 2d-to-3d matching

Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Fast image- based localization using direct 2d-to-3d matching. In2011 International Conference on Computer Vision, pages 667–

work page
[44]

Efficient & effective prioritized matching for large-scale image-based localization.IEEE transactions on pattern analysis and ma- chine intelligence, 39(9):1744–1756, 2016

Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization.IEEE transactions on pattern analysis and ma- chine intelligence, 39(9):1744–1756, 2016. 2

work page 2016
[45]

Understanding the limitations of cnn-based absolute camera pose regression

Torsten Sattler, Qunjie Zhou, Marc Pollefeys, and Laura Leal-Taixe. Understanding the limitations of cnn-based absolute camera pose regression. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3302–3312, 2019. 1

work page 2019
[46]

Camera pose auto-encoders for improving pose regression

Yoli Shavit and Yosi Keller. Camera pose auto-encoders for improving pose regression. InEuropean Conference on Computer Vision, pages 140–157. Springer, 2022. 2, 8

work page 2022
[47]

Learning multi- scene absolute pose regression with transformers

Yoli Shavit, Ron Ferens, and Yosi Keller. Learning multi- scene absolute pose regression with transformers. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 2733–2742, 2021. 1, 2, 8

work page 2021
[48]

Scene co- ordinate regression forests for camera relocalization in rgb-d images

Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene co- ordinate regression forests for camera relocalization in rgb-d images. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2930–2937, 2013. 1, 6, 7, 8

work page 2013
[49]

Inloc: Indoor visual localization with dense matching and view synthesis

Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Ak- ihiko Torii. Inloc: Indoor visual localization with dense matching and view synthesis. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7199–7209, 2018. 2

work page 2018
[50]

Learning camera localization via dense scene matching

Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, and Ping Tan. Learning camera localization via dense scene matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1831– 1841, 2021. 2

work page 2021
[51]

The unreasonable effectiveness of pre- trained features for camera pose refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo, and Torsten Sattler. The unreasonable effectiveness of pre- trained features for camera pose refinement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 12786–12798, 2024. 3, 8

work page 2024
[52]

Adversarial discriminative domain adaptation

Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 7167–7176, 2017. 5

work page 2017
[53]

Glace: Global local accelerated coordinate encoding

Fangjinhua Wang, Xudong Jiang, Silvano Galliani, Christoph V ogel, and Marc Pollefeys. Glace: Global local accelerated coordinate encoding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21562–21571, 2024. 2, 8

work page 2024
[54]

Learning to localize in new environments from syn- thetic training data

Dominik Winkelbauer, Maximilian Denninger, and Rudolph Triebel. Learning to localize in new environments from syn- thetic training data. In2021 IEEE International Confer- ence on Robotics and Automation (ICRA), pages 5840–5846. IEEE, 2021. 2, 8

work page 2021
[55]

gsplat: An open-source library for gaussian splatting.Journal of Machine Learning Research, 26(34):1–17, 2025

Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, et al. gsplat: An open-source library for gaussian splatting.Journal of Machine Learning Research, 26(34):1–17, 2025. 6

work page 2025
[56]

inerf: Inverting neural radiance fields for pose estimation

Lin Yen-Chen, Pete Florence, Jonathan T Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. inerf: Inverting neural radiance fields for pose estimation. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021. 2

work page 2021
[57]

Camera pose voting for large-scale image-based localization

Bernhard Zeisl, Torsten Sattler, and Marc Pollefeys. Camera pose voting for large-scale image-based localization. InPro- ceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015. 2

work page 2015
[58]

Semantic-guided camera ray regression for visual localization

Yesheng Zhang and Xu Zhao. Semantic-guided camera ray regression for visual localization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 25639–25648, 2025. 3

work page 2025
[59]

Pnerfloc: Visual localization with point- based neural radiance fields

Boming Zhao, Luwei Yang, Mao Mao, Hujun Bao, and Zhaopeng Cui. Pnerfloc: Visual localization with point- based neural radiance fields. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7450–7459,

work page
[60]

The nerfect match: Exploring nerf features for visual localization

Qunjie Zhou, Maxim Maximov, Or Litany, and Laura Leal- Taix´e. The nerfect match: Exploring nerf features for visual localization. InEuropean Conference on Computer Vision, pages 108–127. Springer, 2024. 3, 8

work page 2024

[1] [1]

Distill- pose: Lightweight camera localization using auxiliary learn- ing

Yehya Abouelnaga, Mai Bui, and Slobodan Ilic. Distill- pose: Lightweight camera localization using auxiliary learn- ing. In2021 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), pages 7919–7924. IEEE,

work page

[2] [2]

Map-free visual relocalization: Metric pose relative to a single image

Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Aron Monszpart, Victor Prisacariu, Dani- yar Turmukhambetov, and Eric Brachmann. Map-free visual relocalization: Metric pose relative to a single image. In European Conference on Computer Vision, pages 690–708. Springer, 2022. 6

work page 2022

[3] [3]

Reloc- net: Continuous metric learning relocalisation using neural nets

Vassileios Balntas, Shuda Li, and Victor Prisacariu. Reloc- net: Continuous metric learning relocalisation using neural nets. InProceedings of the European conference on com- puter vision (ECCV), pages 751–767, 2018. 2

work page 2018

[4] [4]

Visual camera re- localization from rgb and rgb-d images using dsac.IEEE transactions on pattern analysis and machine intelligence, 44(9):5847–5865, 2021

Eric Brachmann and Carsten Rother. Visual camera re- localization from rgb and rgb-d images using dsac.IEEE transactions on pattern analysis and machine intelligence, 44(9):5847–5865, 2021. 2, 8

work page 2021

[5] [5]

Dsac-differentiable ransac for camera localization

Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac-differentiable ransac for camera localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 6684–6692, 2017. 2

work page 2017

[6] [6]

On the limits of pseudo ground truth in vi- sual camera re-localisation

Eric Brachmann, Martin Humenberger, Carsten Rother, and Torsten Sattler. On the limits of pseudo ground truth in vi- sual camera re-localisation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6218– 6228, 2021. 6

work page 2021

[7] [7]

Accelerated coordinate encoding: Learning to relocalize in minutes using rgb and poses

Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated coordinate encoding: Learning to relocalize in minutes using rgb and poses. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5044–5053, 2023. 2, 8

work page 2023

[8] [8]

Geometry-aware learning of maps for cam- era localization

Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, and Jan Kautz. Geometry-aware learning of maps for cam- era localization. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2616–2625,

work page

[9] [9]

Direct- posenet: Absolute pose regression with photometric consis- tency

Shuai Chen, Zirui Wang, and Victor Prisacariu. Direct- posenet: Absolute pose regression with photometric consis- tency. In2021 International Conference on 3D Vision (3DV), pages 1175–1185. IEEE, 2021

work page 2021

[10] [10]

Dfnet: Enhance absolute pose regression with direct feature matching

Shuai Chen, Xinghui Li, Zirui Wang, and Victor A Prisacariu. Dfnet: Enhance absolute pose regression with direct feature matching. InEuropean Conference on Com- puter Vision, pages 1–17. Springer, 2022. 1, 2, 3, 5, 6, 8

work page 2022

[11] [11]

Neural refine- ment for absolute pose regression with feature synthesis

Shuai Chen, Yash Bhalgat, Xinghui Li, Jia-Wang Bian, Kejie Li, Zirui Wang, and Victor Adrian Prisacariu. Neural refine- ment for absolute pose regression with feature synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 20987–20996, 2024. 3, 8

work page 2024

[12] [12]

Map-relative pose regression for visual re-localization

Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu, and Eric Brachmann. Map-relative pose regression for visual re-localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20665– 20674, 2024. 2, 6, 8

work page 2024

[13] [13]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. 6

work page 2009

[14] [14]

Camnet: Coarse-to-fine retrieval for camera re- localization

Mingyu Ding, Zhe Wang, Jiankai Sun, Jianping Shi, and Ping Luo. Camnet: Coarse-to-fine retrieval for camera re- localization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2871–2880, 2019. 2

work page 2019

[15] [15]

Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization

Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 16739–16752, 2025. 2, 8

work page 2025

[16] [16]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 4, 7

work page internal anchor Pith review Pith/arXiv arXiv 2010

[17] [17]

D2- net: A trainable cnn for joint description and detection of local features

Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Polle- feys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2- net: A trainable cnn for joint description and detection of local features. InProceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 8092–8101,

work page

[18] [18]

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981

Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981. 2

work page 1981

[19] [19]

Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pas- cal Germain, Hugo Larochelle, Franc ¸ois Laviolette, Mario March, and Victor Lempitsky. Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016. 5, 6

work page 2016

[20] [20]

Complete solution classification for the perspective-three-point problem.IEEE transactions on pattern analysis and machine intelligence, 25(8):930–943,

Xiao-Shan Gao, Xiao-Rong Hou, Jianliang Tang, and Hang-Fei Cheng. Complete solution classification for the perspective-three-point problem.IEEE transactions on pattern analysis and machine intelligence, 25(8):930–943,

work page

[21] [21]

Feature query networks: Neural surface description for camera pose refinement

Hugo Germain, Daniel DeTone, Geoffrey Pascoe, Tan- ner Schmidt, David Novotny, Richard Newcombe, Chris Sweeney, Richard Szeliski, and Vasileios Balntas. Feature query networks: Neural surface description for camera pose refinement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5071– 5081, 2022. 3, 8

work page 2022

[22] [22]

Learning to produce semi-dense correspondences for visual localization

Khang Truong Giang, Soohwan Song, and Sungho Jo. Learning to produce semi-dense correspondences for visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19468– 19478, 2024. 2

work page 2024

[23] [23]

From sparse to dense: Camera relocaliza- tion with scene-specific detector from feature gaussian splat- ting

Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, and Guofeng Zhang. From sparse to dense: Camera relocaliza- tion with scene-specific detector from feature gaussian splat- ting. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 27059–27069, 2025. 3, 7, 8

work page 2025

[24] [24]

Robust im- age retrieval-based visual localization using kapture.arXiv preprint arXiv:2007.13867, 2020

Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Vincent Leroy, J´erˆome Revaud, Philippe Rerole, No´e Pion, Cesar De Souza, and Gabriela Csurka. Robust im- age retrieval-based visual localization using kapture.arXiv preprint arXiv:2007.13867, 2020. 2

work page arXiv 2007

[25] [25]

R-score: Revisiting scene coordinate regression for robust large-scale visual localization

Xudong Jiang, Fangjinhua Wang, Silvano Galliani, Christoph V ogel, and Marc Pollefeys. R-score: Revisiting scene coordinate regression for robust large-scale visual localization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 11536–11546, 2025. 2

work page 2025

[26] [26]

A solution for the best rotation to relate two sets of vectors.Foundations of Crystallography, 32(5): 922–923, 1976

Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors.Foundations of Crystallography, 32(5): 922–923, 1976. 5

work page 1976

[27] [27]

Posenet: A convolutional network for real-time 6-dof cam- era relocalization

Alex Kendall, Matthew Grimes, and Roberto Cipolla. Posenet: A convolutional network for real-time 6-dof cam- era relocalization. InProceedings of the IEEE international conference on computer vision, pages 2938–2946, 2015. 1, 2, 6, 7, 8

work page 2015

[28] [28]

3d gaussian splatting for real-time radiance field rendering.ACM Trans

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,

work page

[29] [29]

3d gaussian splat- ting as markov chain monte carlo

Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Wei- wei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splat- ting as markov chain monte carlo. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. Spotlight Presentation. 6

work page 2024

[30] [30]

Unleashing the power of data synthesis in visual localization.arXiv preprint arXiv:2412.00138, 2024

Sihang Li, Siqi Tan, Bowen Chang, Jing Zhang, Chen Feng, and Yiming Li. Unleashing the power of data synthesis in visual localization.arXiv preprint arXiv:2412.00138, 2024. 1, 2, 3, 5, 6, 7, 8

work page arXiv 2024

[31] [31]

Learning neural volumetric pose features for camera localization

Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, and Jieping Ye. Learning neural volumetric pose features for camera localization. InEuropean Conference on Computer Vision, pages 198–214. Springer, 2024. 1, 2, 3, 5, 6, 8

work page 2024

[32] [32]

Lightglue: Local feature matching at light speed

Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Polle- feys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023. 2

work page 2023

[33] [33]

Gs-cpr: Efficient camera pose refinement via 3d gaussian splatting.arXiv preprint arXiv:2408.11085, 2024

Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Ming Cheng, Zirui Wang, Victor Adrian Prisacariu, and Tristan Braud. Gs-cpr: Efficient camera pose refinement via 3d gaussian splatting.arXiv preprint arXiv:2408.11085, 2024. 3, 6, 8

work page arXiv 2024

[34] [34]

Hr-apr: Apr-agnostic framework with uncertainty estimation and hierarchical re- finement for camera relocalisation

Changkun Liu, Shuai Chen, Yukun Zhao, Huajian Huang, Victor Prisacariu, and Tristan Braud. Hr-apr: Apr-agnostic framework with uncertainty estimation and hierarchical re- finement for camera relocalisation. In2024 IEEE Inter- national Conference on Robotics and Automation (ICRA), pages 8544–8550. IEEE, 2024. 3, 8

work page 2024

[35] [35]

Efficient global 2d-3d matching for camera localization in a large-scale 3d map

Liu Liu, Hongdong Li, and Yuchao Dai. Efficient global 2d-3d matching for camera localization in a large-scale 3d map. InProceedings of the IEEE International Conference on Computer Vision, pages 2372–2381, 2017. 2

work page 2017

[36] [36]

Swin transformer v2: Scaling up capacity and resolution

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 12009–12019, 2022. 6

work page 2022

[37] [37]

Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 1, 3

work page 2021

[38] [38]

Lens: Localization enhanced by nerf synthesis

Arthur Moreau, Nathan Piasco, Dzmitry Tsishkou, Bogdan Stanciulescu, and Arnaud de La Fortelle. Lens: Localization enhanced by nerf synthesis. InConference on Robot Learn- ing, pages 1347–1356. PMLR, 2022. 1, 3, 5, 8

work page 2022

[39] [39]

Crossfire: Camera relocalization on self-supervised features from an implicit representation

Arthur Moreau, Nathan Piasco, Moussab Bennehar, Dzmitry Tsishkou, Bogdan Stanciulescu, and Arnaud de La Fortelle. Crossfire: Camera relocalization on self-supervised features from an implicit representation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 252–262, 2023. 3, 8

work page 2023

[40] [40]

From coarse to fine: Robust hierarchical localization at large scale

Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12716–12725, 2019. 2

work page 2019

[41] [41]

Superglue: Learning feature matching with graph neural networks

Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020

work page 2020

[42] [42]

Back to the feature: Learning robust camera localization from pixels to pose

Paul-Edouard Sarlin, Ajaykumar Unagar, Mans Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, et al. Back to the feature: Learning robust camera localization from pixels to pose. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 3247–3257, 2021

work page 2021

[43] [43]

Fast image- based localization using direct 2d-to-3d matching

Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Fast image- based localization using direct 2d-to-3d matching. In2011 International Conference on Computer Vision, pages 667–

work page

[44] [44]

Efficient & effective prioritized matching for large-scale image-based localization.IEEE transactions on pattern analysis and ma- chine intelligence, 39(9):1744–1756, 2016

Torsten Sattler, Bastian Leibe, and Leif Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization.IEEE transactions on pattern analysis and ma- chine intelligence, 39(9):1744–1756, 2016. 2

work page 2016

[45] [45]

Understanding the limitations of cnn-based absolute camera pose regression

Torsten Sattler, Qunjie Zhou, Marc Pollefeys, and Laura Leal-Taixe. Understanding the limitations of cnn-based absolute camera pose regression. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3302–3312, 2019. 1

work page 2019

[46] [46]

Camera pose auto-encoders for improving pose regression

Yoli Shavit and Yosi Keller. Camera pose auto-encoders for improving pose regression. InEuropean Conference on Computer Vision, pages 140–157. Springer, 2022. 2, 8

work page 2022

[47] [47]

Learning multi- scene absolute pose regression with transformers

Yoli Shavit, Ron Ferens, and Yosi Keller. Learning multi- scene absolute pose regression with transformers. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 2733–2742, 2021. 1, 2, 8

work page 2021

[48] [48]

Scene co- ordinate regression forests for camera relocalization in rgb-d images

Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene co- ordinate regression forests for camera relocalization in rgb-d images. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2930–2937, 2013. 1, 6, 7, 8

work page 2013

[49] [49]

Inloc: Indoor visual localization with dense matching and view synthesis

Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Ak- ihiko Torii. Inloc: Indoor visual localization with dense matching and view synthesis. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 7199–7209, 2018. 2

work page 2018

[50] [50]

Learning camera localization via dense scene matching

Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, and Ping Tan. Learning camera localization via dense scene matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1831– 1841, 2021. 2

work page 2021

[51] [51]

The unreasonable effectiveness of pre- trained features for camera pose refinement

Gabriele Trivigno, Carlo Masone, Barbara Caputo, and Torsten Sattler. The unreasonable effectiveness of pre- trained features for camera pose refinement. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 12786–12798, 2024. 3, 8

work page 2024

[52] [52]

Adversarial discriminative domain adaptation

Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 7167–7176, 2017. 5

work page 2017

[53] [53]

Glace: Global local accelerated coordinate encoding

Fangjinhua Wang, Xudong Jiang, Silvano Galliani, Christoph V ogel, and Marc Pollefeys. Glace: Global local accelerated coordinate encoding. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21562–21571, 2024. 2, 8

work page 2024

[54] [54]

Learning to localize in new environments from syn- thetic training data

Dominik Winkelbauer, Maximilian Denninger, and Rudolph Triebel. Learning to localize in new environments from syn- thetic training data. In2021 IEEE International Confer- ence on Robotics and Automation (ICRA), pages 5840–5846. IEEE, 2021. 2, 8

work page 2021

[55] [55]

gsplat: An open-source library for gaussian splatting.Journal of Machine Learning Research, 26(34):1–17, 2025

Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, et al. gsplat: An open-source library for gaussian splatting.Journal of Machine Learning Research, 26(34):1–17, 2025. 6

work page 2025

[56] [56]

inerf: Inverting neural radiance fields for pose estimation

Lin Yen-Chen, Pete Florence, Jonathan T Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. inerf: Inverting neural radiance fields for pose estimation. In2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021. 2

work page 2021

[57] [57]

Camera pose voting for large-scale image-based localization

Bernhard Zeisl, Torsten Sattler, and Marc Pollefeys. Camera pose voting for large-scale image-based localization. InPro- ceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015. 2

work page 2015

[58] [58]

Semantic-guided camera ray regression for visual localization

Yesheng Zhang and Xu Zhao. Semantic-guided camera ray regression for visual localization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 25639–25648, 2025. 3

work page 2025

[59] [59]

Pnerfloc: Visual localization with point- based neural radiance fields

Boming Zhao, Luwei Yang, Mao Mao, Hujun Bao, and Zhaopeng Cui. Pnerfloc: Visual localization with point- based neural radiance fields. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7450–7459,

work page

[60] [60]

The nerfect match: Exploring nerf features for visual localization

Qunjie Zhou, Maxim Maximov, Or Litany, and Laura Leal- Taix´e. The nerfect match: Exploring nerf features for visual localization. InEuropean Conference on Computer Vision, pages 108–127. Springer, 2024. 3, 8

work page 2024