On the Role of Geometry in Geo-Localization

Ariel Shamir; Moti Kadosh; Yael Moses

arXiv: 1906.10855 · v1 · pith:ONQVMKHG · submitted 2019-06-26 · 💻 cs.CV

Pith reviewed 2026-05-25 16:12 UTC · model grok-4.3

keywords: geo-localization · camera pose estimation · convolutional neural networks · lean images · geometric learning · 3D scene · pose recovery

The pith

A convolutional neural network can recover camera pose from lean images that contain only geometric cues such as edges and relative depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether CNNs perform geo-localization by learning scene geometry or by other means. It does this by training and testing on lean images projected from a minimal 3D city model that strips away texture and fine details. The network succeeds at estimating pose, and the authors conclude the success reflects geometric learning of the area rather than memorization. A sympathetic reader would care because the result isolates how much of the network's power comes from understanding 3D structure alone.

Core claim

The network is capable of estimating the camera pose from the lean images, and it does so not by memorization but by some measure of geometric learning of the geographical area. The main contributions are providing insight into the role of geometry in the CNN learning process and demonstrating the power of CNNs for recovering camera pose using lean images.

What carries the argument

Lean images: projections from a simple 3D city model that contain solely geometric information (edges, faces, or relative depth). They isolate geometry so the network must rely on it for pose estimation.
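
As a rough illustration of what a depth-only lean image might look like, the sketch below ray-casts a few axis-aligned boxes (standing in for buildings) into a relative-depth image and derives a crude edge map from depth jumps. The box model, the function names `render_relative_depth` and `depth_edges`, the camera convention, and every parameter default are assumptions made for this example; the paper's own rendering pipeline is not described on this page.

```python
import numpy as np


def render_relative_depth(boxes, cam_pos, yaw, width=64, height=48,
                          fov_deg=90.0, max_range=100.0):
    """Ray-cast axis-aligned boxes into a depth image scaled to [0, 1].

    boxes: rows of (xmin, ymin, zmin, xmax, ymax, zmax); hypothetical buildings.
    cam_pos: camera (x, y, z), assumed to lie outside every box.
    yaw: heading in radians about the z (up) axis.
    Pixels whose ray hits nothing, or hits beyond max_range, get 1.0.
    """
    boxes = np.asarray(boxes, dtype=float)
    cam_pos = np.asarray(cam_pos, dtype=float)

    # Pinhole camera looking along +x in its own frame (y is left, z is up).
    focal = 0.5 * width / np.tan(np.radians(fov_deg) / 2.0)
    u, v = np.meshgrid(np.arange(width) - (width - 1) / 2.0,
                       (height - 1) / 2.0 - np.arange(height))
    dirs = np.stack([np.full_like(u, focal), -u, v], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate every ray by the camera heading.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    dirs = dirs @ rot.T

    # Guard against division by zero for rays parallel to a box face.
    inv = 1.0 / np.where(np.abs(dirs) < 1e-12, 1e-12, dirs)

    depth = np.full((height, width), np.inf)
    for box in boxes:
        # Slab test: entry and exit distance along each ray.
        t1 = (box[:3] - cam_pos) * inv
        t2 = (box[3:] - cam_pos) * inv
        t_near = np.minimum(t1, t2).max(axis=-1)
        t_far = np.maximum(t1, t2).min(axis=-1)
        hit = (t_far >= t_near) & (t_near > 0.0)
        depth = np.where(hit & (t_near < depth), t_near, depth)

    return np.minimum(depth, max_range) / max_range


def depth_edges(rel_depth, threshold=0.05):
    """Mark pixels where relative depth jumps; a crude stand-in for an edge image."""
    grad_rows, grad_cols = np.gradient(rel_depth)
    return np.hypot(grad_rows, grad_cols) > threshold
```

Nothing in such an image depends on texture or colour: only the position and shape of the boxes relative to the camera determine the pixel values, which is the property the lean-image idea relies on.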

If this is right

CNNs can estimate camera pose using only geometric information without texture.
The network learns the geometry of the geographical area rather than memorizing specific images.
This approach supplies a way to measure the contribution of geometry inside CNN-based localization.
Pose recovery remains possible when input is restricted to edges, faces, and relative depth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lean-image protocol could be used to test geometric learning in other vision tasks such as object recognition.
Results suggest that depth sensors or edge detectors alone might support localization in structured environments if the network is trained accordingly.
Extending the method to real-world depth maps instead of synthetic lean images would test whether the geometric learning transfers outside the simple 3D model.

Load-bearing premise

The lean-image construction and experimental protocol successfully isolate pure geometric cues so that observed performance reflects geometric learning rather than unintended patterns or memorization.

What would settle it

Train on lean images of one set of viewpoints and test on lean images from the same model but with geometry altered (for example by swapping building heights while keeping edge patterns similar) and check whether accuracy drops sharply.
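
One way such a check could be scored is sketched below, under stated assumptions: poses are rows of `(x, y, yaw_deg)`, `predict` is any callable mapping a batch of images to predicted poses, and a prediction counts as a success when both position and heading fall within tolerances. The function names, the pose layout, and the tolerance values are hypothetical and are not taken from the paper.

```python
import numpy as np


def pose_errors(pred, true):
    """Planar position error and heading error (degrees, wrapped to [0, 180])."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    pos_err = np.linalg.norm(pred[:, :2] - true[:, :2], axis=1)
    diff = np.abs(pred[:, 2] - true[:, 2]) % 360.0
    ang_err = np.minimum(diff, 360.0 - diff)
    return pos_err, ang_err


def success_rate(pred, true, pos_tol, ang_tol):
    """Fraction of predictions within both tolerances (assumed criterion)."""
    pos_err, ang_err = pose_errors(pred, true)
    return float(np.mean((pos_err <= pos_tol) & (ang_err <= ang_tol)))


def geometry_swap_check(predict, images, altered_images, poses,
                        pos_tol=10.0, ang_tol=15.0):
    """Compare success on original renderings vs. geometry-altered renderings.

    Both image sets are assumed to be rendered from the same camera poses.
    Returns (baseline rate, altered rate, drop).
    """
    base = success_rate(predict(images), poses, pos_tol, ang_tol)
    altered = success_rate(predict(altered_images), poses, pos_tol, ang_tol)
    return base, altered, base - altered
```

A large drop would suggest the network depends on the 3D layout itself; a small drop would suggest it keys on 2D edge patterns that survive the alteration.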

Figures

Figures reproduced from arXiv: 1906.10855 by Ariel Shamir, Moti Kadosh, Yael Moses.

**Figure 1.** Top: lean images contain mostly geometric features: edges (left), faces (center), and depth information (right). We train a CNN to solve the localization problem using such images alone. Bottom: a top view of a city area (buildings are marked as white) where color indicates the localization success rate of the network from red (high) to blue (low). For instance, note how open spaces are more distinct than…

**Figure 2.** Bird’s-eye view of one of the areas we used.

**Figure 3.** Example of sampling positions on an area of the…

**Figure 4.** Illustration in 2D of the evaluation measures for…

**Figure 5.** Transfer learning: learning from scratch vs. start…

Original abstract

Humans can build a mental map of a geographical area to find their way and recognize places. The basic task we consider is geo-localization - finding the pose (position & orientation) of a camera in a large 3D scene from a single image. We aim to experimentally explore the role of geometry in geo-localization in a convolutional neural network (CNN) solution. We do so by ignoring the often available texture of the scene. We therefore deliberately avoid using texture or rich geometric details and use images projected from a simple 3D model of a city, which we term lean images. Lean images contain solely information that relates to the geometry of the area viewed (edges, faces, or relative depth). We find that the network is capable of estimating the camera pose from the lean images, and it does so not by memorization but by some measure of geometric learning of the geographical area. The main contributions of this paper are: (i) providing insight into the role of geometry in the CNN learning process; and (ii) demonstrating the power of CNNs for recovering camera pose using lean images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

CNNs recover pose from edge-and-depth-only lean images, but the abstract supplies no numbers or controls to back the geometric-learning claim over memorization.

The paper tests whether a CNN can estimate camera pose using only very stripped-down images that contain edges, faces, or relative depth from a simple 3D city model. The central result is that the network succeeds on these lean images and appears to do so by learning something about the geometry of the area rather than memorizing views. That framing is the main new element: deliberately removing texture and rich detail to isolate geometry as the cue.

The setup is straightforward and directly targets the question of what the network actually uses. If the experiments hold, the finding could affect how people think about training data for visual localization tasks. The authors also claim the behavior is not memorization, which is the part that would matter most for the insight. The approach itself is clean enough that it earns credit for asking a focused question that prior geo-localization papers have not framed exactly this way.

The soft spot is the complete lack of reported numbers. The abstract states success and non-memorization but gives no error rates, no dataset sizes, no baseline comparisons, and no description of the controls used to rule out other patterns. Without those details the claim stays at the level of an assertion. The experimental protocol for building the lean images also needs to be checked to confirm it truly removes everything except geometry.

This work is aimed at computer vision researchers who study pose estimation or try to understand the features CNNs rely on. A reader already working on visual navigation or interpretability questions would get the most out of the experimental design. It is worth sending to peer review because the question is well-posed and the method is simple to evaluate; a referee can require the missing metrics and controls without changing the core idea. The paper is not ready as is, but the underlying experiment deserves that step.

Referee Report

2 major / 2 minor

Summary. The manuscript explores the role of geometry in geo-localization using convolutional neural networks by training on 'lean images' generated from a simple 3D city model, which contain only geometric features such as edges, faces, or relative depth without texture. The central claim is that the CNN can estimate camera pose from these images via geometric learning of the geographical area rather than memorization.

Significance. Should the experimental results be substantiated with quantitative evidence and controls, the findings would contribute to understanding the extent to which CNNs can rely on pure geometric cues for pose estimation tasks, potentially informing the design of more robust localization systems.

major comments (2)

[Abstract] Abstract: the claim of successful pose estimation and non-memorization is asserted without any quantitative metrics, error bars, dataset sizes, or explicit controls for memorization, leaving the central empirical claim only partially supported by the provided information.
[Method / Experiments] The experimental protocol for constructing lean images and ruling out memorization (e.g., via held-out test views or ablation on geometric components) is not described in sufficient detail to confirm that observed performance isolates geometric learning rather than unintended patterns.

minor comments (2)

Provide the specific CNN architecture, loss function, and training hyperparameters used for the pose estimation task.
Clarify how the three variants of lean images (edges, faces, relative depth) were generated and whether results are reported separately for each.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of results and methods.

Referee: [Abstract] Abstract: the claim of successful pose estimation and non-memorization is asserted without any quantitative metrics, error bars, dataset sizes, or explicit controls for memorization, leaving the central empirical claim only partially supported by the provided information.

Authors: We agree that the abstract, as a concise summary, does not include the quantitative details present in the experiments section. In the revision we will update the abstract to report key metrics on pose estimation accuracy, dataset sizes, and the use of held-out views that support the claim of geometric learning over memorization.

Revision: yes

Referee: [Method / Experiments] The experimental protocol for constructing lean images and ruling out memorization (e.g., via held-out test views or ablation on geometric components) is not described in sufficient detail to confirm that observed performance isolates geometric learning rather than unintended patterns.

Authors: We will expand the methods and experiments sections to include a more precise description of lean-image generation from the 3D model (specifying retained geometric elements such as edges and depth) and the exact train/test protocol using held-out views. This will clarify how the setup isolates geometric cues.

Revision: yes
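
A held-out protocol of the kind discussed in this exchange could, for instance, hold out whole spatial cells rather than individual views, so that no test position has a near-duplicate in the training set. The sketch below is only one possible reading: the grid-cell scheme, the function name `spatial_holdout_split`, and its defaults are assumptions for illustration, not the authors' stated procedure.

```python
import numpy as np


def spatial_holdout_split(positions, cell_size=20.0, test_fraction=0.2, seed=0):
    """Split camera positions into train/test indices by whole grid cells.

    positions: rows whose first two columns are planar (x, y) coordinates.
    Every position in a held-out cell goes to the test set.
    """
    positions = np.asarray(positions, dtype=float)
    cells = np.floor(positions[:, :2] / cell_size).astype(int)

    # Map each position to the index of its unique grid cell.
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Pick a random subset of cells to hold out.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(uniq))))
    test_cells = rng.choice(len(uniq), size=n_test, replace=False)

    test_mask = np.isin(inverse, test_cells)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Comparing accuracy under this split with accuracy under a purely random split would give one indication of how much performance depends on having seen nearby viewpoints.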

Circularity Check

0 steps flagged

No significant circularity identified

This is an empirical experimental study reporting CNN performance on lean images (projections from a simple 3D city model containing only edges/faces/relative depth) for camera pose estimation. No mathematical derivation chain, equations, fitted parameters, or self-referential definitions exist in the claims. The central claim rests on experimental results indicating geometric learning rather than memorization, with no load-bearing steps that reduce to inputs by construction, self-citation, or ansatz smuggling. This matches the default expectation for non-circular empirical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard assumptions from deep learning for computer vision and the premise that lean images isolate geometry. No free parameters, invented entities, or non-standard axioms are mentioned.

axioms (1)

Domain assumption: Convolutional neural networks can be trained to regress camera pose from image-derived geometric features.
Invoked implicitly by the choice to train a CNN on lean images for pose estimation.
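
To make the regression assumption concrete, here is a minimal sketch of a pose-regression objective in the style of PoseNet (Kendall, Grimes, and Cipolla, 2015): a Euclidean position term plus a weighted orientation term on unit quaternions. Whether the paper uses this loss is not stated on this page; the function name and the default weight `beta` are illustrative assumptions.

```python
import numpy as np


def pose_regression_loss(pred_pos, pred_quat, true_pos, true_quat, beta=500.0):
    """PoseNet-style loss averaged over a batch (illustrative, not the paper's).

    pred_pos, true_pos: arrays of shape (n, 3).
    pred_quat, true_quat: arrays of shape (n, 4); true_quat is assumed unit-norm.
    beta: assumed weight balancing orientation against position.
    """
    pred_pos = np.asarray(pred_pos, dtype=float)
    true_pos = np.asarray(true_pos, dtype=float)
    pred_quat = np.asarray(pred_quat, dtype=float)
    true_quat = np.asarray(true_quat, dtype=float)

    # Normalize the predicted quaternion before comparing orientations.
    pred_quat = pred_quat / np.linalg.norm(pred_quat, axis=-1, keepdims=True)

    pos_term = np.linalg.norm(pred_pos - true_pos, axis=-1)
    rot_term = np.linalg.norm(pred_quat - true_quat, axis=-1)
    return float(np.mean(pos_term + beta * rot_term))
```

This simple form treats `q` and `-q` as different even though they encode the same rotation, so a practical version would need to fix the quaternion sign convention in the training labels.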
