arxiv: 2606.22094 · v1 · pith:ARDJRAR4new · submitted 2026-06-20 · 💻 cs.CV

Cross-View Yaw Estimation in Location Uncertainty with Line-Aligning Yaw Scoring

Pith reviewed 2026-06-26 12:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords yaw estimation · cross-view localization · bird's eye view · ground view · line consensus voting · radial invariance · pose estimation · visual localization

The pith

A radially invariant line-consensus voting method estimates yaw to sub-degree precision without requiring accurate location.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LAYS to address yaw estimation as a bottleneck in cross-view localization between ground images and bird's eye view (BEV) maps. It separates yaw from translation by noting that a ground column matched to BEV pixels implies the same yaw for any camera position along the radial line through those pixels. Votes from feature-based matches are accumulated in 3D bins over candidate poses, forming a sharp peak at the correct yaw. Experiments on Mapillary, Ford, KITTI, and VIGOR show gains under unknown yaw, reaching +28 to 45 percentage points for normal field of view, and the estimated yaw serves as a prior that improves full 3-DoF localization.

Core claim

LAYS is a radially invariant line-consensus voting method. It matches BEV pixels to ground-image columns using feature similarity. Each such match induces a yaw value that remains constant across all camera positions along the radial direction of the pixels. These induced yaws are accumulated into discrete 3D bins; correct correspondences concentrate into a sharp peak that identifies the true yaw to sub-degree precision, removing any dependence on accurate location.
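The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and argument names (`vote_yaw`, `cand_xy`, `bev_xy`, `rel_yaw_deg`), the 1-degree bin width, and the counterclockwise `atan2` angle convention are all assumptions made here. The only rule taken from the paper is that the induced yaw is the absolute bearing minus the column's relative yaw.

```python
import numpy as np

def vote_yaw(cand_xy, bev_xy, rel_yaw_deg, scores, n_bins=360):
    """Accumulate yaw votes into a (num_candidates, n_bins) array.

    cand_xy:     (P, 2) candidate camera positions in BEV coordinates
    bev_xy:      (M, 2) positions of matched BEV pixels
    rel_yaw_deg: (M,) relative yaw of the ground column matched to each pixel
    scores:      (M,) match scores used as vote weights
    """
    n_cand, n_match = cand_xy.shape[0], bev_xy.shape[0]
    # Absolute bearing from every candidate position to every matched pixel.
    d = bev_xy[None, :, :] - cand_xy[:, None, :]              # (P, M, 2)
    abs_deg = np.degrees(np.arctan2(d[..., 1], d[..., 0]))    # (P, M)
    # Induced yaw = absolute bearing minus the column's relative yaw.
    yaw = np.mod(abs_deg - rel_yaw_deg[None, :], 360.0)
    bins = np.floor(yaw * n_bins / 360.0).astype(int) % n_bins
    acc = np.zeros((n_cand, n_bins))
    rows = np.repeat(np.arange(n_cand), n_match)
    # Unbuffered add so that repeated (row, bin) pairs all contribute.
    np.add.at(acc, (rows, bins.ravel()), np.tile(scores, n_cand))
    return acc

# Synthetic check: camera at the origin with yaw 30.5 degrees (hypothetical values).
true_yaw = 30.5
bev = np.array([[10.0, 0.0], [0.0, 8.0], [-6.0, 3.0], [4.0, -7.0]])
rel = np.degrees(np.arctan2(bev[:, 1], bev[:, 0])) - true_yaw
# Candidates 1 and 2 lie on the radial line from the origin to the pixel (10, 0).
cands = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 0.0]])
acc = vote_yaw(cands, bev, rel, np.ones(len(bev)))
print(acc[:, 30])
```

In this toy setup the pixel at (10, 0) keeps voting for the same yaw bin from every candidate on its radial line, while the other pixels scatter once the candidate leaves the true position. Stacking the candidate positions on a 2D grid gives the 3D (x, y, yaw) accumulator.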

What carries the argument

LAYS, the 3D voting scheme that accumulates yaw values induced by feature matches between ground columns and BEV pixels, using their radial invariance to concentrate correct votes into a peak.

If this is right

  • Sub-degree yaw precision is reached by 3D voting over all candidate poses.
  • Gains of 28 to 45 percentage points are reported for normal field of view with unknown yaw.
  • Using the estimated yaw as a prior measurably improves downstream 3-DoF localization.
  • The method is evaluated on Mapillary, Ford, KITTI, and VIGOR without requiring an accurate location.
  • Yaw can be recovered independently of translation estimates.
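As a hedged illustration of the first bullet, reading a pose off a 3D accumulator reduces to an argmax over (row, column, yaw bin). The helper name `peak_pose`, the array layout, and the 0.5-degree bin width below are assumptions for this sketch; the paper's actual bin size is not given in this summary. Sub-degree resolution requires a yaw bin narrower than one degree.

```python
import numpy as np

def peak_pose(acc, yaw_bin_deg):
    """Return (row, col, yaw_deg) of the strongest bin.

    acc:         (H, W, K) vote totals over candidate rows, columns, yaw bins
    yaw_bin_deg: width of one yaw bin in degrees (K * yaw_bin_deg == 360)
    """
    iy, ix, ik = np.unravel_index(int(np.argmax(acc)), acc.shape)
    yaw = (ik + 0.5) * yaw_bin_deg   # report the bin center
    return int(iy), int(ix), float(yaw)

# Hypothetical accumulator: 4 x 5 candidate grid, 720 yaw bins of 0.5 degrees.
acc = np.zeros((4, 5, 720))
acc[2, 3, 61] = 7.0
print(peak_pose(acc, 0.5))
```

The yaw component alone can be passed on as a prior to a separate 3-DoF localizer, which is the use the third bullet refers to.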

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The radial-invariance voting could be paired with separate translation solvers to produce initial 3-DoF pose estimates from scratch.
  • Similar line-consensus ideas might apply to other single-degree-of-freedom ambiguities in visual localization.
  • Performance under seasonal or lighting changes would test whether feature similarity remains sufficient.
  • The discrete 3D binning could be extended to joint yaw-and-pitch estimation if radial invariance holds in additional axes.

Load-bearing premise

A ground-image column matched to BEV pixels produces the same yaw for every camera position along the radial line of those pixels, and feature similarity can reliably locate such matches under location uncertainty.
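The geometric half of this premise can be written out in a few lines. The notation is introduced here for illustration and is not the paper's: $p$ is the matched BEV pixel, $u$ a unit direction in the BEV plane, $c(s)$ a candidate camera position at distance $s$ from $p$ along $-u$, $\alpha$ the absolute bearing from camera to pixel, $\varphi$ the relative yaw of the matched ground column, and $\psi$ the induced yaw. A planar BEV is assumed.

```latex
\[
\begin{aligned}
c(s) &= p - s\,u, \qquad s > 0,\ \lVert u \rVert = 1, \\
\alpha\bigl(c(s)\bigr)
  &= \operatorname{atan2}\bigl(p_y - c_y(s),\; p_x - c_x(s)\bigr)
   = \operatorname{atan2}(s\,u_y,\; s\,u_x)
   = \operatorname{atan2}(u_y,\; u_x), \\
\psi\bigl(c(s)\bigr)
  &= \alpha\bigl(c(s)\bigr) - \varphi
   = \operatorname{atan2}(u_y,\; u_x) - \varphi .
\end{aligned}
\]
```

Since $s$ cancels for every $s > 0$, the induced yaw is the same at every camera position on the ray. The second half of the premise, that feature similarity finds such matches reliably, is empirical and is not covered by this identity.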

What would settle it

An experiment on ground-BEV image pairs in which radial invariance is violated or feature matches produce no concentrated peak at the ground-truth yaw would falsify the claim of reliable sub-degree estimation.
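One way such a test could be scored is sketched below, under assumptions made here: the helper names, the 1-degree tolerance, and the idea of summarising a yaw histogram by its peak error and by the share of votes near the ground truth are illustrative choices, not the paper's evaluation protocol.

```python
import numpy as np

def circ_diff_deg(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (np.asarray(a, dtype=float) - b + 180.0) % 360.0 - 180.0

def peak_report(hist, gt_yaw_deg, tol_deg=1.0):
    """Summarise how concentrated a yaw histogram is around the ground truth.

    hist: (n_bins,) non-negative vote totals covering [0, 360) degrees.
    """
    hist = np.asarray(hist, dtype=float)
    n_bins = hist.shape[0]
    centers = (np.arange(n_bins) + 0.5) * 360.0 / n_bins
    # Circular distance of every bin center to the ground-truth yaw.
    err = np.abs(circ_diff_deg(centers, gt_yaw_deg))
    total = hist.sum()
    mass = hist[err <= tol_deg].sum() / total if total > 0 else 0.0
    peak_yaw = centers[int(hist.argmax())]
    return {
        "peak_yaw_deg": float(peak_yaw),
        "peak_error_deg": float(abs(circ_diff_deg(peak_yaw, gt_yaw_deg))),
        "mass_within_tol": float(mass),
    }

# Hypothetical histogram with a dominant peak and a weaker distractor.
hist = np.zeros(360)
hist[30] = 8.0
hist[200] = 2.0
print(peak_report(hist, gt_yaw_deg=30.5))
```

A large peak error, or a vote share near the ground truth that is no higher than elsewhere, would be the kind of outcome the paragraph above describes as falsifying.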

Figures

Figures reproduced from arXiv: 2606.22094 by Nairan Zhang, Taeho Kang, Yelin Kim, Youngki Lee, Yujiao Shi.

Figure 1: (a) Existing methods rely on pixel-to-pixel correspondences. Their BEV projections shift based on the assumed camera pose, entangling location and yaw, and require a distance estimate d(h) dependent on ground height h. (b) LAYS aggregates ground pixels vertically into a column and aligns it to a BEV radial direction. This height-free, column-to-line correspondence makes estimated yaw invariant for any can…
Figure 2: Yaw Voting Mechanism. (a) A ground column defines a specific relative yaw. (b) For any candidate pose, the vector toward its matched BEV pixel defines an absolute angle. The resulting true yaw is calculated by subtracting the relative yaw from this absolute angle. (c) A matched BEV pixel casts yaw votes for candidate poses. Poses geometrically aligned on the correct radial direction consistently vote for t…
Figure 3: Overview of the LAYS framework. (a) U-Net encoders process ground and BEV images. Ground pixel features are aggregated column-wise to produce column features Fgrd. (b) Each BEV pixel is matched to the most similar ground column, producing a match score Ms and relative yaw Mr. (c) For each pair of 2D pose and BEV pixel, match scores are voted into yaw bins based on the geometric relationship between absolut…
Figure 4: Visualization of LAYS in the MGL dataset. PCA visualizes feature space. (a) is the ground view. (b) are the ground pixel, column features, and confidence: Fpix, Fgrd, and Cgrd, with a colormap for confidence. (c) is the BEV image with the estimation result. (d) are the BEV and the projected ground feature for the estimate pose (for visualization, not used by our method). (e) are match scores Ms within an R…
Figure 5: Visualization of our yaw estimation in the MGL dataset. Our method gives high confidence and a match score for the column-aligned road on the left and the tree in the middle. Panels: (a) Ground View, (b) Ground Feature, (c) BEV & Estimate, (d) BEV & Projected Feature, (e) Match Scores. Red: GT; yellow: estimate.
Figure 6: Visualization of our yaw estimation in the MGL dataset. The estimated pose has the most votes from a tree. Panels: (a) Ground View, (b) Ground Feature, (c) BEV & Estimate, (d) BEV & Projected Feature, (e) Match Scores. Red: GT; yellow: estimate.
Figure 7: Visualization of our yaw estimation in the MGL dataset. Our method has high confidence and a match score on the features extracted from the building at the front of the road.
Figure 8: Visualization of our yaw estimation in the Ford dataset. Our method aligns the road components. Panels: (a) Ground View, (b) Ground Feature, (c) BEV & Estimate, (d) BEV & Projected Feature, (e) Match Scores. Red: GT; yellow: estimate.
Figure 9: Visualization of our yaw estimation in the KITTI dataset, same-area (Test 1). Despite the wide-baseline perspective change between the ground view and BEV, our method identifies road and roadside structure correspondences to estimate yaw. Panels: (a) Ground View, (b) Ground Feature, (c) BEV & Estimate, (d) BEV & Projected Feature, (e) Match Scores. Red: GT; yellow: estimate.
Figure 10: Visualization of our yaw estimation in the KITTI dataset, cross-area (Test 2). Our method generalizes to unseen geographic areas. The same-area data is mostly urban; our method generalizes to curved highways in the cross-area data, finding radial correspondences along road boundaries.
Figure 11: Visualization of our yaw estimation in the VIGOR dataset, same-area setup. Our method aligns a specific building and road. Panels: (a) Ground View, (b) Ground Feature, (c) BEV & Estimate, (d) BEV & Projected Feature, (e) Match Scores. Red: GT; yellow: estimate.
Figure 12: Visualization of our yaw estimation in the VIGOR dataset, cross-area setup. Our method aligns the road and a tree.
… an absolute similarity score, feature vectors are normalized by flipping the sign when they have a negative similarity with the mean of all feature vectors. Experiments in the MGL [21] dataset revealed that our method not only focuses on the road, but also on prominent roadside features
Figure 13: Failure case examples in the MGL dataset. Severe occlusion of BEV: the road was completely covered by trees when the BEV image was captured. Severe occlusion makes it difficult to match ground view and BEV, especially under a day-to-night change between the ground view and the BEV. For the estimated pose, votes mainly come from a small tree in the BEV, and high match scores (red color) are generally on small trees, which alig…
Figure 14: Failure case examples in the KITTI dataset. Symmetric scene: a crossroad produces ambiguous votes between two visually similar directions. Temporal effects, such as shadows on the road, reduced confidence (viridis colormap; yellow indicates high confidence, green indicates lower) on the correct road, so that different but similar-looking road positions were selected.
Symmetric Scenes: At locations with rotationa…
Figure 15: Matching visualization for one VIGOR cross-area sample. Ground view (top): the four columns that contribute most to the predicted yaw, with bar length proportional to the per-pixel saliency |∇ · I| of the matching score w.r.t. the input pixels. BEV (left): thin colored lines mark each column’s view direction from the predicted pose; circles on every pixel show the score each column votes into the predict…
Original abstract

Accurate yaw estimation is a bottleneck in cross-view localization between ground view and Bird's Eye View (BEV). Existing methods couple yaw with translation and rely on height or projection assumptions that degrade under large yaw ambiguity. We disentangle yaw from location accuracy and introduce LAYS, a radially invariant line-consensus voting method. By exploiting the radial invariance of our formulation, we achieve sub-degree yaw precision via 3D voting over all candidate poses, while eliminating the need for accurate location. Our key observation is that a ground-image column matched to BEV pixels induces the same yaw across all camera positions along the radial direction of the pixels. LAYS matches BEV pixels to ground columns using feature similarity and accumulates the induced yaw votes into discrete 3D bins, where correct correspondences along the radial line concentrate into a sharp peak for the correct yaw. Experiments on Mapillary, Ford, KITTI, and VIGOR show significant gains under unknown yaw, particularly for normal FoV with unknown yaw (+28 to 45 percentage points), and using LAYS as a yaw prior improves downstream 3-DoF localization.
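The matching step named in the abstract (each BEV pixel matched to its most similar ground column, yielding a match score and a relative yaw) might look roughly like the sketch below. Several pieces are assumptions made for illustration and are not confirmed by this summary: cosine similarity as the similarity measure, an ideal pinhole camera for converting a column index to a relative yaw, the sign convention of that yaw, and all function names.

```python
import numpy as np

def column_rel_yaw_deg(width, hfov_deg):
    """Relative yaw of each image column, assuming an ideal pinhole camera."""
    f = (width / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)  # focal length in pixels
    u = np.arange(width) + 0.5 - width / 2.0                # offset from image center
    return np.degrees(np.arctan2(u, f))

def match_bev_to_columns(f_bev, f_grd, rel_yaw_deg, eps=1e-12):
    """Match every BEV pixel to its most similar ground column.

    f_bev: (N, C) BEV pixel features; f_grd: (W, C) ground column features.
    Returns match scores (N,), relative yaws (N,), and column indices (N,).
    """
    a = f_bev / (np.linalg.norm(f_bev, axis=1, keepdims=True) + eps)
    b = f_grd / (np.linalg.norm(f_grd, axis=1, keepdims=True) + eps)
    sim = a @ b.T                                  # (N, W) cosine similarities
    best = sim.argmax(axis=1)                      # best column per BEV pixel
    m_s = sim[np.arange(sim.shape[0]), best]       # match score
    m_r = rel_yaw_deg[best]                        # relative yaw of that column
    return m_s, m_r, best

# Toy features: four orthogonal column descriptors, two BEV pixels copying two of them.
f_grd = np.eye(4)
f_bev = np.array([[0.0, 0.0, 3.0, 0.0], [2.0, 0.0, 0.0, 0.0]])
rel = column_rel_yaw_deg(width=4, hfov_deg=90.0)
m_s, m_r, best = match_bev_to_columns(f_bev, f_grd, rel)
print(best, m_s, m_r)
```

The scores and relative yaws returned here are the quantities a voting stage would consume as vote weights and angle offsets.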

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces LAYS, a radially invariant line-consensus voting method for estimating yaw in cross-view ground-to-BEV localization. It claims that matching ground-image columns to BEV pixels via feature similarity allows accumulation of yaw votes into 3D bins, where correct radial correspondences produce a sharp peak for the true yaw, achieving sub-degree precision without accurate location knowledge. Experiments on the Mapillary, Ford, KITTI, and VIGOR datasets report significant gains (e.g., +28 to 45 percentage points for normal FoV with unknown yaw) and improved downstream 3-DoF localization when LAYS is used as a prior.

Significance. If the radial-invariance property and reliable feature matching under location uncertainty hold, the method would address a key bottleneck by decoupling yaw from translation, offering a parameter-free voting approach that could improve robustness in cross-view tasks. The multi-dataset gains and downstream utility suggest practical value, though the lack of detailed validation of the core assumption limits current assessment of impact.

major comments (3)
  1. [Abstract] The central claim of sub-degree yaw precision via 3D voting relies on the radial invariance observation and sufficient correct correspondences under location uncertainty, yet the text provides no quantification of surviving correct radial matches, no error analysis, and no ablation studies on descriptor sensitivity or discretization effects, preventing verification of whether the accumulator forms the claimed sharp peak.
  2. [Abstract] The key geometric observation (a ground column matched to BEV pixels induces identical yaw for all positions along the radial line) is presented without formal derivation, proof of invariance under real-world conditions (e.g., non-fronto-parallel surfaces or viewpoint changes), or empirical validation on the tested datasets with explicit location uncertainty regimes.
  3. [Abstract] No implementation details, pseudocode, or parameter settings for the feature matching, 3D binning, or voting procedure are supplied, making it impossible to assess reproducibility or the exact conditions under which the reported gains materialize.
minor comments (1)
  1. [Abstract] The abstract mentions 'normal FoV' but does not define the field-of-view ranges or camera parameters used in the experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate revisions to improve clarity and verifiability of the core claims.

  1. Referee: [Abstract] The central claim of sub-degree yaw precision via 3D voting relies on the radial invariance observation and sufficient correct correspondences under location uncertainty, yet the text provides no quantification of surviving correct radial matches, no error analysis, and no ablation studies on descriptor sensitivity or discretization effects, preventing verification of whether the accumulator forms the claimed sharp peak.

    Authors: We agree that the abstract and current presentation lack explicit quantification of surviving correct radial matches, error analysis, and ablations on descriptor sensitivity or discretization. While the experimental results on four datasets demonstrate the claimed precision, additional supporting analysis is warranted. We will add a dedicated subsection with quantification of correct correspondences under controlled location uncertainty, sensitivity analysis, and discretization ablations to verify peak formation. revision: yes

  2. Referee: [Abstract] The key geometric observation (a ground column matched to BEV pixels induces identical yaw for all positions along the radial line) is presented without formal derivation, proof of invariance under real-world conditions (e.g., non-fronto-parallel surfaces or viewpoint changes), or empirical validation on the tested datasets with explicit location uncertainty regimes.

    Authors: The observation follows directly from the projective geometry of the ground-to-BEV mapping, where matched columns induce constant yaw along radial lines in the BEV plane. We will include a formal derivation in the methods section, discuss invariance assumptions (including limitations under non-fronto-parallel surfaces), and add empirical validation plots showing yaw consistency across location uncertainty regimes on the evaluated datasets. revision: yes

  3. Referee: [Abstract] No implementation details, pseudocode, or parameter settings for the feature matching, 3D binning, or voting procedure are supplied, making it impossible to assess reproducibility or the exact conditions under which the reported gains materialize.

    Authors: We acknowledge the absence of these details in the current manuscript. We will add a dedicated implementation subsection with pseudocode for the voting procedure, exact parameter values (e.g., bin sizes, matching thresholds), and feature descriptor settings to ensure full reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained geometric voting

The paper introduces LAYS as a line-consensus voting procedure grounded in an explicit geometric observation: a ground-image column matched to BEV pixels induces identical yaw for all camera positions along the radial line. This property is stated directly as the basis for matching via feature similarity and accumulating votes in 3D bins to form a peak at the correct yaw. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citations serve as load-bearing uniqueness theorems, and no ansatz is smuggled in. The method is presented as a direct application of the radial invariance without tautological redefinition or renaming of known results, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper introduces a new method without explicit free parameters or invented physical entities. It relies on standard computer vision domain assumptions for feature matching and geometric projection.

axioms (2)
  • domain assumption: Feature similarity between BEV pixels and ground-image columns can be used to identify reliable correspondences.
    The voting procedure depends on this matching step to generate yaw votes.
  • standard math: Standard projective geometry and radial line properties hold for the camera models used.
    Invoked in the key observation about radial invariance.

pith-pipeline@v0.9.1-grok · 5739 in / 1363 out tokens · 20815 ms · 2026-06-26T12:27:51.177637+00:00 · methodology

Reference graph

Works this paper leans on

43 extracted references · 9 canonical work pages

  [1] Agarwal, S., Snavely, N., Simon, I., Seitz, S.M., Szeliski, R.: Building rome in a day. In: 2009 IEEE 12th International Conference on Computer Vision. pp. 72–79 (2009). https://doi.org/10.1109/ICCV.2009.5459148

  [2] Agarwal, S., Vora, A., Pandey, G., Williams, W., Kourous, H., McBride, J.: Ford multi-av seasonal dataset (2020)

  [3] Brejcha, J., Čadík, M.: State-of-the-art in visual geo-localization. Pattern Anal. Appl. 20(3), 613–637 (Aug 2017). https://doi.org/10.1007/s10044-017-0611-1

  [4] Delattre, F., Dirnfeld, D., Nguyen, P., Scarano, S., Jones, M.J., Miraldo, P., Learned-Miller, E.: Robust frame-to-frame camera rotation estimation in crowded scenes. In: IEEE/CVF International Conference on Computer Vision (ICCV). pp. 9752–9762 (10 2023)

  [5] Du, Y., Mateo, C., Tahri, O.: A multilayer perceptron-based spherical visual compass using global features. Sensors 24(7) (2024). https://doi.org/10.3390/s24072246, https://www.mdpi.com/1424-8220/24/7/2246

  [6] Fervers, F., Bullinger, S., Bodensteiner, C., Arens, M., Stiefelhagen, R.: Uncertainty-aware vision-based metric cross-view geolocalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21621–21631 (June 2023)

  [7] Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. International Journal of Robotics Research (IJRR) (2013)

  [8] Guo, Y., Choi, M., Li, K., Boussaid, F., Bennamoun, M.: Soft exemplar highlighting for cross-view image-based geo-localization. IEEE Transactions on Image Processing 31, 2094–2105 (2022). https://doi.org/10.1109/TIP.2022.3152046

  [9] Hays, J., Efros, A.A.: im2gps: estimating geographic information from a single image. In: Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2008)

  [10] Jin, L., Zhang, J., Hold-Geoffroy, Y., Wang, O., Matzen, K., Sticha, M., Fouhey, D.F.: Perspective fields for single image camera calibration. In: CVPR (2023)

  [11] Lentsch, T., Xia, Z., Caesar, H., Kooij, J.F.P.: Slicematch: Geometry-guided aggregation for cross-view pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17225–17234 (June 2023)

  [12] Li, Y., Snavely, N., Huttenlocher, D., Fua, P.: Worldwide pose estimation using 3D point clouds. In: European Conf. on Computer Vision (2012)

  [13] Liu, L., Li, H.: Lending orientation to neural networks for cross-view geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5617–5626 (2019)

  [14] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR). pp. 730–734 (2015). https://doi.org/10.1109/ACPR.2015.7486599

  [15] Liu, Y., Tao, J., Kong, D., Zhang, Y., Li, P.: A visual compass based on point and line features for uav high-altitude orientation estimation. Remote Sensing 14(6), 1430 (2022)

  [16] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019), https://openreview.net/forum?id=Bkg6RiCqY7

  [17] Middelberg, S., Sattler, T., Untzelmann, O., Kobbelt, L.: Scalable 6-dof localization on mobile devices. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. pp. 268–283. Springer International Publishing, Cham (2014)

  [18] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019)

  [19] Rajpurohit, A., Kumar, P., Singh, D., Kumar, R.: A review on visual positioning system. SSRN Electronic Journal (01 2024). https://doi.org/10.2139/ssrn.4485458

  [20] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. CoRR abs/1505.04597 (2015), http://dblp.uni-trier.de/db/journals/corr/corr1505.html#RonnebergerFB15

  [21] Sarlin, P.E., DeTone, D., Yang, T.Y., Avetisyan, A., Straub, J., Malisiewicz, T., Bulo, S.R., Newcombe, R., Kontschieder, P., Balntas, V.: OrienterNet: Visual Localization in 2D Public Maps with Neural Matching. In: CVPR (2023)

  [22] Sarlin, P.E., Trulls, E., Pollefeys, M., Hosang, J., Lynen, S.: SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding. In: NeurIPS (2023)

  [23] Shi, Y., Li, H.: Beyond cross-view image retrieval: Highly accurate vehicle localization using satellite image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2022)

  [24] Shi, Y., Li, H., Perincherry, A., Vora, A.: Weakly-supervised camera localization by ground-to-satellite image registration. In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part IX. p. 39–57. Springer-Verlag, Berlin, Heidelberg (2024). https://doi.org/10.1007/978-3-031-72673-6_3, https://d...

  [25] Shi, Y., Liu, L., Yu, X., Li, H.: Spatial-aware feature aggregation for image based cross-view geo-localization. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 10090–10100. Curran Associates, Inc. (2019), http://papers.nips.cc/paper/9199-spatial-...

  [26] Shi, Y., Wu, F., Perincherry, A., Vora, A., Li, H.: Boosting 3-dof ground-to-satellite camera localization accuracy via geometry-guided cross-view transformer. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 21459–21469 (2023). https://doi.org/10.1109/ICCV51070.2023.01967

  [27] Shi, Y., Yu, X., Campbell, D., Li, H.: Where am i looking at? joint location and orientation estimation by cross-view matching. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4063–4071 (2020). https://doi.org/10.1109/CVPR42600.2020.00412

  [28] Shi, Y., Yu, X., Liu, L., Zhang, T., Li, H.: Optimal feature transport for cross-view image geo-localization. In: arXiv preprint arXiv:1907.05021 (2019)

  [29] Shi, Y., Yu, X., Wang, S., Li, H.: Cvlnet: Cross-view feature correspondence learning for video-based camera localization. In: Proceedings of the Asian Conference on Computer Vision (ACCV). pp. 652–669 (December 2022)

  [30] Song, Z., xianghui, z., Lu, J., Shi, Y.: Learning dense flow field for highly-accurate cross-view camera localization. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 70612–70625. Curran Associates, Inc. (2023), https:/...

  [31] Wang, S., Nguyen, C., Liu, J., Zhang, Y., Muthu, S., Maken, F.A., Zhang, K., Li, H.: View from above: Orthogonal-view aware cross-view localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 14843–14852 (June 2024)

  [32] Wang, S., Zhang, Y., Perincherry, A., Vora, A., Li, H.: View consistent purification for accurate cross-view localization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8197–8206 (2023)

  [33] Wang, X., Xu, R., Cui, Z., Wan, Z., Zhang, Y.: Fine-grained cross-view geo-localization using a correlation-aware homography estimator. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 5301–5319. Curran Associates, Inc. (2023), https://proceedings.neurips.cc/...

  [34] Weyand, T., Kostrikov, I., Philbin, J.: Planet - photo geolocation with convolutional neural networks. ArXiv abs/1602.05314 (2016), https://api.semanticscholar.org/CorpusID:171846

  [35] Wu, H., Zhang, Z., Lin, S., Mu, X., Zhao, Q., Yang, M., Qin, T.: Maplocnet: Coarse-to-fine feature registration for visual re-localization in navigation maps. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2024)

  [36] Xia, Z., Alahi, A.: Fg^2: Fine-grained cross-view localization by fine-grained feature matching. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). pp. 6362–6372 (June 2025)

  [37] Xia, Z., Booij, O., Kooij, J.F.P.: Convolutional cross-view pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(5), 3813–3831 (2024). https://doi.org/10.1109/TPAMI.2023.3346924

  [38] Xia, Z., Booij, O., Manfredi, M., Kooij, J.F.: Visual cross-view metric localization with dense uncertainty estimates. In: European Conference on Computer Vision. pp. 90–106. Springer (2022)

  [39] Xian, W., Li, Z., Fisher, M., Eisenmann, J., Shechtman, E., Snavely, N.: Uprightnet: Geometry-aware camera orientation estimation from single images. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp. 9973–9982 (2019), https://api.semanticscholar.org/CorpusID:201107189

  [40] Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network (2015), https://arxiv.org/abs/1505.00853

  [41] Yang, A., Beheshti, M., Hudson, T.E., Vedanthan, R., Riewpaiboon, W., Mongkolwat, P., Feng, C., Rizzo, J.R.: Unav: An infrastructure-independent vision-based navigation system for people with blindness and low vision. Sensors 22(22) (2022). https://doi.org/10.3390/s22228894, https://www.mdpi.com/1424-8220/22/22/8894

  [42] Zhu, S., Shah, M., Chen, C.: Transgeo: Transformer is all you need for cross-view image geo-localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1162–1171 (2022)

  [43] Zhu, S., Yang, T., Chen, C.: Vigor: Cross-view image geo-localization beyond one-to-one retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3640–3649 (2021)

Appendix

In this appendix, we provide the following:

  • Proof of Proposition 1 (Sec. A)
  • Training setup (Sec. B)
  • Pse...