pith. machine review for the scientific record. sign in

arxiv: 2603.22153 · v3 · submitted 2026-03-23 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation

Authors on Pith no claims yet

Pith reviewed 2026-05-15 00:27 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords cross-view geo-localizationUAV navigationvision-based localizationheading estimationunaligned imagerysatellite viewsaerial navigationmulti-city benchmark
0
0 comments X

The pith

Bearing-UAV enables vision-only UAV navigation by jointly predicting absolute location and heading from unaligned aerial and satellite views without storing map tiles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Bearing-UAV, a method for UAV navigation that estimates both position and orientation directly from neighboring visual features in cross-view settings. Prior approaches match UAV images to fixed satellite tiles, creating a trade-off between accuracy and storage while ignoring heading. Bearing-UAV instead uses global and local structural features plus explicit encoding of relative spatial relationships to handle misalignment, viewpoint differences, and sparse features. It reports lower localization error than matching-based methods on a new multi-city benchmark covering diverse terrains. The design removes the need for onboard map storage and supplies heading data needed for practical flight control.

Core claim

Bearing-UAV is a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, leveraging global and local structural features while explicitly encoding relative spatial relationships to remain robust under cross-view variations, misalignment, and feature-sparse conditions.

What carries the argument

Bearing-UAV, which jointly predicts location and heading by combining global-local structural features with relative spatial relationship encodings.

Load-bearing premise

That the substantial discrepancies and varying overlaps between aerial and satellite views have been adequately handled so the approach generalizes to real-world UAV flights.

What would settle it

A set of real UAV flights over feature-sparse terrain or areas with large viewpoint overlaps where the reported localization error exceeds that of tile-matching baselines would falsify the claimed advantage.

Figures

Figures reproduced from arXiv: 2603.22153 by Haofei Zhang, Haoyang Zhou, Kejia Liu, Mingli Song, Peicheng Wang, Ruoyu Xu.

Figure 1
Figure 1. Figure 1: Bearing-UAV overview. Given a UAV-view patch (UVP) [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Bearing-UAV in training mode. Given four adjacent RSTs with their relative coordinates ( [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Global-Local Unity Feature (GLUF). A VGG-16 extracts [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Bearing-Naver’s operating mode. At each step, we first [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Localization and heading performance both consistently [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Navigation performance comparison. The purple dashed [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Two examples of UAV-satellite cross-view UVP sampling [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Localization and orientation performance under different [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Effect of Weather Augmentation on Bearing-UAV. [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Localization and heading error distributions and scatter plots of 9k unseen test samples in satellite and UAV views. [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Heatmaps of mean localization and heading errors [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Visualization of U-S cross-view localization and heading errors on real satellite images from four cities. The four satellite images [PITH_FULL_IMAGE:figures/full_fig_p016_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Navigation performance comparison on seven additional routes. We further provide a comprehensive comparison between [PITH_FULL_IMAGE:figures/full_fig_p018_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Visualization of cross-view navigation frames of Bearing-UAV. We present a successful case for trajectory #1 in City D from [PITH_FULL_IMAGE:figures/full_fig_p019_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: The UAV cross-view navigation process corresponding to trajectory #2 in City A, as shown in Fig. [PITH_FULL_IMAGE:figures/full_fig_p020_17.png] view at source ↗
read the original abstract

Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporting unmanned aerial vehicle (UAV) navigation in GNSS-denied environments. However, existing work predominantly focuses on matching UAV views to onboard map tiles, which introduces an inherent trade-off between accuracy and storage overhead, and overlooks the importance of the UAV's heading during navigation. Moreover, the substantial discrepancies and varying overlaps in cross-view scenarios have been insufficiently considered, limiting their generalization to real-world scenarios. In this paper, we present Bearing-UAV, a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, enabling accurate, lightweight, and robust navigation in the wild. Our method leverages global and local structural features and explicitly encodes relative spatial relationships, making it robust to cross-view variations, misalignment, and feature-sparse conditions. We also present Bearing-UAV-90k, a multi-city benchmark for evaluating cross-view localization and navigation. Extensive experiments show encouraging results that Bearing-UAV yields lower localization error than previous matching/retrieval paradigm across diverse terrains. Our code and dataset will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Bearing-UAV, a vision-only cross-view geo-localization method for UAV navigation that jointly predicts absolute location and heading by encoding global/local structural features and relative spatial relationships. It contrasts this with prior tile-matching approaches, which incur storage-accuracy trade-offs and ignore heading. The authors release the Bearing-UAV-90k multi-city benchmark and report that the method achieves lower localization error than matching/retrieval baselines across diverse terrains, with claimed robustness to misalignment, varying overlaps, and feature sparsity.

Significance. If the reported error reductions hold under detailed scrutiny, the work offers a lightweight, map-tile-free alternative for GNSS-denied UAV navigation with potential practical impact. The new benchmark could serve as a useful community resource for evaluating cross-view robustness. The explicit modeling of relative spatial relationships is a conceptually clean contribution that directly targets the misalignment issues highlighted in the abstract.

major comments (2)
  1. [§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.
  2. [§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.
minor comments (3)
  1. [Abstract] Abstract: Replace the vague phrase 'encouraging results' with the actual mean localization error reduction (e.g., 'X% lower median error') to strengthen the summary.
  2. [§3] Notation: Define the joint loss for location and heading prediction explicitly (currently described only at high level) and ensure symbols for bearing and position are used consistently between equations and text.
  3. [Figure 3] Figure 3: The qualitative examples would benefit from overlaying predicted versus ground-truth trajectories with error vectors to visually support the quantitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and improve our manuscript. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.

    Authors: We agree that including error bars, statistical significance tests, and per-terrain variance would strengthen the reliability of our claims. In the revised manuscript, we will report standard deviations across multiple evaluation runs or sequences in Table 2. We will also perform and report statistical significance tests (such as paired t-tests) to confirm that the observed improvements are significant. Furthermore, we will add per-terrain variance analysis to demonstrate that the gains hold across diverse terrains rather than being driven by specific sequences. revision: yes

  2. Referee: [§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.

    Authors: We acknowledge the importance of quantifying the overlap and misalignment characteristics in our benchmark to support the robustness claims. In the revision, we will include detailed statistics on overlap ratios and misalignment distributions within the Bearing-UAV-90k dataset description. Additionally, we will conduct and present an ablation study that isolates the contribution of the relative-spatial-encoding module by evaluating performance on controlled subsets with reduced overlaps. These additions will be incorporated into Sections 3.2 and 5. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents Bearing-UAV as an independent neural architecture for joint location and heading prediction from cross-view features, together with a new benchmark dataset. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the inputs or to self-citations. The central claims rest on experimental comparisons against matching baselines across terrains, which are externally falsifiable and do not rely on load-bearing self-referential definitions, uniqueness theorems, or ansatzes imported from prior author work. The contribution is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard assumptions in cross-view feature matching.

pith-pipeline@v0.9.0 · 5520 in / 937 out tokens · 26050 ms · 2026-05-15T00:27:57.944542+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

  1. [1]

    Real-time cross-view image matching and camera pose determination for unmanned aerial vehicles.Photogrammetric Engineering & Remote Sensing, 90(6):371–381, 2024

    Long Chen, Bo Wu, Ran Duan, and Zeyu Chen. Real-time cross-view image matching and camera pose determination for unmanned aerial vehicles.Photogrammetric Engineering & Remote Sensing, 90(6):371–381, 2024. 2, 3

  2. [2]

    Obtpn: A vision-based network for uav geo-localization in multi- altitude environments.Drones, 9(1):33, 2025

    Nanxing Chen, Jiqi Fan, Jiayu Yuan, and Enhui Zheng. Obtpn: A vision-based network for uav geo-localization in multi- altitude environments.Drones, 9(1):33, 2025. 2

  3. [3]

    Akhloufi

    Andy Couturier and Moulay A. Akhloufi. A review on deep learning for uav absolute visual localization.Drones, 8(11): 622, 2024. 1, 2

  4. [4]

    A novel geo- localization method for uav and satellite images using cross- view consistent attention.Remote Sensing, 15(19):4667, 2023

    Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li, and Yu Zhang. A novel geo- localization method for uav and satellite images using cross- view consistent attention.Remote Sensing, 15(19):4667, 2023. 2

  5. [5]

    A transformer-based feature segmentation and region alignment method for uav-view geo-localization.IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4376–4389,

    Ming Dai, Jianhong Hu, Jiedong Zhuang, and Enhui Zheng. A transformer-based feature segmentation and region alignment method for uav-view geo-localization.IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4376–4389,

  6. [6]

    Denseuav2:vision-based uav self-positioning in low-altitude urban environments.IEEE Transactions on Image Processing, 33:493–508, 2024

    Ming Dai, Enhui Zheng, Zhenhua Feng, Lei Qi, Jiedong Zhuang, and Wankou Yang. Denseuav2:vision-based uav self-positioning in low-altitude urban environments.IEEE Transactions on Image Processing, 33:493–508, 2024. 1, 2, 6

  7. [7]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255, 2009. 5

  8. [8]

    Ortholoc: Uav 6-dof localization and calibration using orthographic geodata.arXiv preprint arXiv:2509.18350, 2025

    Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, and Daniel Cremers. Ortholoc: Uav 6-dof localization and calibration using orthographic geodata.arXiv preprint arXiv:2509.18350, 2025. 3

  9. [9]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021. 4

  10. [10]

    Sgmnet:a scene graph encoding and matching network for uav visual localization.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:9890– 9902, 2024

    Ran Duan, Long Chen, Zhaojin Li, Zeyu Chen, and Bo Wu. Sgmnet:a scene graph encoding and matching network for uav visual localization.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:9890– 9902, 2024. 2, 3

  11. [11]

    Cross-view geo-localization: A survey.IEEE Access, 12:192028–192050, 2024

    Abhilash Durgam, Sidike Paheding, Vikas Dhiman, and Vijay Devabhaktuni. Cross-view geo-localization: A survey.IEEE Access, 12:192028–192050, 2024. 1, 2

  12. [12]

    Vision-based gnss-free localization for uavs in the wild

    Marius-Mihail Gurgu, Jorge Pe˜na Queralta, and Tomi Wester- lund. Vision-based gnss-free localization for uavs in the wild. InICMERR, pages 7–12, 2022. 1

  13. [13]

    A localization method for uav aerial im- ages based on semantic topological feature matching.Remote Sensing, 17(10):1671, 2025

    Jing He and Qian Wu. A localization method for uav aerial im- ages based on semantic topological feature matching.Remote Sensing, 17(10):1671, 2025. 1

  14. [14]

    Foundloc: Vision-based onboard aerial localization in the wild.arXiv preprint arXiv:2310.16299, 2023

    Yao He, Ivan Cisneros, Nikhil Keetha, Jay Patrikar, Zelin Ye, Ian Higgins, Yaoyu Hu, Parv Kapoor, and Sebastian Scherer. Foundloc: Vision-based onboard aerial localization in the wild.arXiv preprint arXiv:2310.16299, 2023. 3

  15. [15]

    Mcfa: Multi-scale cascade and feature adaptive alignment network for cross-view geo-localization.Sensors, 25(14): 4519, 2025

    Kaiji Hou, Qiang Tong, Na Yan, Xiulei Liu, and Shoulu Hou. Mcfa: Multi-scale cascade and feature adaptive alignment network for cross-view geo-localization.Sensors, 25(14): 4519, 2025. 2

  16. [16]

    Sixing Hu, Mengdan Feng, Rang M. H. Nguyen, and Gim Hee Lee. Cvm-net: Cross-view matching network for image- based ground-to-aerial geo-localization. InCVPR, pages 7258–7267. IEEE, 2018. 2

  17. [17]

    Gnss-denied unmanned aerial vehicle navi- gation: Analyzing computational complexity, sensor fusion, and localization methodologies.Satellite Navigation, 6(1):9,

    Imen Jarraya, Abdulrahman Al-Batati, Muhammad Bilal Kadri, Mohamed Abdelkader, Adel Ammar, Wadii Boulila, and Anis Koubaa. Gnss-denied unmanned aerial vehicle navi- gation: Analyzing computational complexity, sensor fusion, and localization methodologies.Satellite Navigation, 6(1):9,

  18. [18]

    Game4loc: A uav geo-localization benchmark from game data.AAAI, 39(4):3913–3921, 2025

    Yuxiang Ji, Boyong He, Zhuoyue Tan, and Liaoni Wu. Game4loc: A uav geo-localization benchmark from game data.AAAI, 39(4):3913–3921, 2025. 1, 2, 6

  19. [19]

    Bevloc: Cross- view localization and matching via birds-eye-view synthesis

    Christopher Klammer and Michael Kaess. Bevloc: Cross- view localization and matching via birds-eye-view synthesis. InIROS, pages 5656–5663, 2024. 1, 2

  20. [20]

    Jointly optimized global-local visual localization of uavs.arXiv preprint arXiv:2310.08082, 2023

    Haoling Li, Jiuniu Wang, Zhiwei Wei, and Wenjia Xu. Jointly optimized global-local visual localization of uavs.arXiv preprint arXiv:2310.08082, 2023. 1

  21. [21]

    Cvact:lending orientation to neural networks for cross-view geo-localization

    Liu Liu and Hongdong Li. Cvact:lending orientation to neural networks for cross-view geo-localization. InCVPR, pages 5624–5633, 2019. 1, 2

  22. [22]

    Local- ization of unmanned aerial vehicles using terrain classification from aerial images

    Andreas Masselli, Richard Hanten, and Andreas Zell. Local- ization of unmanned aerial vehicles using terrain classification from aerial images. InIntelligent Autonomous Systems 13, pages 831–842. Springer International Publishing, 2016. 1

  23. [23]

    Assisting uav localization via deep contextual image matching.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2445– 2457, 2021

    Muhammad Hamza Mughal, Muhammad Jawad Khokhar, and Muhammad Shahzad. Assisting uav localization via deep contextual image matching.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2445– 2457, 2021. 1

  24. [24]

    Avdelidis, and Xavier P

    Parham Nooralishahi, Clemente Ibarra-Castanedo, Shakeb Deane, Fernando L´opez, Shashank Pant, Marc Genest, Nico- las P. Avdelidis, and Xavier P. V . Maldague. Drone-based non-destructive inspection of industrial sites: A review and case studies.Drones, 5(4):106, 2021. 1

  25. [25]

    High-precision visual geo-localization of uav based on hierarchical localization.Expert Systems with Applications, 267:126064, 2025

    Xiong Qiu, Shouyi Liao, Dongfang Yang, Yongfei Li, and Shicheng Wang. High-precision visual geo-localization of uav based on hierarchical localization.Expert Systems with Applications, 267:126064, 2025. 3

  26. [26]

    Uavs-based visual localization via attention-driven image registration across varying texture levels.Drones, 8(12):739, 2024

    Yan Ren, Guohai Dong, Tianbo Zhang, Meng Zhang, Xinyu Chen, and Mingliang Xue. Uavs-based visual localization via attention-driven image registration across varying texture levels.Drones, 8(12):739, 2024. 1

  27. [27]

    Gas-drone: Portable gas sensing system on uavs for gas leakage local- ization

    Maurizio Rossi, Davide Brunelli, Andrea Adami, Leandro Lorenzelli, Fabio Menna, and Fabio Remondino. Gas-drone: Portable gas sensing system on uavs for gas leakage local- ization. In2014 IEEE SENSORS, pages 1431–1434, 2014. 1

  28. [28]

    Orienternet: Visual localization in 2d public maps with neural matching

    Paul-Edouard Sarlin, Daniel DeTone, Tsun-Yi Yang, Armen Avetisyan, Julian Straub, Tomasz Malisiewicz, Samuel Rota Bulo, Richard Newcombe, Peter Kontschieder, and Vasileios Balntas. Orienternet: Visual localization in 2d public maps with neural matching. InCVPR, pages 21632–21642. IEEE,

  29. [29]

    Vpair - aerial visual place recognition and localization in large-scale outdoor environments.arXiv preprint arXiv:2205.11567,

    Michael Schleiss, Fahmi Rouatbi, and Daniel Cremers. Vpair - aerial visual place recognition and localization in large-scale outdoor environments.arXiv preprint arXiv:2205.11567,

  30. [30]

    Cvusa:wide-area image geolocalization with aerial reference imagery

    Scott Workman, Richard Souvenir, and Nathan Jacobs. Cvusa:wide-area image geolocalization with aerial reference imagery. InICCV, pages 3961–3969, 2015. 2

  31. [31]

    Uav pose estimation using cross-view geolocalization with satellite imagery

    Akshay Shetty and Grace Xingxin Gao. Uav pose estimation using cross-view geolocalization with satellite imagery. In ICRA, pages 1827–1833, 2019. 2

  32. [32]

    Where am i looking at? joint location and orientation estima- tion by cross-view matching

    Yujiao Shi, Xin Yu, Dylan Campbell, and Hongdong Li. Where am i looking at? joint location and orientation estima- tion by cross-view matching. InCVPR, pages 4063–4071. IEEE, 2020. 2

  33. [33]

    Accurate 3-dof camera geo-localization via ground-to-satellite image matching.IEEE transactions on pattern analysis and machine intelligence, 45(3):2682–2697,

    Yujiao Shi, Xin Yu, Liu Liu, Dylan Campbell, Piotr Koniusz, and Hongdong Li. Accurate 3-dof camera geo-localization via ground-to-satellite image matching.IEEE transactions on pattern analysis and machine intelligence, 45(3):2682–2697,

  34. [34]

    Boosting 3-dof ground-to-satellite camera localiza- tion accuracy via geometry-guided cross-view transformer

    Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit V ora, and Hong- dong Li. Boosting 3-dof ground-to-satellite camera localiza- tion accuracy via geometry-guided cross-view transformer. In ICCV, pages 21459–21469. IEEE, 2023. 2

  35. [35]

    Weakly-supervised camera localization by ground-to-satellite image registration

    Yujiao Shi, Hongdong Li, Akhil Perincherry, and Ankit V ora. Weakly-supervised camera localization by ground-to-satellite image registration. InECCV, pages 39–57. Springer, 2024. 3

  36. [36]

    Wind turbine surface damage detection by deep learning aided drone inspection analysis.Energies, 12(4):676, 2019

    ASM Shihavuddin, Xiao Chen, Vladimir Fedorov, Anders Nymark Christensen, Nicolai Andre Brogaard Riis, Kim Bran- ner, Anders Bjorholm Dahl, and Rasmus Reinhold Paulsen. Wind turbine surface damage detection by deep learning aided drone inspection analysis.Energies, 12(4):676, 2019. 1

  37. [37]

    Very deep convo- lutional networks for large-scale image recognition.ICLR,

    Karen Simonyan and Andrew Zisserman. Very deep convo- lutional networks for large-scale image recognition.ICLR,

  38. [38]

    Jian Sun, Kangdao Liu, Chi Zhang, Chuangquan Chen, Junge Shen, and Chi-Man V ong. Pfed-cross-view uav geo- localization with precision-focused efficient design: A hierar- chical distillation approach with multi-view refinement.arXiv preprint arXiv:2510.22582, 2025. 3

  39. [39]

    Absolute pose estimation of uav based on large-scale satellite image.Chinese Journal of Aeronautics, 37(6):219–231, 2024

    Hanyu Wang, Qiang Shen, Zilong Deng, Xinyi Cao, and Xiaokang Wang. Absolute pose estimation of uav based on large-scale satellite image.Chinese Journal of Aeronautics, 37(6):219–231, 2024. 3

  40. [40]

    View consistent purification for accurate cross-view localization

    Shan Wang, Yanhao Zhang, Akhil Perincherry, Ankit V ora, and Hongdong Li. View consistent purification for accurate cross-view localization. InICCV, pages 8197–8206, 2023. 3

  41. [41]

    View from above: Orthogonal-view aware cross-view localization

    Shan Wang, Chuong Nguyen, Jiawei Liu, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Kaihao Zhang, and Hongdong Li. View from above: Orthogonal-view aware cross-view localization. InCVPR, pages 14843–14852. IEEE,

  42. [42]

    Non-local neural networks

    Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. InCVPR, pages 7794–7803. IEEE, 2018. 3

  43. [43]

    Fine-grained cross-view geo-localization using a correlation-aware homography estimator

    Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, and Yu Zhang. Fine-grained cross-view geo-localization using a correlation-aware homography estimator. InNeurIPS, pages 5301–5319. Curran Associates, Inc., 2023. 2

  44. [44]

    Angle robustness unmanned aerial vehicle navigation in gnss-denied scenarios.AAAI, 38(9): 10386–10394, 2024

    Yuxin Wang, Zunlei Feng, Haofei Zhang, Yang Gao, Jie Lei, Li Sun, and Mingli Song. Angle robustness unmanned aerial vehicle navigation in gnss-denied scenarios.AAAI, 38(9): 10386–10394, 2024. 2

  45. [45]

    Vecmaplocnet: Vision-based uav localization using vector maps in gnss- denied environments.ISPRS Journal of Photogrammetry and Remote Sensing, 225:362–381, 2025

    Zhen Wang, Dianxi Shi, Chunping Qiu, Songchang Jin, Tongyue Li, Ziteng Qiao, and Yang Chen. Vecmaplocnet: Vision-based uav localization using vector maps in gnss- denied environments.ISPRS Journal of Photogrammetry and Remote Sensing, 225:362–381, 2025. 3

  46. [46]

    Vision-based learning for drones: A survey.IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2025

    Jiaping Xiao, Rangya Zhang, Yuhang Zhang, and Mir Fer- oskhan. Vision-based learning for drones: A survey.IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2025. 1

  47. [47]

    3d positioning of drones through images.Sensors, 24(17):5491,

    Jianxing Yang, Enhui Zheng, Jiqi Fan, and Yuwen Yao. 3d positioning of drones through images.Sensors, 24(17):5491,

  48. [48]

    Kaiqian Yang, Yujin Zhang, Li Wang, A. A. M. Muzahid, Ferdous Sohel, Fei Wu, and Qiong Wu. Vimgeo: An efficient visual model for cross-view geo-localization.Electronics, 14 (19):3906, 2025. 2

  49. [49]

    Uav geo-localization dataset and method based on cross-view matching.Sensors, 24(21):6905, 2024

    Yuwen Yao, Cheng Sun, Tao Wang, Jianxing Yang, and Enhui Zheng. Uav geo-localization dataset and method based on cross-view matching.Sensors, 24(21):6905, 2024. 1

  50. [50]

    Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

    Yibin Ye, Xichao Teng, Shuo Chen, Zhang Li, Leqi Liu, Qifeng Yu, and Tao Tan. Exploring the best way for uav visual localization under low-altitude multi-view observation condition: A benchmark.arXiv preprint arXiv:2503.10692,

  51. [51]

    Hierarchical image matching for uav absolute visual localization via semantic and structural constraints.arXiv preprint arXiv:2506.09748, 2025

    Xiangkai Zhang, Xiang Zhou, Mao Chen, Yuchen Lu, Xu Yang, and Zhiyong Liu. Hierarchical image matching for uav absolute visual localization via semantic and structural constraints.arXiv preprint arXiv:2506.09748, 2025. 2

  52. [52]

    University-1652: A multi-view multi-source benchmark for drone-based geo- localization

    Zhedong Zheng, Yunchao Wei, and Yi Yang. University-1652: A multi-view multi-source benchmark for drone-based geo- localization. InACM MM, pages 1395–1403, 2020. 1, 2, 6

  53. [53]

    Curml: A curriculum machine learning library

    Yuwei Zhou, Hong Chen, Zirui Pan, Chuanhao Yan, Fanqi Lin, Xin Wang, and Wenwu Zhu. Curml: A curriculum machine learning library. InACM MM, pages 7359–7363,

  54. [54]

    Curbench: curriculum learning benchmark

    Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Wenwu Zhu, et al. Curbench: curriculum learning benchmark. InICML, 2024. 2

  55. [55]

    Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite.IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4825–4839, 2023

    Runzhe Zhu, Ling Yin, Mingze Yang, Fei Wu, Yuncheng Yang, and Wenbo Hu. Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite.IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4825–4839, 2023. 1, 2, 6

  56. [56]

    Sat.” and “UA V

    Sijie Zhu, Taojiannan Yang, and Chen Chen. Vigor: Cross- view image geo-localization beyond one-to-one retrieval. In CVPR, pages 3640–3649, 2021. 1, 2 Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UA V Navigation Supplementary Material A.1. List of Acronyms For clarity, the main acronyms used in this paper are gro...