arxiv: 2603.22153 · v3 · submitted 2026-03-23 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation

Kejia Liu , Haoyang Zhou , Ruoyu Xu , Peicheng Wang , Mingli Song , Haofei Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-15 00:27 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords cross-view geo-localizationUAV navigationvision-based localizationheading estimationunaligned imagerysatellite viewsaerial navigationmulti-city benchmark

0 comments

The pith

Bearing-UAV enables vision-only UAV navigation by jointly predicting absolute location and heading from unaligned aerial and satellite views without storing map tiles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Bearing-UAV, a method for UAV navigation that estimates both position and orientation directly from neighboring visual features in cross-view settings. Prior approaches match UAV images to fixed satellite tiles, creating a trade-off between accuracy and storage while ignoring heading. Bearing-UAV instead uses global and local structural features plus explicit encoding of relative spatial relationships to handle misalignment, viewpoint differences, and sparse features. It reports lower localization error than matching-based methods on a new multi-city benchmark covering diverse terrains. The design removes the need for onboard map storage and supplies heading data needed for practical flight control.

Core claim

Bearing-UAV is a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, leveraging global and local structural features while explicitly encoding relative spatial relationships to remain robust under cross-view variations, misalignment, and feature-sparse conditions.

What carries the argument

Bearing-UAV, which jointly predicts location and heading by combining global-local structural features with relative spatial relationship encodings.

Load-bearing premise

That the substantial discrepancies and varying overlaps between aerial and satellite views have been adequately handled so the approach generalizes to real-world UAV flights.

What would settle it

A set of real UAV flights over feature-sparse terrain or areas with large viewpoint overlaps where the reported localization error exceeds that of tile-matching baselines would falsify the claimed advantage.

Figures

Figures reproduced from arXiv: 2603.22153 by Haofei Zhang, Haoyang Zhou, Kejia Liu, Mingli Song, Peicheng Wang, Ruoyu Xu.

**Figure 2.** Figure 2: Bearing-UAV in training mode. Given four adjacent RSTs with their relative coordinates ( [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Global-Local Unity Feature (GLUF). A VGG-16 extracts [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Bearing-Naver’s operating mode. At each step, we first [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Localization and heading performance both consistently [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Navigation performance comparison. The purple dashed [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: Two examples of UAV-satellite cross-view UVP sampling [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: Localization and orientation performance under different [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗

**Figure 11.** Figure 11: Effect of Weather Augmentation on Bearing-UAV. [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗

**Figure 12.** Figure 12: Localization and heading error distributions and scatter plots of 9k unseen test samples in satellite and UAV views. [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗

**Figure 13.** Figure 13: Heatmaps of mean localization and heading errors [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗

**Figure 14.** Figure 14: Visualization of U-S cross-view localization and heading errors on real satellite images from four cities. The four satellite images [PITH_FULL_IMAGE:figures/full_fig_p016_14.png] view at source ↗

**Figure 15.** Figure 15: Navigation performance comparison on seven additional routes. We further provide a comprehensive comparison between [PITH_FULL_IMAGE:figures/full_fig_p018_15.png] view at source ↗

**Figure 16.** Figure 16: Visualization of cross-view navigation frames of Bearing-UAV. We present a successful case for trajectory #1 in City D from [PITH_FULL_IMAGE:figures/full_fig_p019_16.png] view at source ↗

**Figure 17.** Figure 17: The UAV cross-view navigation process corresponding to trajectory #2 in City A, as shown in Fig. [PITH_FULL_IMAGE:figures/full_fig_p020_17.png] view at source ↗

read the original abstract

Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporting unmanned aerial vehicle (UAV) navigation in GNSS-denied environments. However, existing work predominantly focuses on matching UAV views to onboard map tiles, which introduces an inherent trade-off between accuracy and storage overhead, and overlooks the importance of the UAV's heading during navigation. Moreover, the substantial discrepancies and varying overlaps in cross-view scenarios have been insufficiently considered, limiting their generalization to real-world scenarios. In this paper, we present Bearing-UAV, a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, enabling accurate, lightweight, and robust navigation in the wild. Our method leverages global and local structural features and explicitly encodes relative spatial relationships, making it robust to cross-view variations, misalignment, and feature-sparse conditions. We also present Bearing-UAV-90k, a multi-city benchmark for evaluating cross-view localization and navigation. Extensive experiments show encouraging results that Bearing-UAV yields lower localization error than previous matching/retrieval paradigm across diverse terrains. Our code and dataset will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Bearing-UAV shifts cross-view geo-localization from tile matching to direct joint location-plus-heading prediction and ships a new multi-city benchmark.

read the letter

The core advance here is a method that predicts both absolute position and heading from UAV imagery against satellite views, skipping the storage cost of onboard map tiles. It does this by pulling global and local structural features and explicitly encoding relative spatial relationships, which the authors say improves robustness to misalignment and partial overlaps. They also release Bearing-UAV-90k, a multi-city dataset that should make evaluation more representative than single-location benchmarks common in prior CVGL work. That combination of joint prediction and new data is the part worth paying attention to for anyone building vision-only navigation stacks. The reported experiments claim lower localization error than retrieval baselines across terrains, and the stress-test note indicates the full text backs this with direct comparisons rather than circular claims. A minor soft spot is that the abstract gives no numbers or ablation breakdowns, so the strength of the robustness argument depends on how thoroughly the full paper shows the relative-encoding component actually drives the gains versus other architecture choices. Generalization to extreme real-world cases like heavy vegetation or lighting shifts is asserted but would need close referee scrutiny on the dataset statistics. This paper is for people working on practical UAV or robotics vision pipelines who already know the CVGL literature. It is not a theoretical breakthrough but supplies a concrete alternative to the matching paradigm plus usable data. I would send it to peer review; the contribution is focused enough and the evaluation setup looks reproducible enough to merit detailed feedback rather than a desk reject.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Bearing-UAV, a vision-only cross-view geo-localization method for UAV navigation that jointly predicts absolute location and heading by encoding global/local structural features and relative spatial relationships. It contrasts this with prior tile-matching approaches, which incur storage-accuracy trade-offs and ignore heading. The authors release the Bearing-UAV-90k multi-city benchmark and report that the method achieves lower localization error than matching/retrieval baselines across diverse terrains, with claimed robustness to misalignment, varying overlaps, and feature sparsity.

Significance. If the reported error reductions hold under detailed scrutiny, the work offers a lightweight, map-tile-free alternative for GNSS-denied UAV navigation with potential practical impact. The new benchmark could serve as a useful community resource for evaluating cross-view robustness. The explicit modeling of relative spatial relationships is a conceptually clean contribution that directly targets the misalignment issues highlighted in the abstract.

major comments (2)

[§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.
[§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.

minor comments (3)

[Abstract] Abstract: Replace the vague phrase 'encouraging results' with the actual mean localization error reduction (e.g., 'X% lower median error') to strengthen the summary.
[§3] Notation: Define the joint loss for location and heading prediction explicitly (currently described only at high level) and ensure symbols for bearing and position are used consistently between equations and text.
[Figure 3] Figure 3: The qualitative examples would benefit from overlaying predicted versus ground-truth trajectories with error vectors to visually support the quantitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and improve our manuscript. We address each major comment point by point below.

read point-by-point responses

Referee: [§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.

Authors: We agree that including error bars, statistical significance tests, and per-terrain variance would strengthen the reliability of our claims. In the revised manuscript, we will report standard deviations across multiple evaluation runs or sequences in Table 2. We will also perform and report statistical significance tests (such as paired t-tests) to confirm that the observed improvements are significant. Furthermore, we will add per-terrain variance analysis to demonstrate that the gains hold across diverse terrains rather than being driven by specific sequences. revision: yes
Referee: [§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.

Authors: We acknowledge the importance of quantifying the overlap and misalignment characteristics in our benchmark to support the robustness claims. In the revision, we will include detailed statistics on overlap ratios and misalignment distributions within the Bearing-UAV-90k dataset description. Additionally, we will conduct and present an ablation study that isolates the contribution of the relative-spatial-encoding module by evaluating performance on controlled subsets with reduced overlaps. These additions will be incorporated into Sections 3.2 and 5. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents Bearing-UAV as an independent neural architecture for joint location and heading prediction from cross-view features, together with a new benchmark dataset. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the inputs or to self-citations. The central claims rest on experimental comparisons against matching baselines across terrains, which are externally falsifiable and do not rely on load-bearing self-referential definitions, uniqueness theorems, or ansatzes imported from prior author work. The contribution is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard assumptions in cross-view feature matching.

pith-pipeline@v0.9.0 · 5520 in / 937 out tokens · 26050 ms · 2026-05-15T00:27:57.944542+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 1 internal anchor

[1]

Real-time cross-view image matching and camera pose determination for unmanned aerial vehicles.Photogrammetric Engineering & Remote Sensing, 90(6):371–381, 2024

Long Chen, Bo Wu, Ran Duan, and Zeyu Chen. Real-time cross-view image matching and camera pose determination for unmanned aerial vehicles.Photogrammetric Engineering & Remote Sensing, 90(6):371–381, 2024. 2, 3

work page 2024
[2]

Obtpn: A vision-based network for uav geo-localization in multi- altitude environments.Drones, 9(1):33, 2025

Nanxing Chen, Jiqi Fan, Jiayu Yuan, and Enhui Zheng. Obtpn: A vision-based network for uav geo-localization in multi- altitude environments.Drones, 9(1):33, 2025. 2

work page 2025
[3]

Akhloufi

Andy Couturier and Moulay A. Akhloufi. A review on deep learning for uav absolute visual localization.Drones, 8(11): 622, 2024. 1, 2

work page 2024
[4]

A novel geo- localization method for uav and satellite images using cross- view consistent attention.Remote Sensing, 15(19):4667, 2023

Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li, and Yu Zhang. A novel geo- localization method for uav and satellite images using cross- view consistent attention.Remote Sensing, 15(19):4667, 2023. 2

work page 2023
[5]

A transformer-based feature segmentation and region alignment method for uav-view geo-localization.IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4376–4389,

Ming Dai, Jianhong Hu, Jiedong Zhuang, and Enhui Zheng. A transformer-based feature segmentation and region alignment method for uav-view geo-localization.IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4376–4389,

work page
[6]

Denseuav2:vision-based uav self-positioning in low-altitude urban environments.IEEE Transactions on Image Processing, 33:493–508, 2024

Ming Dai, Enhui Zheng, Zhenhua Feng, Lei Qi, Jiedong Zhuang, and Wankou Yang. Denseuav2:vision-based uav self-positioning in low-altitude urban environments.IEEE Transactions on Image Processing, 33:493–508, 2024. 1, 2, 6

work page 2024
[7]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InCVPR, pages 248–255, 2009. 5

work page 2009
[8]

Ortholoc: Uav 6-dof localization and calibration using orthographic geodata.arXiv preprint arXiv:2509.18350, 2025

Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, and Daniel Cremers. Ortholoc: Uav 6-dof localization and calibration using orthographic geodata.arXiv preprint arXiv:2509.18350, 2025. 3

work page arXiv 2025
[9]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021. 4

work page 2021
[10]

Sgmnet:a scene graph encoding and matching network for uav visual localization.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:9890– 9902, 2024

Ran Duan, Long Chen, Zhaojin Li, Zeyu Chen, and Bo Wu. Sgmnet:a scene graph encoding and matching network for uav visual localization.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 17:9890– 9902, 2024. 2, 3

work page 2024
[11]

Cross-view geo-localization: A survey.IEEE Access, 12:192028–192050, 2024

Abhilash Durgam, Sidike Paheding, Vikas Dhiman, and Vijay Devabhaktuni. Cross-view geo-localization: A survey.IEEE Access, 12:192028–192050, 2024. 1, 2

work page 2024
[12]

Vision-based gnss-free localization for uavs in the wild

Marius-Mihail Gurgu, Jorge Pe˜na Queralta, and Tomi Wester- lund. Vision-based gnss-free localization for uavs in the wild. InICMERR, pages 7–12, 2022. 1

work page 2022
[13]

A localization method for uav aerial im- ages based on semantic topological feature matching.Remote Sensing, 17(10):1671, 2025

Jing He and Qian Wu. A localization method for uav aerial im- ages based on semantic topological feature matching.Remote Sensing, 17(10):1671, 2025. 1

work page 2025
[14]

Foundloc: Vision-based onboard aerial localization in the wild.arXiv preprint arXiv:2310.16299, 2023

Yao He, Ivan Cisneros, Nikhil Keetha, Jay Patrikar, Zelin Ye, Ian Higgins, Yaoyu Hu, Parv Kapoor, and Sebastian Scherer. Foundloc: Vision-based onboard aerial localization in the wild.arXiv preprint arXiv:2310.16299, 2023. 3

work page arXiv 2023
[15]

Mcfa: Multi-scale cascade and feature adaptive alignment network for cross-view geo-localization.Sensors, 25(14): 4519, 2025

Kaiji Hou, Qiang Tong, Na Yan, Xiulei Liu, and Shoulu Hou. Mcfa: Multi-scale cascade and feature adaptive alignment network for cross-view geo-localization.Sensors, 25(14): 4519, 2025. 2

work page 2025
[16]

Sixing Hu, Mengdan Feng, Rang M. H. Nguyen, and Gim Hee Lee. Cvm-net: Cross-view matching network for image- based ground-to-aerial geo-localization. InCVPR, pages 7258–7267. IEEE, 2018. 2

work page 2018
[17]

Gnss-denied unmanned aerial vehicle navi- gation: Analyzing computational complexity, sensor fusion, and localization methodologies.Satellite Navigation, 6(1):9,

Imen Jarraya, Abdulrahman Al-Batati, Muhammad Bilal Kadri, Mohamed Abdelkader, Adel Ammar, Wadii Boulila, and Anis Koubaa. Gnss-denied unmanned aerial vehicle navi- gation: Analyzing computational complexity, sensor fusion, and localization methodologies.Satellite Navigation, 6(1):9,

work page
[18]

Game4loc: A uav geo-localization benchmark from game data.AAAI, 39(4):3913–3921, 2025

Yuxiang Ji, Boyong He, Zhuoyue Tan, and Liaoni Wu. Game4loc: A uav geo-localization benchmark from game data.AAAI, 39(4):3913–3921, 2025. 1, 2, 6

work page 2025
[19]

Bevloc: Cross- view localization and matching via birds-eye-view synthesis

Christopher Klammer and Michael Kaess. Bevloc: Cross- view localization and matching via birds-eye-view synthesis. InIROS, pages 5656–5663, 2024. 1, 2

work page 2024
[20]

Jointly optimized global-local visual localization of uavs.arXiv preprint arXiv:2310.08082, 2023

Haoling Li, Jiuniu Wang, Zhiwei Wei, and Wenjia Xu. Jointly optimized global-local visual localization of uavs.arXiv preprint arXiv:2310.08082, 2023. 1

work page arXiv 2023
[21]

Cvact:lending orientation to neural networks for cross-view geo-localization

Liu Liu and Hongdong Li. Cvact:lending orientation to neural networks for cross-view geo-localization. InCVPR, pages 5624–5633, 2019. 1, 2

work page 2019
[22]

Local- ization of unmanned aerial vehicles using terrain classification from aerial images

Andreas Masselli, Richard Hanten, and Andreas Zell. Local- ization of unmanned aerial vehicles using terrain classification from aerial images. InIntelligent Autonomous Systems 13, pages 831–842. Springer International Publishing, 2016. 1

work page 2016
[23]

Assisting uav localization via deep contextual image matching.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2445– 2457, 2021

Muhammad Hamza Mughal, Muhammad Jawad Khokhar, and Muhammad Shahzad. Assisting uav localization via deep contextual image matching.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2445– 2457, 2021. 1

work page 2021
[24]

Avdelidis, and Xavier P

Parham Nooralishahi, Clemente Ibarra-Castanedo, Shakeb Deane, Fernando L´opez, Shashank Pant, Marc Genest, Nico- las P. Avdelidis, and Xavier P. V . Maldague. Drone-based non-destructive inspection of industrial sites: A review and case studies.Drones, 5(4):106, 2021. 1

work page 2021
[25]

High-precision visual geo-localization of uav based on hierarchical localization.Expert Systems with Applications, 267:126064, 2025

Xiong Qiu, Shouyi Liao, Dongfang Yang, Yongfei Li, and Shicheng Wang. High-precision visual geo-localization of uav based on hierarchical localization.Expert Systems with Applications, 267:126064, 2025. 3

work page 2025
[26]

Uavs-based visual localization via attention-driven image registration across varying texture levels.Drones, 8(12):739, 2024

Yan Ren, Guohai Dong, Tianbo Zhang, Meng Zhang, Xinyu Chen, and Mingliang Xue. Uavs-based visual localization via attention-driven image registration across varying texture levels.Drones, 8(12):739, 2024. 1

work page 2024
[27]

Gas-drone: Portable gas sensing system on uavs for gas leakage local- ization

Maurizio Rossi, Davide Brunelli, Andrea Adami, Leandro Lorenzelli, Fabio Menna, and Fabio Remondino. Gas-drone: Portable gas sensing system on uavs for gas leakage local- ization. In2014 IEEE SENSORS, pages 1431–1434, 2014. 1

work page 2014
[28]

Orienternet: Visual localization in 2d public maps with neural matching

Paul-Edouard Sarlin, Daniel DeTone, Tsun-Yi Yang, Armen Avetisyan, Julian Straub, Tomasz Malisiewicz, Samuel Rota Bulo, Richard Newcombe, Peter Kontschieder, and Vasileios Balntas. Orienternet: Visual localization in 2d public maps with neural matching. InCVPR, pages 21632–21642. IEEE,

work page
[29]

Vpair - aerial visual place recognition and localization in large-scale outdoor environments.arXiv preprint arXiv:2205.11567,

Michael Schleiss, Fahmi Rouatbi, and Daniel Cremers. Vpair - aerial visual place recognition and localization in large-scale outdoor environments.arXiv preprint arXiv:2205.11567,

work page arXiv
[30]

Cvusa:wide-area image geolocalization with aerial reference imagery

Scott Workman, Richard Souvenir, and Nathan Jacobs. Cvusa:wide-area image geolocalization with aerial reference imagery. InICCV, pages 3961–3969, 2015. 2

work page 2015
[31]

Uav pose estimation using cross-view geolocalization with satellite imagery

Akshay Shetty and Grace Xingxin Gao. Uav pose estimation using cross-view geolocalization with satellite imagery. In ICRA, pages 1827–1833, 2019. 2

work page 2019
[32]

Where am i looking at? joint location and orientation estima- tion by cross-view matching

Yujiao Shi, Xin Yu, Dylan Campbell, and Hongdong Li. Where am i looking at? joint location and orientation estima- tion by cross-view matching. InCVPR, pages 4063–4071. IEEE, 2020. 2

work page 2020
[33]

Accurate 3-dof camera geo-localization via ground-to-satellite image matching.IEEE transactions on pattern analysis and machine intelligence, 45(3):2682–2697,

Yujiao Shi, Xin Yu, Liu Liu, Dylan Campbell, Piotr Koniusz, and Hongdong Li. Accurate 3-dof camera geo-localization via ground-to-satellite image matching.IEEE transactions on pattern analysis and machine intelligence, 45(3):2682–2697,

work page
[34]

Boosting 3-dof ground-to-satellite camera localiza- tion accuracy via geometry-guided cross-view transformer

Yujiao Shi, Fei Wu, Akhil Perincherry, Ankit V ora, and Hong- dong Li. Boosting 3-dof ground-to-satellite camera localiza- tion accuracy via geometry-guided cross-view transformer. In ICCV, pages 21459–21469. IEEE, 2023. 2

work page 2023
[35]

Weakly-supervised camera localization by ground-to-satellite image registration

Yujiao Shi, Hongdong Li, Akhil Perincherry, and Ankit V ora. Weakly-supervised camera localization by ground-to-satellite image registration. InECCV, pages 39–57. Springer, 2024. 3

work page 2024
[36]

Wind turbine surface damage detection by deep learning aided drone inspection analysis.Energies, 12(4):676, 2019

ASM Shihavuddin, Xiao Chen, Vladimir Fedorov, Anders Nymark Christensen, Nicolai Andre Brogaard Riis, Kim Bran- ner, Anders Bjorholm Dahl, and Rasmus Reinhold Paulsen. Wind turbine surface damage detection by deep learning aided drone inspection analysis.Energies, 12(4):676, 2019. 1

work page 2019
[37]

Very deep convo- lutional networks for large-scale image recognition.ICLR,

Karen Simonyan and Andrew Zisserman. Very deep convo- lutional networks for large-scale image recognition.ICLR,

work page
[38]

Jian Sun, Kangdao Liu, Chi Zhang, Chuangquan Chen, Junge Shen, and Chi-Man V ong. Pfed-cross-view uav geo- localization with precision-focused efficient design: A hierar- chical distillation approach with multi-view refinement.arXiv preprint arXiv:2510.22582, 2025. 3

work page arXiv 2025
[39]

Absolute pose estimation of uav based on large-scale satellite image.Chinese Journal of Aeronautics, 37(6):219–231, 2024

Hanyu Wang, Qiang Shen, Zilong Deng, Xinyi Cao, and Xiaokang Wang. Absolute pose estimation of uav based on large-scale satellite image.Chinese Journal of Aeronautics, 37(6):219–231, 2024. 3

work page 2024
[40]

View consistent purification for accurate cross-view localization

Shan Wang, Yanhao Zhang, Akhil Perincherry, Ankit V ora, and Hongdong Li. View consistent purification for accurate cross-view localization. InICCV, pages 8197–8206, 2023. 3

work page 2023
[41]

View from above: Orthogonal-view aware cross-view localization

Shan Wang, Chuong Nguyen, Jiawei Liu, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Kaihao Zhang, and Hongdong Li. View from above: Orthogonal-view aware cross-view localization. InCVPR, pages 14843–14852. IEEE,

work page
[42]

Non-local neural networks

Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. InCVPR, pages 7794–7803. IEEE, 2018. 3

work page 2018
[43]

Fine-grained cross-view geo-localization using a correlation-aware homography estimator

Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, and Yu Zhang. Fine-grained cross-view geo-localization using a correlation-aware homography estimator. InNeurIPS, pages 5301–5319. Curran Associates, Inc., 2023. 2

work page 2023
[44]

Angle robustness unmanned aerial vehicle navigation in gnss-denied scenarios.AAAI, 38(9): 10386–10394, 2024

Yuxin Wang, Zunlei Feng, Haofei Zhang, Yang Gao, Jie Lei, Li Sun, and Mingli Song. Angle robustness unmanned aerial vehicle navigation in gnss-denied scenarios.AAAI, 38(9): 10386–10394, 2024. 2

work page 2024
[45]

Vecmaplocnet: Vision-based uav localization using vector maps in gnss- denied environments.ISPRS Journal of Photogrammetry and Remote Sensing, 225:362–381, 2025

Zhen Wang, Dianxi Shi, Chunping Qiu, Songchang Jin, Tongyue Li, Ziteng Qiao, and Yang Chen. Vecmaplocnet: Vision-based uav localization using vector maps in gnss- denied environments.ISPRS Journal of Photogrammetry and Remote Sensing, 225:362–381, 2025. 3

work page 2025
[46]

Vision-based learning for drones: A survey.IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2025

Jiaping Xiao, Rangya Zhang, Yuhang Zhang, and Mir Fer- oskhan. Vision-based learning for drones: A survey.IEEE Transactions on Neural Networks and Learning Systems, pages 1–21, 2025. 1

work page 2025
[47]

3d positioning of drones through images.Sensors, 24(17):5491,

Jianxing Yang, Enhui Zheng, Jiqi Fan, and Yuwen Yao. 3d positioning of drones through images.Sensors, 24(17):5491,

work page
[48]

Kaiqian Yang, Yujin Zhang, Li Wang, A. A. M. Muzahid, Ferdous Sohel, Fei Wu, and Qiong Wu. Vimgeo: An efficient visual model for cross-view geo-localization.Electronics, 14 (19):3906, 2025. 2

work page 2025
[49]

Uav geo-localization dataset and method based on cross-view matching.Sensors, 24(21):6905, 2024

Yuwen Yao, Cheng Sun, Tao Wang, Jianxing Yang, and Enhui Zheng. Uav geo-localization dataset and method based on cross-view matching.Sensors, 24(21):6905, 2024. 1

work page 2024
[50]

Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

Yibin Ye, Xichao Teng, Shuo Chen, Zhang Li, Leqi Liu, Qifeng Yu, and Tao Tan. Exploring the best way for uav visual localization under low-altitude multi-view observation condition: A benchmark.arXiv preprint arXiv:2503.10692,

work page internal anchor Pith review Pith/arXiv arXiv
[51]

Hierarchical image matching for uav absolute visual localization via semantic and structural constraints.arXiv preprint arXiv:2506.09748, 2025

Xiangkai Zhang, Xiang Zhou, Mao Chen, Yuchen Lu, Xu Yang, and Zhiyong Liu. Hierarchical image matching for uav absolute visual localization via semantic and structural constraints.arXiv preprint arXiv:2506.09748, 2025. 2

work page arXiv 2025
[52]

University-1652: A multi-view multi-source benchmark for drone-based geo- localization

Zhedong Zheng, Yunchao Wei, and Yi Yang. University-1652: A multi-view multi-source benchmark for drone-based geo- localization. InACM MM, pages 1395–1403, 2020. 1, 2, 6

work page 2020
[53]

Curml: A curriculum machine learning library

Yuwei Zhou, Hong Chen, Zirui Pan, Chuanhao Yan, Fanqi Lin, Xin Wang, and Wenwu Zhu. Curml: A curriculum machine learning library. InACM MM, pages 7359–7363,

work page
[54]

Curbench: curriculum learning benchmark

Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Wenwu Zhu, et al. Curbench: curriculum learning benchmark. InICML, 2024. 2

work page 2024
[55]

Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite.IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4825–4839, 2023

Runzhe Zhu, Ling Yin, Mingze Yang, Fei Wu, Yuncheng Yang, and Wenbo Hu. Sues-200: A multi-height multi-scene cross-view image benchmark across drone and satellite.IEEE Transactions on Circuits and Systems for Video Technology, 33(9):4825–4839, 2023. 1, 2, 6

work page 2023
[56]

Sat.” and “UA V

Sijie Zhu, Taojiannan Yang, and Chen Chen. Vigor: Cross- view image geo-localization beyond one-to-one retrieval. In CVPR, pages 3640–3649, 2021. 1, 2 Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UA V Navigation Supplementary Material A.1. List of Acronyms For clarity, the main acronyms used in this paper are gro...

work page arXiv 2021