Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation
Pith reviewed 2026-05-15 00:27 UTC · model grok-4.3
The pith
Bearing-UAV enables vision-only UAV navigation by jointly predicting absolute location and heading from unaligned aerial and satellite views without storing map tiles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bearing-UAV is a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, leveraging global and local structural features while explicitly encoding relative spatial relationships to remain robust under cross-view variations, misalignment, and feature-sparse conditions.
What carries the argument
Bearing-UAV, which jointly predicts location and heading by combining global-local structural features with relative spatial relationship encodings.
Load-bearing premise
That the substantial discrepancies and varying overlaps between aerial and satellite views have been adequately handled so the approach generalizes to real-world UAV flights.
What would settle it
Real UAV flights over feature-sparse terrain, or over areas with large viewpoint overlaps, on which Bearing-UAV's localization error exceeds that of tile-matching baselines would falsify the claimed advantage.
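To make such a comparison concrete, one needs a position error in metres and a heading error that handles the wrap-around at 360 degrees. The following is a minimal sketch, not the paper's evaluation code: the function names, the use of latitude/longitude inputs, and the spherical-Earth approximation are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def heading_error_deg(pred_deg, true_deg):
    """Smallest absolute angular difference in degrees, always within [0, 180]."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)
```

The modular wrap matters because a predicted heading of 350 degrees against a true heading of 10 degrees is a 20 degree error, not a 340 degree one.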
Original abstract
Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporting unmanned aerial vehicle (UAV) navigation in GNSS-denied environments. However, existing work predominantly focuses on matching UAV views to onboard map tiles, which introduces an inherent trade-off between accuracy and storage overhead, and overlooks the importance of the UAV's heading during navigation. Moreover, the substantial discrepancies and varying overlaps in cross-view scenarios have been insufficiently considered, limiting their generalization to real-world scenarios. In this paper, we present Bearing-UAV, a purely vision-driven cross-view navigation method that jointly predicts UAV absolute location and heading from neighboring features, enabling accurate, lightweight, and robust navigation in the wild. Our method leverages global and local structural features and explicitly encodes relative spatial relationships, making it robust to cross-view variations, misalignment, and feature-sparse conditions. We also present Bearing-UAV-90k, a multi-city benchmark for evaluating cross-view localization and navigation. Extensive experiments show encouraging results that Bearing-UAV yields lower localization error than previous matching/retrieval paradigm across diverse terrains. Our code and dataset will be made publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Bearing-UAV, a vision-only cross-view geo-localization method for UAV navigation that jointly predicts absolute location and heading by encoding global/local structural features and relative spatial relationships. It contrasts this with prior tile-matching approaches, which incur storage-accuracy trade-offs and ignore heading. The authors release the Bearing-UAV-90k multi-city benchmark and report that the method achieves lower localization error than matching/retrieval baselines across diverse terrains, with claimed robustness to misalignment, varying overlaps, and feature sparsity.
Significance. If the reported error reductions hold under detailed scrutiny, the work offers a lightweight, map-tile-free alternative for GNSS-denied UAV navigation with potential practical impact. The new benchmark could serve as a useful community resource for evaluating cross-view robustness. The explicit modeling of relative spatial relationships is a conceptually clean contribution that directly targets the misalignment issues highlighted in the abstract.
major comments (2)
- [§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.
- [§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.
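The first major comment asks for significance testing on per-sequence errors. As a hedged illustration of what a paired test involves, the sketch below computes the paired t statistic with the Python standard library only. The function name and the error values are hypothetical and are not taken from the paper; a p-value would additionally require the Student t distribution, which a statistics package can supply.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean_diff / (sd / math.sqrt(n))
    return mean_diff, t_stat, n - 1


# Hypothetical per-sequence localization errors in metres (illustrative only).
baseline_err = [12.0, 15.0, 9.0, 20.0, 14.0]
proposed_err = [10.0, 12.0, 8.0, 15.0, 13.0]
mean_diff, t_stat, dof = paired_t_statistic(baseline_err, proposed_err)
```

Pairing by sequence is the point of the design: it removes between-sequence difficulty from the comparison, which is exactly the "few easy sequences" concern raised above.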
minor comments (3)
- [Abstract] Abstract: Replace the vague phrase 'encouraging results' with the measured reduction in localization error (e.g., 'X% lower median error') to strengthen the summary.
- [§3] Notation: Define the joint loss for location and heading prediction explicitly (it is currently described only at a high level) and ensure that the symbols for bearing and position are used consistently between equations and text.
- [Figure 3] Figure 3: The qualitative examples would benefit from overlaying predicted versus ground-truth trajectories with error vectors to visually support the quantitative claims.
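As an illustration of the kind of explicit definition the notation comment asks for, a generic joint objective for position and heading could be written as below. This is one common form and is not taken from the paper: the symbols $\hat{\mathbf{p}}$, $\mathbf{p}$, $\hat{\theta}$, $\theta$ and the weight $\lambda$ are all assumptions, and the authors' actual loss may differ.

```latex
% Hypothetical joint loss; every symbol here is an assumption, not the paper's notation.
\mathcal{L} = \mathcal{L}_{\mathrm{loc}} + \lambda\,\mathcal{L}_{\mathrm{head}},
\qquad
\mathcal{L}_{\mathrm{loc}} = \lVert \hat{\mathbf{p}} - \mathbf{p} \rVert_2,
\qquad
\mathcal{L}_{\mathrm{head}} = 1 - \cos\bigl(\hat{\theta} - \theta\bigr)
```

The cosine term is periodic, so headings of $359^\circ$ and $1^\circ$ are treated as close, which a plain squared difference on angles would not do.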
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the opportunity to clarify and improve our manuscript. We address each major comment point by point below.
- Referee: [§4, Table 2] §4 (Experiments) and Table 2: The central claim of lower localization error versus matching baselines is asserted, but the manuscript provides no error bars, statistical significance tests, or per-terrain variance; without these, it is impossible to determine whether the reported gains are reliable or driven by a few easy sequences.
  Authors: We agree that including error bars, statistical significance tests, and per-terrain variance would strengthen the reliability of our claims. In the revised manuscript, we will report standard deviations across multiple evaluation runs or sequences in Table 2. We will also perform and report statistical significance tests (such as paired t-tests) to test whether the observed improvements are significant. Furthermore, we will add a per-terrain variance analysis to show whether the gains hold across diverse terrains rather than being driven by specific sequences.
  revision: yes
- Referee: [§3.2, §5] §3.2 (Method) and §5 (Discussion): The robustness to 'substantial discrepancies and varying overlaps' is a load-bearing claim for real-world UAV applicability, yet the paper does not quantify overlap statistics or misalignment distributions in Bearing-UAV-90k nor include an ablation that isolates the relative-spatial-encoding module under controlled overlap reduction.
  Authors: We acknowledge the importance of quantifying the overlap and misalignment characteristics in our benchmark to support the robustness claims. In the revision, we will include detailed statistics on overlap ratios and misalignment distributions within the Bearing-UAV-90k dataset description. Additionally, we will conduct and present an ablation study that isolates the contribution of the relative-spatial-encoding module by evaluating performance on controlled subsets with reduced overlaps. These additions will be incorporated into Sections 3.2 and 5.
  revision: yes
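The overlap-controlled evaluation promised in the second response can be pictured as grouping test samples by how much the aerial footprint overlaps the satellite reference and reporting error per group. The sketch below is a hypothetical illustration only: the axis-aligned box representation, the bin edges, and both function names are assumptions, and the real benchmark may define overlap differently (for example with rotated footprints).

```python
from collections import defaultdict
from statistics import mean


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_error_by_overlap(samples, edges=(0.25, 0.5, 0.75)):
    """Average error per overlap bin; samples are (overlap, error) pairs.

    Bin index k counts how many edges the overlap reaches, so with the default
    edges bin 0 is [0, 0.25) and bin 3 is [0.75, 1]. Empty bins are omitted.
    """
    buckets = defaultdict(list)
    for overlap, err in samples:
        idx = sum(overlap >= edge for edge in edges)
        buckets[idx].append(err)
    return {idx: mean(errs) for idx, errs in sorted(buckets.items())}
```

Running the same binning with and without the relative-spatial-encoding module would show whether its benefit grows as overlap shrinks, which is the isolation the referee requests.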
Circularity Check
No significant circularity detected
The paper presents Bearing-UAV as an independent neural architecture for joint location and heading prediction from cross-view features, together with a new benchmark dataset. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the inputs or to self-citations. The central claims rest on experimental comparisons against matching baselines across terrains, which are externally falsifiable and do not rely on load-bearing self-referential definitions, uniqueness theorems, or ansatzes imported from prior author work. The contribution is therefore self-contained against external benchmarks.