BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization
Pith reviewed 2026-05-12 04:53 UTC · model grok-4.3
The pith
Adapting a vision foundation model with multi-granularity and frequency modules bridges geometric gaps between drone and satellite views for better geo-localization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BGG adapts a vision foundation model through two modules. A Multi-granularity Feature Enhancement Adapter (MFEA) uses multi-level dilated convolutions to improve scale adaptability and viewpoint robustness, bridging the cross-view geometric gap at small training cost. A Frequency-Aware Structural Aggregation (FASA) module modulates patch tokens in the frequency domain and aggregates them adaptively to enhance local structural features. The enhanced local features are fused with the CLS token for more accurate cross-view geo-localization, yielding state-of-the-art performance on the University-1652 and SUES-200 datasets.
What carries the argument
BGG adaptation framework consisting of the MFEA module (multi-level dilated convolutions for multi-granularity feature enhancement) and the FASA module (frequency-domain modulation and adaptive aggregation of patch tokens to supplement the CLS token).
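To make "multi-level dilated convolutions" concrete, here is a minimal sketch of one way such an adapter can be built, in Python with NumPy. The bottleneck width, the dilation rates (1, 2, 3), the ReLU, averaging the branches and the zero-initialised up-projection are all assumptions for illustration; the helper names (`dilated_depthwise_conv`, `init_adapter`, `multi_dilation_adapter`) are hypothetical, and this is not the authors' MFEA implementation.

```python
import numpy as np


def dilated_depthwise_conv(x, kernel, dilation):
    """Depthwise 3x3 cross-correlation with zero 'same' padding.

    x: (H, W, C) feature grid; kernel: (3, 3, C); dilation: spacing between kernel taps.
    """
    h, w, _ = x.shape
    pad = dilation  # a dilated 3x3 kernel reaches `dilation` cells from its centre
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + h, dj:dj + w, :]
    return out


def init_adapter(dim, bottleneck, n_levels, rng):
    """Hypothetical parameters: down-projection, one depthwise kernel per level, up-projection."""
    return {
        "down": rng.normal(0.0, 0.02, (dim, bottleneck)),
        "kernels": rng.normal(0.0, 0.1, (n_levels, 3, 3, bottleneck)),
        "up": np.zeros((bottleneck, dim)),  # zero init: the adapter starts as an identity map
    }


def multi_dilation_adapter(tokens, grid_hw, params, dilations=(1, 2, 3)):
    """Residual bottleneck adapter with parallel dilated depthwise convolutions.

    tokens: (H*W, D) patch tokens from a frozen ViT block, [CLS] token excluded.
    """
    h, w = grid_hw
    z = np.maximum(tokens @ params["down"], 0.0)  # ReLU bottleneck
    grid = z.reshape(h, w, -1)  # back to the spatial patch layout
    branches = [
        dilated_depthwise_conv(grid, params["kernels"][k], d)
        for k, d in enumerate(dilations)
    ]
    fused = np.mean(branches, axis=0)  # average the receptive-field levels
    return tokens + fused.reshape(h * w, -1) @ params["up"]


rng = np.random.default_rng(0)
tokens = rng.normal(size=(14 * 14, 64))
params = init_adapter(dim=64, bottleneck=16, n_levels=3, rng=rng)
out = multi_dilation_adapter(tokens, (14, 14), params)
print(out.shape)  # (196, 64)
```

Because the up-projection starts at zero, the adapted block initially reproduces the frozen backbone's tokens exactly; this is one common choice in adapter tuning, so that training begins from the VFM's original representation while the dilated branches see progressively larger neighbourhoods.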
If this is right
- The adapted model captures robust and consistent features from cross-view images by leveraging VFM general representations.
- Fusing frequency-enhanced local features with the CLS token improves image retrieval precision for geolocation.
- The framework achieves state-of-the-art localization results on University-1652 and SUES-200 while using low training costs.
- The generalization capabilities of the VFM are utilized to handle viewpoint and scale variations without full retraining.
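The "low training costs" point rests on simple arithmetic: with the backbone frozen, only adapter parameters receive gradients. A rough count, assuming a hypothetical residual adapter (down-projection, three depthwise 3×3 convolutions, up-projection, all with biases) in each of 12 blocks of width 768, and a frozen backbone of about 86M parameters (roughly ViT-B/16 scale). None of these figures come from the paper; the function names are illustrative.

```python
def adapter_param_count(dim, bottleneck, n_levels, kernel_size=3):
    """Weights + biases of one hypothetical residual adapter:
    down-projection -> n_levels depthwise convs -> up-projection."""
    down = dim * bottleneck + bottleneck
    convs = n_levels * (kernel_size * kernel_size * bottleneck + bottleneck)
    up = bottleneck * dim + dim
    return down + convs + up


def trainable_fraction(backbone_params, n_blocks, **adapter_cfg):
    """Share of all parameters that receive gradients when the backbone stays frozen."""
    trainable = n_blocks * adapter_param_count(**adapter_cfg)
    return trainable / (backbone_params + trainable)


per_block = adapter_param_count(dim=768, bottleneck=64, n_levels=3)
print(per_block)       # 101056
print(12 * per_block)  # 1212672
frac = trainable_fraction(86_000_000, 12, dim=768, bottleneck=64, n_levels=3)
print(f"{100 * frac:.2f}% of parameters are trained")
```

Under these assumptions roughly 1.2M parameters are trained, on the order of one percent of the total, which is the sense in which adapter tuning avoids full retraining.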
Where Pith is reading between the lines
- The frequency-domain handling in FASA could extend to other retrieval tasks where structural consistency across domains matters, such as medical or remote-sensing image matching.
- Parameter-efficient adapters of this form might reduce data requirements for new cross-view problems by building on existing foundation model weights.
- If the modules prove stable across datasets, the method could support real-time updates to geo-localization systems with minimal compute.
Load-bearing premise
The MFEA and FASA modules will reliably bridge geometric gaps across arbitrary cross-view image pairs without introducing new artifacts or requiring dataset-specific hyperparameter tuning that raises training costs.
What would settle it
On a held-out cross-view dataset with larger scale or viewpoint shifts than University-1652, if BGG shows no accuracy gain over a plain VFM baseline while its training cost rises above the claimed low level, the bridging claim would not hold.
Original abstract
Geometric differences between cross-view images, such as drone and satellite views, significantly increase the challenge of Cross-View Geo-Localization (CVGL), which aims to acquire the geolocation of images by image retrieval. To further enhance the CVGL performance, this paper proposes a parameter-efficient adaptation framework for bridging the geometric gap across images based on the vision foundation model (VFM) (e.g., DINOv3), termed BGG. BGG not only effectively leverages the general visual representations of VFM and captures the robust and consistent features from cross-view images, but also utilizes the generalization capabilities of the VFM, significantly improving the CVGL performance. It mainly contains a Multi-granularity Feature Enhancement Adapter (MFEA) and a Frequency-Aware Structural Aggregation (FASA) module. Specifically, MFEA enhances the scale adaptability and viewpoint robustness of features by multi-level dilated convolutions, effectively bridging the cross-view geometric gap with small training costs. Additionally, considering the [CLS] token lacks spatial details for precise image retrieval and localization, the FASA module modulates patch tokens in the frequency domain and performs adaptive aggregation for local structural feature enhancement. Finally, BGG fuses the enhanced local features with the [CLS] token for more accurate CVGL. Extensive experiments on University-1652 and SUES-200 datasets demonstrate that BGG has significant advantages over other methods and achieves state-of-the-art localization performance with low training costs.
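A comparable sketch for the FASA idea and the final fusion step described in the abstract. It assumes a GFNet-style global filter (a learnable complex gain per spatial frequency and channel, applied through the 2-D real FFT), softmax attention pooling with a single scoring vector as the "adaptive aggregation", and concatenation plus L2 normalisation as the fusion with the [CLS] token. These choices and the function names are hypothetical; the abstract does not specify them.

```python
import numpy as np


def frequency_modulate(grid, filt):
    """Scale each 2-D spatial frequency of every channel by a (learnable) complex gain.

    grid: (H, W, C) real patch features; filt: (H, W // 2 + 1, C) complex filter.
    """
    spec = np.fft.rfft2(grid, axes=(0, 1))
    return np.fft.irfft2(spec * filt, s=grid.shape[:2], axes=(0, 1))


def adaptive_aggregate(tokens, score_w):
    """Softmax-weighted pooling: tokens (N, C), score vector (C,) -> (C,)."""
    logits = tokens @ score_w
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ tokens


def fused_descriptor(cls_token, patch_tokens, grid_hw, filt, score_w):
    """Concatenate [CLS] with the aggregated local feature and L2-normalise."""
    h, w = grid_hw
    grid = patch_tokens.reshape(h, w, -1)
    local = adaptive_aggregate(frequency_modulate(grid, filt).reshape(h * w, -1), score_w)
    desc = np.concatenate([cls_token, local])
    return desc / np.linalg.norm(desc)


rng = np.random.default_rng(0)
h, w, c = 8, 8, 32
patches = rng.normal(size=(h * w, c))
cls_token = rng.normal(size=c)
filt = np.ones((h, w // 2 + 1, c), dtype=complex)  # all-pass filter as a starting point
desc = fused_descriptor(cls_token, patches, (h, w), filt, np.zeros(c))
print(desc.shape)  # (64,)
```

With an all-pass filter and a zero scoring vector, the local feature reduces to the mean patch token, which gives a convenient sanity baseline before the filter and scores are trained.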
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes BGG, a parameter-efficient adaptation framework for cross-view geo-localization (CVGL) that adapts a frozen vision foundation model (DINOv3). It introduces the Multi-granularity Feature Enhancement Adapter (MFEA) using multi-level dilated convolutions to improve scale adaptability and viewpoint robustness, and the Frequency-Aware Structural Aggregation (FASA) module that modulates patch tokens in the frequency domain with adaptive aggregation to enhance local structural features. The enhanced local features are fused with the [CLS] token for image retrieval. Experiments on University-1652 and SUES-200 datasets are reported to achieve state-of-the-art performance with low training costs.
Significance. If the empirical gains hold under scrutiny and the MFEA/FASA modules generalize without dataset-specific retuning or frequency artifacts, the work would provide a practical demonstration of leveraging VFMs for geometric gap bridging in CVGL, potentially enabling more efficient adaptation with reduced compute while maintaining or improving retrieval accuracy.
major comments (2)
- [§3.2] FASA module description: The frequency-domain modulation and adaptive aggregation of patch tokens are asserted to enhance local structure without introducing misalignments or artifacts under extreme viewpoint/scale shifts, but no analysis, visualization, or ablation of frequency parameter sensitivity is referenced to confirm this; this point is load-bearing for the central claim that FASA reliably bridges geometric gaps.
- [§4] Experiments: The SOTA claims and the assertions of 'low training costs' and 'generalization capabilities' rest on results from only University-1652 and SUES-200; no cross-dataset transfer experiments (e.g., training on one and testing on the other without retuning MFEA/FASA hyperparameters) or checks for degradation on other cross-view pairs are described, which weakens support for the generalization claim.
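The frequency-sensitivity ablation requested in the §3.2 comment needs a knob that shifts emphasis between low and high frequencies. A minimal sketch, assuming a radial cutoff on the normalised frequency grid and two fixed gains (all names hypothetical): it filters a patch-feature grid and reports how strongly each setting changes the features. In an actual ablation each setting would be paired with retrieval accuracy rather than this proxy alone.

```python
import numpy as np


def radial_gain_mask(h, w, cutoff, low_gain, high_gain):
    """Gain per rfft2 bin: low_gain inside a normalised radial cutoff, high_gain outside."""
    fy = np.fft.fftfreq(h)[:, None]   # row frequencies in cycles per patch, in [-0.5, 0.5)
    fx = np.fft.rfftfreq(w)[None, :]  # non-negative column frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return np.where(radius <= cutoff, low_gain, high_gain)


def apply_mask(grid, mask):
    """Filter an (H, W, C) real feature grid with a channel-shared frequency mask."""
    spec = np.fft.rfft2(grid, axes=(0, 1))
    return np.fft.irfft2(spec * mask[..., None], s=grid.shape[:2], axes=(0, 1))


def relative_change(grid, mask):
    """||filtered - original|| / ||original||: how strongly a setting alters the features."""
    out = apply_mask(grid, mask)
    return np.linalg.norm(out - grid) / np.linalg.norm(grid)


rng = np.random.default_rng(0)
grid = rng.normal(size=(16, 16, 8))
for cutoff in (0.1, 0.25, 0.5):
    for low, high in ((1.0, 0.5), (0.5, 1.0)):  # emphasise low vs high frequencies
        mask = radial_gain_mask(16, 16, cutoff, low, high)
        print(f"cutoff={cutoff:.2f} low={low} high={high} change={relative_change(grid, mask):.3f}")
```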
minor comments (2)
- [Abstract, §1] The phrasing 'significantly improving the CVGL performance' and 'significant advantages' is repeated without immediate quantitative anchors; consider adding a brief reference to the reported metrics (e.g., Recall@1 gains) for clarity.
- [§3] Notation: The description of MFEA's multi-level dilated convolutions and FASA's frequency modulation would benefit from an explicit equation or diagram label for the aggregation step to aid reproducibility.
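As an illustration of the explicit notation this comment asks for, one possible formulation is sketched below. Here $X \in \mathbb{R}^{N\times D}$ are the patch tokens of a frozen block, $W_{\downarrow} \in \mathbb{R}^{D\times r}$ and $W_{\uparrow} \in \mathbb{R}^{r\times D}$ are bottleneck projections, $\sigma$ is a nonlinearity, $d_1,\dots,d_L$ are dilation rates, $P$ are the output patch tokens arranged on the spatial grid, $\mathcal{F}$ is the 2-D DFT over spatial axes, $K$ is a complex filter, $\tilde{p}_i$ is the $i$-th modulated token, $w$ is a scoring vector and $f_{\mathrm{cls}}$ is the [CLS] embedding. All of these symbols, and the averaging, softmax pooling and concatenation choices, are assumptions for illustration rather than the paper's equations.

```latex
\begin{aligned}
Z &= \sigma\!\left(X W_{\downarrow}\right), \qquad
\hat{Z} = \frac{1}{L}\sum_{l=1}^{L} \operatorname{DWConv}^{(d_l)}_{3\times 3}(Z), \qquad
X' = X + \hat{Z}\, W_{\uparrow} \\
\tilde{P} &= \mathcal{F}^{-1}\!\left(K \odot \mathcal{F}(P)\right), \qquad
\alpha_i = \frac{\exp\!\left(\tilde{p}_i^{\top} w\right)}{\sum_{j=1}^{N} \exp\!\left(\tilde{p}_j^{\top} w\right)}, \qquad
f_{\mathrm{loc}} = \sum_{i=1}^{N} \alpha_i\, \tilde{p}_i \\
f &= \frac{[\,f_{\mathrm{cls}};\, f_{\mathrm{loc}}\,]}{\left\lVert [\,f_{\mathrm{cls}};\, f_{\mathrm{loc}}\,] \right\rVert_2}
\end{aligned}
```

Writing the aggregation weights $\alpha_i$ out explicitly makes it clear which quantities are learned ($W_{\downarrow}$, $W_{\uparrow}$, the convolution kernels, $K$, $w$) and which come from the frozen backbone.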
Simulated Author's Rebuttal
Thank you for reviewing our manuscript and providing valuable feedback. We appreciate the referee's recognition of the potential of our BGG framework. We address the major comments point-by-point below, proposing revisions where necessary to strengthen the paper.
Point-by-point responses
- Referee: [§3.2] FASA module description: The frequency-domain modulation and adaptive aggregation of patch tokens are asserted to enhance local structure without introducing misalignments or artifacts under extreme viewpoint/scale shifts, but no analysis, visualization, or ablation of frequency parameter sensitivity is referenced to confirm this; this point is load-bearing for the central claim that FASA reliably bridges geometric gaps.
Authors: We thank the referee for highlighting this important aspect. While the benchmark results show that FASA improves retrieval accuracy without apparent degradation from artifacts, we agree that explicit analysis would better support the claim. In the revised version, we will add: (1) visualizations of the frequency spectra and reconstructed spatial features before and after FASA, to check for artifacts; (2) an ablation study varying the frequency modulation parameters (e.g., low/high frequency emphasis) and reporting performance under controlled extreme scale and viewpoint variations. These additions will test whether FASA bridges geometric gaps reliably. revision: yes
- Referee: [§4] Experiments: The SOTA claims and the assertions of 'low training costs' and 'generalization capabilities' rest on results from only University-1652 and SUES-200; no cross-dataset transfer experiments (e.g., training on one and testing on the other without retuning MFEA/FASA hyperparameters) or checks for degradation on other cross-view pairs are described, which weakens support for the generalization claim.
Authors: We acknowledge that dedicated cross-dataset transfer experiments would provide more direct evidence of generalization. Although University-1652 and SUES-200 represent distinct settings (University-1652 covers buildings from many university campuses, while SUES-200 covers varied scenes captured at several drone altitudes), and our method achieves SOTA on both with identical hyperparameters, we will include in the revision: training on University-1652 and evaluating zero-shot on SUES-200 (and the reverse) without any retuning of MFEA or FASA. This will quantify any degradation and further test the VFM's ability to bridge geometric gaps across different cross-view pairs. revision: yes
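For the proposed zero-shot transfer (for example, training on University-1652 and testing on SUES-200), the evaluation itself is standard retrieval scoring. The sketch below computes Recall@K and AP, the metrics commonly reported on these benchmarks, from L2-normalised query and gallery descriptors; the function name and the convention that a query with no gallery match scores zero are assumptions.

```python
import numpy as np


def retrieval_metrics(query_desc, gallery_desc, query_labels, gallery_labels, ks=(1, 5, 10)):
    """Recall@K and AP (in %) for cosine-similarity retrieval.

    Descriptors are assumed L2-normalised, so a dot product is cosine similarity.
    A query whose location never appears in the gallery scores 0 on every metric.
    """
    sims = query_desc @ gallery_desc.T
    order = np.argsort(-sims, axis=1)  # gallery indices, most similar first
    recall_hits = {k: 0 for k in ks}
    ap_sum = 0.0
    for qi, label in enumerate(query_labels):
        # 0-based ranks at which a gallery image of the same location appears
        hit_ranks = np.flatnonzero(gallery_labels[order[qi]] == label)
        if hit_ranks.size == 0:
            continue
        for k in ks:
            recall_hits[k] += int(hit_ranks[0] < k)
        # precision at each true match, averaged over the true matches
        precisions = np.arange(1, hit_ranks.size + 1) / (hit_ranks + 1)
        ap_sum += precisions.mean()
    n = len(query_labels)
    metrics = {f"R@{k}": 100.0 * recall_hits[k] / n for k in ks}
    metrics["AP"] = 100.0 * ap_sum / n
    return metrics


# Toy check: the single query's true match is ranked second.
m = retrieval_metrics(
    np.array([[0.8, 0.6]]),
    np.array([[1.0, 0.0], [0.8, 0.6]]),
    query_labels=np.array([0]),
    gallery_labels=np.array([0, 1]),
    ks=(1, 5),
)
print(m)  # {'R@1': 0.0, 'R@5': 100.0, 'AP': 50.0}
```

Running the same function on descriptors from a model trained on one dataset and applied unchanged to the other dataset's query and gallery sets gives the transfer numbers; comparing them with the in-domain results quantifies the degradation the response refers to.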
Circularity Check
No significant circularity; empirical module design with independent experimental validation
Full rationale
The paper presents a parameter-efficient adaptation framework (BGG) consisting of MFEA (multi-level dilated convolutions for scale/viewpoint robustness) and FASA (frequency-domain modulation and adaptive aggregation of patch tokens). No equations, derivations, or 'predictions' are defined that reduce by construction to fitted inputs or self-referential definitions. Central claims rest on empirical improvements reported on University-1652 and SUES-200 datasets rather than any self-citation chain or uniqueness theorem imported from the authors' prior work. The modules are described as novel designs leveraging a frozen VFM backbone (DINOv3), with no load-bearing step that renames a known result or smuggles an ansatz via citation. This is a standard honest non-finding for an applied CV adaptation paper.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  MFEA employs multi-level dilated convolutions... FASA modulates patch tokens in the frequency domain and performs adaptive aggregation
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  BGG... parameter-efficient adaptation framework... low training costs
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.