Simple, effective and general: A new backbone for cross-view image geo-localization
5 Pith papers cite this work.
Fields: cs.CV (5)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- MMLANDMARKS: a Cross-View Instance-Level Benchmark for Geo-Spatial Understanding
  MMLandmarks supplies 197k aerial and 329k ground images plus text and GPS for 18,557 landmarks to benchmark multimodal geo-spatial understanding.
- Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization
  Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer trainable parameters.
- GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
  GeoBridge introduces a semantic-anchor mechanism that uses text to bridge multi-view image features for bidirectional cross-view and language-to-image geo-localization, supported by the new GeoLoc dataset of over 50,000 aligned pairs.
- BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization
  BGG adapts vision foundation models using multi-granularity dilated convolutions and frequency-domain patch aggregation to achieve state-of-the-art cross-view geo-localization on University-1652 and SUES-200 with low training cost.
- InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization