Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer trainable parameters.
Skysense: A multi-modal remote sens- ing foundation model towards universal interpretation for earth observation imagery
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization
Parameter-efficient fine-tuning lets MLLMs serve as effective retrievers for natural-language-guided cross-view geo-localization, beating dual-encoder baselines on GeoText-1652 and CVG-Text while using far fewer trainable parameters.