GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.
Where on earth? a vision-language bench- mark for probing model geolocation skills across scales
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Vision-language models display large performance differences and clear limits in zero-shot country-level geolocalization from ground-view photos, with semantic cues helping coarse guesses but failing on fine details.
citing papers explorer
-
Skill-Conditioned Visual Geolocation for Vision-Language Models
GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.
-
Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization
Vision-language models display large performance differences and clear limits in zero-shot country-level geolocalization from ground-view photos, with semantic cues helping coarse guesses but failing on fine details.