In International Conference on Machine Learning, pages 19730–19742
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
IMAGEO-Bench evaluates 10 LLMs on image geolocalization across global street scenes, US POIs, and private images, revealing closed-source model advantages and biases favoring high-resource regions.
EVian decomposes vision-language model responses into three cognitive components and audits them along consistency, coherence, and accuracy axes, showing that a small curated subset outperforms much larger training sets.
citing papers explorer
- Skill-Conditioned Visual Geolocation for Vision-Language Models
GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.
- From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models
IMAGEO-Bench evaluates 10 LLMs on image geolocalization across global street scenes, US POIs, and private images, revealing closed-source model advantages and biases favoring high-resource regions.
- Evian: Towards Explainable Visual Instruction-tuning Data Auditing
EVian decomposes vision-language model responses into three cognitive components and audits them along consistency, coherence, and accuracy axes, showing that a small curated subset outperforms much larger training sets.