In International Conference on Machine Learning, pages 19730–19742

Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei · 2025 · arXiv 2506.14674

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Skill-Conditioned Visual Geolocation for Vision-Language Models

cs.CV · 2026-04-10 · unverdicted · novelty 7.0 · 2 refs

GeoSkill lets vision-language models improve geolocation accuracy and reasoning by maintaining an evolving Skill-Graph that grows through autonomous analysis of successful and failed rollouts on web-scale image data.

From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models

cs.CV · 2025-08-03 · unverdicted · novelty 7.0

IMAGEO-Bench evaluates 10 LLMs on image geolocalization across global street scenes, US POIs, and private images, revealing closed-source model advantages and biases favoring high-resource regions.

Evian: Towards Explainable Visual Instruction-tuning Data Auditing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

EVian decomposes vision-language model responses into three cognitive components and audits them along consistency, coherence, and accuracy axes, showing that a small curated subset outperforms much larger training sets.

citing papers explorer

Showing 3 of 3 citing papers.

Skill-Conditioned Visual Geolocation for Vision-Language Models · cs.CV · 2026-04-10 · unverdicted · none · ref 15 · 2 links
From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models · cs.CV · 2025-08-03 · unverdicted · none · ref 23
Evian: Towards Explainable Visual Instruction-tuning Data Auditing · cs.CV · 2026-04-22 · unverdicted · none · ref 4
