Geox-bench: Benchmarking cross-view geo-localization and pose estimation capabilities of large multimodal models

Yushuo Zheng, Jiangyong Ying, Huiyu Duan, Chunyi Li, Zicheng Zhang, Jing Liu, Xiaohong Liu, Guangtao Zhai · 2025 · arXiv 2511.13259

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

Vision-language models display large performance differences and clear limits in zero-shot country-level geolocalization from ground-view photos, with semantic cues helping coarse guesses but failing on fine details.

MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?

cs.LG · 2026-02-20 · conditional · novelty 6.0 · 2 refs

MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

citing papers explorer

Showing 2 of 2 citing papers.

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization cs.CV · 2026-04-17 · unverdicted · none · ref 44
Vision-language models display large performance differences and clear limits in zero-shot country-level geolocalization from ground-view photos, with semantic cues helping coarse guesses but failing on fine details.
MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs? cs.LG · 2026-02-20 · conditional · none · ref 104 · 2 links
MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.

Geox-bench: Benchmarking cross-view geo-localization and pose estimation capabilities of large multimodal models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer