MapReason-OSM provides 6,000 graph-verifiable instances spanning 12 mobility tasks on rendered OpenStreetMap (OSM) maps of 10 U.S. downtowns. Its evaluation of seven VLMs shows that they succeed at simple routing but perform near chance on cost-based facility placement and cross-zoom consistency.
Two Pith papers cite this work, both from 2026. Representative citing papers:
GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models
GeoNatureAgent Benchmark tests seven LLMs on 93 tasks through a production geospatial API. Claude Sonnet 4 scores highest at 60.8%, DeepSeek V3.2 comes close at 11x lower cost, and all models fail on close-value comparisons.