and Ma, Wei-Chiu and Krishna, Ranjay , title =
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
representative citing papers
citing papers explorer
- MapReason-OSM: Can Vision-Language Models Make Graph-Verifiable Mobility Decisions from Street Maps?
MapReason-OSM supplies 6000 graph-verifiable instances across 12 mobility tasks on rendered OSM maps from 10 U.S. downtowns and shows that seven VLMs succeed at simple routing but perform near chance on cost-based facility placement and cross-zoom consistency.
- Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents
Empirical study of five LVR variants finds cosine alignment negatively correlates with accuracy (r=-0.94), supervised latents are bypassed under corruption (max 4-point shift), and answers are decodable downstream but not at the latent.