SenseBench is the first physics-based benchmark with 10K+ instances and dual protocols to evaluate VLMs on remote sensing low-level perception and diagnostic description, revealing domain bias and specific failure modes.
Earthdial: Turning multi-sensory earth observations to interactive dialogues
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
baseline 2polarities
baseline 2representative citing papers
GeoX is a self-play RL framework in which a single multimodal policy proposes and solves spatial problems as executable programs over image primitives, using verifiable rewards to improve base VLMs by up to 5.5 points without large curated data.
GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.
SkyNative introduces an encoder-free architecture using raw patch tokens and modality-specific parameters in a unified autoregressive model to improve image-grounded reasoning in remote sensing vision-language tasks.
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.
citing papers explorer
-
SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models
SenseBench is the first physics-based benchmark with 10K+ instances and dual protocols to evaluate VLMs on remote sensing low-level perception and diagnostic description, revealing domain bias and specific failure modes.
-
GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
GeoX is a self-play RL framework in which a single multimodal policy proposes and solves spatial problems as executable programs over image primitives, using verifiable rewards to improve base VLMs by up to 5.5 points without large curated data.
-
GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding
GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.
-
SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning
SkyNative introduces an encoder-free architecture using raw patch tokens and modality-specific parameters in a unified autoregressive model to improve image-grounded reasoning in remote sensing vision-language tasks.
-
No One Knows the State of the Art in Geospatial Foundation Models
An audit of 152 papers reveals that geospatial foundation models lack standardized evaluations, training controls, and weight releases, so no one knows the state of the art.