VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing

2026 · cs.CV · arXiv 2602.07045

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Recent advancements in Multimodal Large Language Models (MLLMs) have enabled complex reasoning. However, existing remote sensing (RS) benchmarks remain heavily biased toward perception tasks, such as object recognition and scene classification. This limitation hinders the development of MLLMs for cognitively demanding RS applications. To address this, we propose a Vision Language ReaSoning Benchmark (VLRS-Bench), which is the first benchmark exclusively dedicated to complex RS reasoning. Structured across the three core dimensions of Cognition, Decision, and Prediction, VLRS-Bench comprises 2,000 question-answer pairs with an average question length of 130.19 words, spanning 14 tasks and up to eight temporal phases. VLRS-Bench is constructed via a specialized pipeline that integrates RS-specific priors and expert knowledge to ensure geospatial realism and reasoning complexity. Experimental results reveal significant bottlenecks in existing state-of-the-art MLLMs, providing critical insights for advancing multimodal reasoning within the remote sensing community. The project repository is available at https://github.com/MiliLab/VLRS-Bench.

representative citing papers

VertiCue-Bench: Diagnosing Whether MLLMs Use Height Cues to Resolve 2D Ambiguity in Remote Sensing Natural Scenes

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

VertiCue-Bench shows MLLMs can read raw CHM height cues but largely fail to integrate them into reliable semantic reasoning, often underperforming RGB-only baselines.

citing papers explorer

VertiCue-Bench: Diagnosing Whether MLLMs Use Height Cues to Resolve 2D Ambiguity in Remote Sensing Natural Scenes

cs.CV · 2026-05-25 · unverdicted · none · ref 18 · internal anchor

VertiCue-Bench shows MLLMs can read raw CHM height cues but largely fail to integrate them into reliable semantic reasoning, often underperforming RGB-only baselines.
