Earth-OneVision is a unified 2B-parameter RS-MLLM supporting six modalities and nine tasks via FGVLA, SLIS, and PCMA mechanisms plus a 34M QA-pair dataset, reporting competitive or superior benchmark results versus larger models.
To- wards faithful reasoning in remote sensing: A perceptually- grounded geospatial chain-of-thought for vision-language models
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6years
2026 6verdicts
UNVERDICTED 6roles
background 2polarities
background 2representative citing papers
RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.
GeoSearcher introduces anchor-centric reasoning supervised fine-tuning and process-faithful group relative policy optimization to improve MLLM-based remote sensing visual grounding.
RemoteZero replaces coordinate supervision with intrinsic semantic verification to enable box-free GRPO training and self-evolution for geospatial reasoning.
RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.
UniReason-Med introduces a unified framework for 2D and 3D medical VQA with shared grounded reasoning, trained on a 220K dataset, claiming that joint 2D+3D supervision improves 3D performance over 3D-only training.
citing papers explorer
-
RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.
-
RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation
RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.