SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model

2025 · arXiv 2504.09644

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · baseline 1

citation-polarity summary

background 3 · baseline 1

representative citing papers

Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks

cs.CV · 2026-06-09 · unverdicted · novelty 7.0

Earth-OneVision is a unified 2B-parameter RS-MLLM supporting six modalities and nine tasks via FGVLA, SLIS, and PCMA mechanisms plus a 34M QA-pair dataset, reporting competitive or superior benchmark results versus larger models.

DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

cs.CV · 2026-06-04 · unverdicted · novelty 7.0

DisasterBench is a new multi-stage multimodal reasoning benchmark for UAV disaster response with 14 scenes and 9 tasks; the accompanying 2B DisasterVL model outperforms open-source MLLMs and approaches GPT-4o efficiency.

AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding

cs.CV · 2026-05-21 · accept · novelty 7.0

AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.

RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes

cs.CV · 2025-11-28 · conditional · novelty 7.0

UniGeoSeg releases the first million-scale dataset for instruction-driven remote sensing segmentation and a unified model that achieves state-of-the-art results with strong zero-shot generalization.

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

cs.CV · 2026-05-31 · conditional · novelty 6.0

Introduces MTRS task, MTRefSeg-21K benchmark of 21K image-text-mask triplets, and MTRefSeg-R1 LVLM baseline that outperforms standard models via two-stage change-aware training.

RemoteZero: Geospatial Reasoning with Zero Human Annotations

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

RemoteZero replaces coordinate supervision with intrinsic semantic verification to enable box-free GRPO training and self-evolution for geospatial reasoning.

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.

Agentic AI for Remote Sensing: Technical Challenges and Research Directions

cs.CV · 2026-04-27 · unverdicted · novelty 4.0 · 2 refs

Position paper identifies structural challenges in applying generic agentic AI to Earth Observation and outlines design principles for EO-native agents focused on geospatial state and validity.

citing papers explorer

Showing 7 of 7 citing papers after filters.

Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks · cs.CV · 2026-06-09 · unverdicted · none · ref 125
DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments · cs.CV · 2026-06-04 · unverdicted · none · ref 7
PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation · cs.CV · 2026-04-17 · unverdicted · none · ref 16
RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs · cs.CV · 2026-04-09 · unverdicted · none · ref 27
RemoteZero: Geospatial Reasoning with Zero Human Annotations · cs.CV · 2026-05-06 · unverdicted · none · ref 5
RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation · cs.CV · 2026-04-19 · unverdicted · none · ref 23
Agentic AI for Remote Sensing: Technical Challenges and Research Directions · cs.CV · 2026-04-27 · unverdicted · none · ref 60 · 2 links
