Geopixel: Pixel grounding large multimodal model in remote sensing

2025 · arXiv 2501.13925

10 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: baseline 1

citation-polarity summary: baseline 1

representative citing papers

Evaluating Remote Sensing Image Captions Beyond Metric Biases

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

MLLMs without fine-tuning outperform fine-tuned models on remote sensing image captioning when captions are scored by their ability to reconstruct the source image, and a training-free self-correction method achieves SOTA performance.

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.

GeoMeld: Toward Semantically Grounded Foundation Models for Remote Sensing

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

GeoMeld provides a large-scale aligned multimodal remote sensing dataset with verified semantic captions and a joint pretraining method that improves downstream transfer and cross-sensor robustness in foundation models.

RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

MMLANDMARKS: a Cross-View Instance-Level Benchmark for Geo-Spatial Understanding

cs.CV · 2025-12-19 · conditional · novelty 7.0

MMLandmarks supplies 197k aerial and 329k ground images plus text and GPS for 18,557 landmarks to benchmark multimodal geo-spatial understanding.

GeoSearcher: Anchor-Guided Progressive Reasoning for Remote Sensing Visual Grounding with Process Supervision

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

GeoSearcher introduces anchor-centric reasoning supervised fine-tuning and process-faithful group relative policy optimization to improve MLLM-based remote sensing visual grounding.

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

cs.CV · 2026-05-31 · conditional · novelty 6.0

Introduces the MTRS task, the MTRefSeg-21K benchmark of 21K image-text-mask triplets, and the MTRefSeg-R1 LVLM baseline, which outperforms standard models via two-stage change-aware training.

WOW-Seg: A Word-free Open World Segmentation Model

cs.CV · 2026-05-16 · conditional · novelty 6.0

WOW-Seg proposes a word-free open-world segmentation model using Mask2Token and Cascade Attention Mask modules, reporting 89.7 semantic similarity and 82.4 semantic IoU on LVIS with one-eighth the parameters of prior SOTA plus a new 7,662-class benchmark.

B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation

cs.CV · 2026-05-22 · unverdicted · novelty 4.0 · 2 refs

B-GRTO pre-trains a segmentation tool via bootstrapped group relative optimization on GRPO rollouts, yielding substantial gains over plain GRPO on referring segmentation benchmarks.

ProtoFlow: Mitigating Forgetting in Class-Incremental Remote Sensing Segmentation via Low-Curvature Prototype Flow

cs.CV · 2026-04-03 · 2 refs
