Mixed citations

arXiv preprint arXiv:2503.11070 (2025)

Yao, K · 2025 · arXiv 2503.11070

Mixed citation behavior. The most common role is background (40% of classified citations, i.e. 2 of the 5 classified).

10 Pith papers citing it

Background 40% of classified citations

citation-role summary

background 2 · baseline 2 · method 1

citation-polarity summary

background 2 · baseline 2 · use method 1

representative citing papers

SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

SenseBench is the first physics-based benchmark with 10K+ instances and dual protocols to evaluate VLMs on remote sensing low-level perception and diagnostic description, revealing domain bias and specific failure modes.

Evaluating Remote Sensing Image Captions Beyond Metric Biases

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

Unfine-tuned MLLMs outperform fine-tuned models on remote sensing image captioning when captions are scored by their ability to reconstruct the source image, and a training-free self-correction method achieves SOTA performance.

RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.

RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning

cs.CV · 2026-06-26 · unverdicted · novelty 5.0

RSICCLLM introduces a post-training framework with RSICI dataset, difference-aware supervised fine-tuning, and dual-negative preference optimization that claims to outperform much larger models on remote sensing image change captioning.

DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

DiffuSAM fuses diffusion-based localization cues with SAM models to deliver over 14% higher Acc@0.5 in zero-shot object grounding for remote sensing imagery compared to prior methods.

Improving Visual Grounding in Remote Sensing via Cluster-Guided Refinement and Model Ensemble Voting

cs.CV · 2026-05-30 · unverdicted · novelty 4.0

Introduces SGR and CGR refinement pipelines plus majority-voting ensemble to improve visual grounding accuracy in remote sensing by combining RemoteSAM and SAM3.

Text-RSIR: A Text-Guided Framework for Efficient Remote Sensing Image Transmission and Reconstruction

eess.IV · 2026-05-15 · unverdicted · novelty 4.0

A text-guided framework for remote sensing image transmission uses low-res images and compact text to reduce data volume to 2%, with text-conditioned reconstruction achieving PSNRs of 16.36-27.41 dB on tested datasets.

Agentic AI for Remote Sensing: Technical Challenges and Research Directions

cs.CV · 2026-04-27 · unverdicted · novelty 4.0 · 2 refs

Position paper identifies structural challenges in applying generic agentic AI to Earth Observation and outlines design principles for EO-native agents focused on geospatial state and validity.

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

cs.RO · 2026-04-15 · unverdicted · novelty 4.0

A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.

citing papers explorer

Showing 10 of 10 citing papers.

SenseBench: A Benchmark for Remote Sensing Low-Level Visual Perception and Description in Large Vision-Language Models
cs.CV · 2026-05-11 · unverdicted · none · ref 38

Evaluating Remote Sensing Image Captions Beyond Metric Biases
cs.CV · 2026-04-22 · unverdicted · none · ref 56

RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
cs.CV · 2026-04-09 · unverdicted · none · ref 19

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation
cs.CV · 2026-04-19 · unverdicted · none · ref 61

RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning
cs.CV · 2026-06-26 · unverdicted · none · ref 60

DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery
cs.CV · 2026-04-20 · unverdicted · none · ref 7

Improving Visual Grounding in Remote Sensing via Cluster-Guided Refinement and Model Ensemble Voting
cs.CV · 2026-05-30 · unverdicted · none · ref 1

Text-RSIR: A Text-Guided Framework for Efficient Remote Sensing Image Transmission and Reconstruction
eess.IV · 2026-05-15 · unverdicted · none · ref 52

Agentic AI for Remote Sensing: Technical Challenges and Research Directions
cs.CV · 2026-04-27 · unverdicted · none · ref 133 · 2 links

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
cs.RO · 2026-04-15 · unverdicted · none · ref 145
