Earth-OneVision is a unified 2B-parameter RS-MLLM supporting six modalities and nine tasks via FGVLA, SLIS, and PCMA mechanisms plus a 34M QA-pair dataset, reporting competitive or superior benchmark results versus larger models.
Sar-text: A large-scale sar image-text dataset built with sar-narrator and progressive transfer learning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Sentinel2Cap provides human-annotated captions for multimodal Sentinel satellite images, with zero-shot tests showing RGB outperforming SAR and prompts helping performance.
citing papers explorer
-
Earth-OneVision: Extending Remote Sensing Multimodal Large Language Models to More Sensor Modalities and Tasks
Earth-OneVision is a unified 2B-parameter RS-MLLM supporting six modalities and nine tasks via FGVLA, SLIS, and PCMA mechanisms plus a 34M QA-pair dataset, reporting competitive or superior benchmark results versus larger models.
-
Sentinel2Cap: A Human-Annotated Benchmark Dataset for Multimodal Remote Sensing Image Captioning
Sentinel2Cap provides human-annotated captions for multimodal Sentinel satellite images, with zero-shot tests showing RGB outperforming SAR and prompts helping performance.