Unfine-tuned MLLMs outperform fine-tuned models on remote sensing image captioning when captions are scored by their ability to reconstruct the source image, and a training-free self-correction method achieves SOTA performance.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3years
2026 3representative citing papers
SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.
Delta-LLaVA adds Change-Enhanced Attention, Change-SEG with prior embeddings, and Local Causal Attention to MLLMs to overcome temporal blindness, outperforming general models on a new unified benchmark for bi- and tri-temporal remote sensing tasks.
citing papers explorer
-
Evaluating Remote Sensing Image Captions Beyond Metric Biases
Unfine-tuned MLLMs outperform fine-tuned models on remote sensing image captioning when captions are scored by their ability to reconstruct the source image, and a training-free self-correction method achieves SOTA performance.
-
SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis
SHARP applies a spectrum-aware dynamic RoPE scaling schedule that promotes resolution more strongly in early denoising stages and relaxes it later, outperforming static baselines on quality metrics for remote sensing images.
-
Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models
Delta-LLaVA adds Change-Enhanced Attention, Change-SEG with prior embeddings, and Local Causal Attention to MLLMs to overcome temporal blindness, outperforming general models on a new unified benchmark for bi- and tri-temporal remote sensing tasks.