RemoteCLIP: A Vision Language Foundation Model for Remote Sensing, April 2024
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery
  Geo2Sound generates geographically realistic soundscapes from satellite imagery via geospatial attribute modeling, semantic hypothesis expansion, and geo-acoustic alignment, achieving SOTA FAD of 1.765 on a new 20k-pair benchmark.
- MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation
  MSD-Score introduces multi-scale distributional scoring on von Mises-Fisher mixtures to evaluate image captions without references and reports state-of-the-art correlation with human judgments.
- ChangeQuery: Advancing Remote Sensing Change Analysis for Natural and Human-Induced Disasters from Visual Detection to Semantic Understanding
  ChangeQuery is a new multimodal framework for semantic disaster change analysis that combines optical and SAR data with a custom dataset and annotation pipeline to support interactive damage assessment.
- CropVLM: A Domain-Adapted Vision-Language Model for Open-Set Crop Analysis
  CropVLM is a domain-adapted vision-language model that achieves 72.51% zero-shot crop classification accuracy and superior open-set detection performance on novel species without retraining.
- Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift
  Supervised fine-tuning with 0.1% labeled data outperforms all 60 tested prompt variants for CLIPSeg cloud segmentation on satellite imagery under domain shift.
- RoofNet: A Global Multimodal Dataset for Roof Material Identification from Earth Observation