SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to $\uparrow$ 7%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation. Code and data are available on the project website: https://sustainlab-group.github.io/SatMAE/
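The phrase "independently masking image patches across time" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `independent_temporal_mask`, the 75% mask ratio, the frame count, and the patch count are all assumptions chosen for illustration. The only idea taken from the abstract is that each time step draws its own random set of masked patches instead of sharing one mask across the sequence.

```python
import random


def independent_temporal_mask(num_frames, num_patches, mask_ratio, seed=0):
    """Return one boolean mask per frame (True = patch is masked).

    Hypothetical helper: each frame samples its masked patch indices
    separately, so masked positions generally differ between frames.
    """
    rng = random.Random(seed)
    num_masked = round(num_patches * mask_ratio)
    masks = []
    for _ in range(num_frames):
        # Sample without replacement, independently for this frame.
        masked = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked for i in range(num_patches)])
    return masks


# Assumed toy sizes: 3 time steps, 16 patches per image, 75% masked.
mask = independent_temporal_mask(num_frames=3, num_patches=16, mask_ratio=0.75)
for t, row in enumerate(mask):
    print(t, "".join("#" if m else "." for m in row))
```

Every frame hides the same number of patches, but the hidden positions are drawn separately, so a patch masked at one time step may be visible at another.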
Forward citations
Cited by 6 Pith papers
- Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics
  A dual-encoder deep learning method disentangles intrinsic astrophysical signals from measurement artifacts by treating sensor effects as augmentations and using counterfactual generation on overlapping observations.
- Landsat-Sentinel-2 Algal Bloom Mapping Using Vision Transformers: Model Description, Implementation, and Examples
  Vision transformers trained on a new global dataset of Landsat-Sentinel-2 patches detect floating coastal algal blooms with 8-65% omission/commission error and outperform spectral indices under cloud and glint conditions.
- Emerging Flexible Designs for Geospatial Multimodal Foundation Models
  Standardized pretraining and evaluation of geospatial multimodal foundation models on GEOBench reveals design trade-offs in flexibility, modality alignment, and task performance.
- FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales
  FLORO achieves second-best average segmentation on six PANGAEA benchmarks and competitive results on classification and regression despite pretraining on far less data than competitors by using availability-aware mult...
- Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning
  A prompting-based adaptation technique lets RGB-trained LMMs process multi-spectral inputs and deliver strong zero-shot gains on remote-sensing benchmarks.
- OceanMAE: A Foundation Model for Ocean Remote Sensing
  OceanMAE is an ocean-adapted masked autoencoder that adds physically meaningful auxiliary descriptors during self-supervised pre-training on Sentinel-2 data and shows improved marine segmentation performance on downst...