arxiv: 2207.08051 · v3 · pith:5TXOJLRLnew · submitted 2022-07-17 · 💻 cs.CV · cs.AI

SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

classification 💻 cs.CV cs.AI
keywords temporal, multi-spectral, pre-training, data, imagery, performance, satellite, satmae

Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on the Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding and mask image patches independently across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in supervised learning performance on benchmark datasets (up to $\uparrow$ 7%) and in transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation. Code and data are available on the project website: https://sustainlab-group.github.io/SatMAE/
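The idea of masking patches independently across time can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `independent_temporal_mask`, the 75% mask ratio, and the patch and timestep counts are all assumptions chosen for illustration. It only shows that each timestep draws its own random set of visible patches, so a location hidden at one time may be visible at another.

```python
import random


def independent_temporal_mask(num_timesteps, num_patches, mask_ratio=0.75, seed=0):
    """Return, for each timestep, the sorted indices of patches left visible.

    Hypothetical helper (not from the SatMAE code base). Each timestep samples
    its own subset of patches, so the masks are independent across time.
    """
    rng = random.Random(seed)
    # Number of patches the encoder would see at each timestep.
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = []
    for _ in range(num_timesteps):
        # A fresh random draw per timestep is what makes the masking independent.
        kept = rng.sample(range(num_patches), num_keep)
        visible.append(sorted(kept))
    return visible


# Example: 3 timesteps, a 4x4 grid of patches (16 total), 75% masked (assumed ratio).
visible = independent_temporal_mask(num_timesteps=3, num_patches=16, mask_ratio=0.75)
for t, kept in enumerate(visible):
    print(f"t={t}: visible patches {kept}")
```

A consistent-masking variant would instead draw one subset and reuse it for every timestep; the sketch above differs only in resampling inside the loop.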

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics

    astro-ph.IM 2026-04 unverdicted novelty 7.0

    A dual-encoder deep learning method disentangles intrinsic astrophysical signals from measurement artifacts by treating sensor effects as augmentations and using counterfactual generation on overlapping observations.

  2. Landsat-Sentinel-2 Algal Bloom Mapping Using Vision Transformers: Model Description, Implementation, and Examples

    cs.CV 2026-06 unverdicted novelty 5.0

    Vision transformers trained on a new global dataset of Landsat-Sentinel-2 patches detect floating coastal algal blooms with 8-65% omission/commission error and outperform spectral indices under cloud and glint conditions.

  3. Emerging Flexible Designs for Geospatial Multimodal Foundation Models

    cs.LG 2026-06 unverdicted novelty 5.0

    Standardized pretraining and evaluation of geospatial multimodal foundation models on GEOBench reveals design trade-offs in flexibility, modality alignment, and task performance.

  4. FLORO: A Multimodal Geospatial Foundation Model for Ecological Remote Sensing Across Sensors and Scales

    cs.CV 2026-05 unverdicted novelty 5.0

    FLORO achieves second-best average segmentation on six PANGAEA benchmarks and competitive results on classification and regression despite pretraining on far less data than competitors by using availability-aware mult...

  5. Unlocking Multi-Spectral Data for Multi-Modal Models with Guided Inputs and Chain-of-Thought Reasoning

    cs.CV 2026-04 unverdicted novelty 5.0

    A prompting-based adaptation technique lets RGB-trained LMMs process multi-spectral inputs and deliver strong zero-shot gains on remote-sensing benchmarks.

  6. OceanMAE: A Foundation Model for Ocean Remote Sensing

    cs.CV 2026-04 unverdicted novelty 5.0

    OceanMAE is an ocean-adapted masked autoencoder that adds physically meaningful auxiliary descriptors during self-supervised pre-training on Sentinel-2 data and shows improved marine segmentation performance on downst...