Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Brian Clipp; Christopher Funk; Colorado J. Reed; Kurt Keutzer; Matt Uyttendaele; Ritwik Gupta; Salvatore Candido; Sarah Brockman; Shufan Li; Trevor Darrell

arXiv: 2212.14532 · v4 · pith:MZXTSK7Pnew · submitted 2022-12-30 · 💻 cs.CV



keywords image · scale-mae · scales · imagery · masked · models · remote · sensing


Abstract

Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image determines the scale of the ViT positional encoding, not the image resolution. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low/high frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average non-parametric kNN classification improvement of $2.4$–$5.6\%$ across eight remote sensing datasets compared to the current state of the art, and obtains a $0.9$ to $1.7$ mIoU improvement on the SpaceNet building segmentation transfer task across a range of evaluation scales.
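
The abstract says the ground area an image covers, rather than its pixel resolution, sets the scale of the ViT positional encoding. One way to picture this is the minimal sketch below, which assumes a standard 1D sinusoidal encoding whose patch indices are multiplied by the ratio of the image's ground sample distance (GSD) to a reference GSD. The function `gsd_positional_encoding` and its `gsd` and `reference_gsd` parameters are hypothetical illustrations, not the authors' code. A 2D variant would typically encode row and column coordinates separately and concatenate the two halves.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Hypothetical sketch of a scale-aware 1D sinusoidal positional encoding.

    Patch index `pos` is rescaled by gsd / reference_gsd, so the encoding
    follows ground distance instead of raw patch index.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd
    table = []
    for pos in range(num_positions):
        ground_pos = scale * pos  # position in reference-GSD units
        row = []
        for i in range(dim // 2):
            freq = 1.0 / (10000 ** (2 * i / dim))
            row.append(math.sin(ground_pos * freq))  # even index: sine
            row.append(math.cos(ground_pos * freq))  # odd index: cosine
        table.append(row)
    return table


# Two tiles covering the same ground extent with the same patch size in pixels:
# 4 patches at 2 m GSD and 8 patches at 1 m GSD.
coarse = gsd_positional_encoding(4, 8, gsd=2.0)
fine = gsd_positional_encoding(8, 8, gsd=1.0)

# Coarse patch k lies at the same ground position as fine patch 2k,
# so the two get identical encodings.
assert all(coarse[k] == fine[2 * k] for k in range(4))
```

With a plain pixel-index encoding, `coarse[1]` and `fine[1]` would be identical even though they sit at different places on the ground; scaling by GSD instead lines up patches that cover the same location.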

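The bandpass decoding can be pictured as a Laplacian-pyramid-style split: a coarse low-frequency band at half resolution plus a high-frequency residual at full resolution. The sketch below uses 2x2 average pooling and nearest-neighbour upsampling as stand-in filters. These filters and the helper names (`avg_pool_2x2`, `upsample_nearest_2x`, `bandpass_targets`) are assumptions for illustration, not the filters Scale-MAE actually uses.

```python
def avg_pool_2x2(img):
    """Downsample by averaging each non-overlapping 2x2 block (a crude low-pass filter)."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]


def upsample_nearest_2x(img):
    """Upsample by repeating each pixel into a 2x2 block."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def bandpass_targets(img):
    """Split an image into a low-frequency band (half resolution) and a
    high-frequency residual (full resolution): one Laplacian-pyramid level."""
    low = avg_pool_2x2(img)
    up = upsample_nearest_2x(low)
    high = [[img[r][c] - up[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]
    return low, high


image = [
    [1.0, 3.0, 5.0, 5.0],
    [1.0, 3.0, 5.0, 5.0],
    [2.0, 2.0, 0.0, 8.0],
    [2.0, 2.0, 0.0, 8.0],
]
low, high = bandpass_targets(image)
# low  -> [[2.0, 5.0], [2.0, 4.0]]
# high -> [[-1.0, 1.0, 0.0, 0.0], [-1.0, 1.0, 0.0, 0.0],
#          [0.0, 0.0, -4.0, 4.0], [0.0, 0.0, -4.0, 4.0]]
```

Adding the upsampled low band back to the high band recovers the input, so the pair is a lossless two-band decomposition. The low band carries coarse structure, and the residual carries edges and fine texture.
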

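The kNN results refer to non-parametric evaluation: the pretrained encoder is frozen, and each test image is labelled by a vote among its nearest training embeddings. A minimal sketch follows. It assumes cosine similarity and an unweighted majority vote, because the abstract does not specify k, the similarity measure, or any vote weighting. The function names and the toy embeddings and class labels are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def knn_predict(train_feats, train_labels, query, k=3):
    """Label `query` by majority vote among its k most similar training embeddings."""
    ranked = sorted(
        zip(train_feats, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    neighbours = [label for _, label in ranked[:k]]
    votes = Counter(neighbours)
    # Break ties in favour of the label whose first vote came from the closest neighbour.
    return max(votes, key=lambda label: (votes[label], -neighbours.index(label)))


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    hits = sum(
        knn_predict(train_feats, train_labels, q, k) == y
        for q, y in zip(test_feats, test_labels)
    )
    return hits / len(test_labels)


# Toy frozen-encoder embeddings for two land-cover classes.
train_x = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
train_y = ["water", "water", "water", "urban", "urban", "urban"]
test_x = [[0.95, 0.05], [0.05, 0.95]]
test_y = ["water", "urban"]
print(knn_accuracy(train_x, train_y, test_x, test_y))  # 1.0
```

Because no classifier is trained on top of the embeddings, kNN accuracy directly reflects how well the frozen representation clusters images of the same class.
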

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Does Your Wildfire Prediction Model Actually Work, or Just Score Well?
cs.LG 2026-05 unverdicted novelty 7.0

WILDFIRE-FM is the first wildfire-specific Earth foundation model, paired with a fixed-contract evaluation framework that demonstrates wildfire model transfer conclusions depend strongly on evaluation design and task ...
Does Your Wildfire Prediction Model Actually Work, or Just Score Well?
cs.LG 2026-05 unverdicted novelty 6.0

Introduces WILDFIRE-FM and a fixed-contract evaluation framework demonstrating that wildfire model transfer conclusions depend strongly on evaluation design and task formulation.
Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery
cs.CV 2026-02 unverdicted novelty 6.0

PerASCD sets new state-of-the-art SeK scores on SECOND and LandsatSCD datasets by using a modular cascaded gated decoder on PerA foundation model features plus a new consistency loss.
Beyond Backscatter: AlphaEarth Land-Cover Priors for Rapid SAR Flood Segmentation Across Foundation Backbones
cs.CV 2026-06 unverdicted novelty 5.0

AlphaEarth land-cover priors improve SAR flood segmentation IoU over SAR-only and DEM baselines across CNN and ViT backbones on held-out events like Hurricane Florence.
Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning
cs.CL 2026-04 unverdicted novelty 5.0

AlphaEarth embeddings form a rotating 13-dimensional manifold where local geometry predicts retrieval quality, and an agentic system using nine geometric tools outperforms parametric reasoning on environmental queries.