Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, and the resulting models are used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. Scale-MAE pretrains a network by masking an input image at a known input scale, where the area of the Earth covered by the image, not the image resolution, determines the scale of the ViT positional encoding. Scale-MAE encodes the masked image with a standard ViT backbone, and then decodes the masked image through a bandpass filter to reconstruct low/high frequency images at lower/higher scales. We find that tasking the network with reconstructing both low- and high-frequency images leads to robust multiscale representations for remote sensing imagery. Scale-MAE achieves an average $2.4\%$ to $5.6\%$ non-parametric kNN classification improvement across eight remote sensing datasets compared to current state-of-the-art and obtains a $0.9$ to $1.7$ mIoU improvement on the SpaceNet building segmentation transfer task for a range of evaluation scales.
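The idea that ground coverage, rather than pixel resolution, sets the scale of the positional encoding can be illustrated with a small sketch. The code below is not taken from the paper's implementation. It assumes a standard 1D sinusoidal encoding whose positions are multiplied by the ratio of the image's ground sample distance (`gsd`, in meters per pixel) to a reference value (`reference_gsd`). The function name and both parameter names are hypothetical, chosen only for this illustration.

```python
import math


def gsd_positional_encoding(num_positions, dim, gsd, reference_gsd=1.0):
    """Sinusoidal positional encoding with positions scaled by ground distance.

    Assumption for illustration: each position index is multiplied by
    gsd / reference_gsd, so the encoding tracks distance on the ground
    rather than the pixel or patch index.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scale = gsd / reference_gsd
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # Same frequency schedule as the standard Transformer encoding.
            angle = scale * pos / (10000.0 ** (i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


# Position 1 at 2 m/pixel covers the same ground offset as position 2
# at 1 m/pixel, so the two encodings coincide.
coarse = gsd_positional_encoding(4, 8, gsd=2.0)
fine = gsd_positional_encoding(4, 8, gsd=1.0)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(coarse[1], fine[2])))
```

Under this assumption, two images with the same pixel dimensions but different ground coverage receive different encodings, while patches at the same ground offset receive the same one.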
Forward citations
Cited by 5 Pith papers
- Does Your Wildfire Prediction Model Actually Work, or Just Score Well?
  WILDFIRE-FM is the first wildfire-specific Earth foundation model, paired with a fixed-contract evaluation framework that demonstrates wildfire model transfer conclusions depend strongly on evaluation design and task ...
- Does Your Wildfire Prediction Model Actually Work, or Just Score Well?
  Introduces WILDFIRE-FM and a fixed-contract evaluation framework demonstrating that wildfire model transfer conclusions depend strongly on evaluation design and task formulation.
- Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery
  PerASCD sets new state-of-the-art Sek scores on SECOND and LandsatSCD datasets by using a modular cascaded gated decoder on PerA foundation model features plus a new consistency loss.
- Beyond Backscatter: AlphaEarth Land-Cover Priors for Rapid SAR Flood Segmentation Across Foundation Backbones
  AlphaEarth land-cover priors improve SAR flood segmentation IoU over SAR-only and DEM baselines across CNN and ViT backbones on held-out events like Hurricane Florence.
- Characterizing AlphaEarth Embedding Geometry for Agentic Environmental Reasoning
  AlphaEarth embeddings form a rotating 13-dimensional manifold where local geometry predicts retrieval quality, and an agentic system using nine geometric tools outperforms parametric reasoning on environmental queries.