TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C. Lisaius, Markus Immitzer, Toby Jackson, James Ball, David A. Coomes, Anil Madhavapeddy, Andrew Blake, Srinivasan Keshav

arxiv: 2506.20380 · v7 · submitted 2025-06-25 · 💻 cs.LG

Pith reviewed 2026-05-19 07:30 UTC · model grok-4.3

classification 💻 cs.LG

keywords: satellite time series · earth observation · self-supervised embeddings · Barlow Twins · Sentinel-1 · Sentinel-2 · label efficiency · foundation model

The pith

TESSERA learns invariant embeddings from irregular multi-modal satellite time series to deliver state-of-the-art accuracy with high label efficiency on Earth observation tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TESSERA as a pixel-wise foundation model that processes Sentinel-1 and Sentinel-2 time series by enforcing invariance to which valid observations are chosen. It combines Barlow Twins loss with sparse random temporal sampling plus two regularizers: global shuffling to break spatial correlations and mix-based regulation to handle extreme sparsity. This produces embeddings that support diverse classification, segmentation, and regression tasks using only small task heads and little computation. The model comes with released global, annual, 10 m int8 embeddings, open weights, and code to enable large-scale use. A sympathetic reader would care because irregular satellite data often loses phenology information in compositing, and label-efficient embeddings could make planetary-scale analysis more practical.
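One way to picture the global shuffling step is sketched below. It is an illustration only, under the assumption that "global shuffling" means pooling pixels from all tiles and shuffling them once before batching, so that a batch rarely contains spatial neighbors; the summary above does not spell out the mechanics. The function name `global_shuffled_batches`, the layout of the `tiles` dictionary, and the `seed` parameter are hypothetical.

```python
import random


def global_shuffled_batches(tiles, batch_size, seed=0):
    """Yield batches of (tile_id, pixel_index) drawn from one pool shuffled across all tiles.

    Assumption: `tiles` maps a tile identifier to an iterable of pixel indices.
    This layout is hypothetical and only serves the illustration.
    """
    # Pool every pixel from every tile so tile boundaries no longer define batches.
    pool = [(tile_id, pixel) for tile_id, pixels in tiles.items() for pixel in pixels]
    # Shuffle once over the whole pool; neighbors within a tile get scattered across batches.
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]


tiles = {"tile_A": range(4), "tile_B": range(4)}
batches = list(global_shuffled_batches(tiles, batch_size=3))
```

Shuffling per tile instead would keep neighboring pixels in the same batch, which is the correlation the regularizer is described as breaking.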

Core claim

TESSERA is a pixel-wise foundation model for multi-modal Sentinel-1/2 Earth-observation time series that learns robust, label-efficient embeddings. It applies Barlow Twins loss together with sparse random temporal sampling to enforce invariance to valid-observation selection, adds global shuffling to decorrelate spatial neighborhoods, and uses mix-based regulation to improve behavior under extreme sparsity. The resulting embeddings achieve state-of-the-art accuracy across classification, segmentation, and regression tasks, often requiring only a small task head and minimal computation.

What carries the argument

Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regulation to enforce invariance to the selection of valid observations.
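The two central ingredients named above can be sketched in a few lines. The following is a minimal, pure-Python illustration and not the authors' implementation: `sample_view` draws a random subset of valid time steps for one pixel, and `barlow_twins_loss` computes the standard Barlow Twins objective (diagonal of the cross-correlation matrix pushed toward 1, off-diagonal entries pushed toward 0). The encoder that maps each sampled view to an embedding is omitted. All function names, the subset size `k`, and the weight `lam` are assumptions made for the example.

```python
import random


def sample_view(series, valid, k, rng):
    """Return up to k randomly chosen valid observations of one pixel, kept in time order."""
    valid_steps = [t for t, ok in enumerate(valid) if ok]
    chosen = sorted(rng.sample(valid_steps, min(k, len(valid_steps))))
    return [series[t] for t in chosen]


def standardize_columns(z, eps=1e-12):
    """Scale each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in z]
        mean = sum(column) / n
        std = (sum((v - mean) ** 2 for v in column) / n + eps) ** 0.5
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins loss for two batches of embeddings (rows are samples).

    lam weights the redundancy-reduction term; its value here is an assumption.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    on_diag, off_diag = 0.0, 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view A and dimension j of view B.
            c_ij = sum(a[r][i] * b[r][j] for r in range(n)) / n
            if i == j:
                on_diag += (1.0 - c_ij) ** 2   # invariance term
            else:
                off_diag += c_ij ** 2          # redundancy-reduction term
    return on_diag + lam * off_diag


rng = random.Random(0)
series = [10, 11, 12, 13, 14, 15]                 # toy per-date values for one pixel
valid = [True, False, True, True, False, True]    # False marks cloudy or missing dates
view_a = sample_view(series, valid, k=3, rng=rng)
view_b = sample_view(series, valid, k=3, rng=rng)

z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(z, z)
```

In training, `view_a` and `view_b` would each pass through the encoder, and the loss would be computed on the two resulting embedding batches, so that the model is rewarded for producing the same embedding regardless of which valid dates were drawn.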

If this is right

Diverse Earth observation tasks can be solved with only a small task head and minimal additional computation.
Global annual 10 m int8 embeddings become available for large-scale retrieval and inference.
Open weights and lightweight adaptation heads simplify use for planetary-scale applications.
Irregular time series from orbital patterns and clouds can be handled without losing vegetation phenology information.
The same embeddings support classification, segmentation, and regression with high label efficiency.
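To make the int8 storage point concrete, here is a small sketch of quantizing a float embedding to 8-bit integer codes and restoring it. The scheme shown, symmetric scaling with one scale factor per vector, is an assumption chosen for simplicity; this page does not describe how the released embeddings are actually quantized. The names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] using one scale per vector (assumed scheme)."""
    max_abs = max(abs(v) for v in vector)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximately recover the original floats from the integer codes."""
    return [c * scale for c in codes]


embedding = [0.5, -1.0, 0.25, 0.0]   # toy values, not real TESSERA output
codes, scale = quantize_int8(embedding)
restored = dequantize_int8(codes, scale)
max_error = max(abs(x - y) for x, y in zip(embedding, restored))
```

Each dimension then occupies one byte instead of the four used by float32, at the cost of a rounding error of at most half a quantization step per dimension.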

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Consistent embeddings over time could support long-term tracking of land-cover change without repeated model retraining.
The approach might reduce data requirements for monitoring applications in data-scarce regions.
Invariance to sparsity could prove useful for fusing additional irregular data sources in future extensions.

Load-bearing premise

That enforcing invariance to valid-observation selection via Barlow Twins, global shuffling, and mix-based regulation produces embeddings that actually improve downstream performance across unseen geographic regions and task distributions.

What would settle it

A benchmark on a held-out geographic region or new task distribution where TESSERA embeddings require substantially more labels or fail to exceed the accuracy of existing methods.
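The check described above amounts to tracing accuracy against label budget on held-out data. The sketch below shows the shape of such a measurement on synthetic two-dimensional "embeddings"; it uses a nearest-centroid classifier as a stand-in for a small task head, which is a substitution made for brevity and not the head used in the paper. All names, the toy data, and the budgets are hypothetical.

```python
import random


def fit_centroids(samples):
    """samples: list of (embedding, label). Returns the mean embedding per label."""
    grouped = {}
    for embedding, label in samples:
        grouped.setdefault(label, []).append(embedding)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids, embedding):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, embedding))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    return sum(predict(centroids, e) == y for e, y in samples) / len(samples)


def label_efficiency_curve(train, test, budgets, seed=0):
    """Accuracy on `test` when only `budget` labeled samples per class are used for fitting."""
    rng = random.Random(seed)
    by_label = {}
    for embedding, label in train:
        by_label.setdefault(label, []).append((embedding, label))
    curve = {}
    for budget in budgets:
        subset = []
        for rows in by_label.values():
            subset.extend(rng.sample(rows, min(budget, len(rows))))
        curve[budget] = accuracy(fit_centroids(subset), test)
    return curve


def toy_samples(n_per_class, rng):
    """Two well-separated synthetic classes; stands in for real embeddings and labels."""
    centers = {0: (0.0, 0.0), 1: (10.0, 10.0)}
    return [([cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)], label)
            for label, (cx, cy) in centers.items() for _ in range(n_per_class)]


rng = random.Random(1)
train, test = toy_samples(50, rng), toy_samples(20, rng)
curve = label_efficiency_curve(train, test, budgets=[1, 5, 25])
```

For the test to be meaningful, `test` would have to come from a region or task distribution not seen during training, and the same curve would be computed for competing embeddings.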

Original abstract

Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10 m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. All code and data are available at: https://github.com/ucam-eo/tessera.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TESSERA adapts Barlow Twins for irregular Sentinel time series with spatial and sparsity regularizers, then releases global 10 m embeddings that could cut labeling costs for EO tasks if the numbers hold up.

The main takeaway is that TESSERA trains pixel-wise embeddings on multi-modal Sentinel-1/2 time series by pairing Barlow Twins with global spatial shuffling and mix-based regularization to stay invariant under sparse and irregular observations. They then release the resulting annual global 10 m int8 embeddings plus code and lightweight heads, which is the part that actually matters for most users right now.

Referee Report

2 major / 2 minor

Summary. The paper introduces TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) Earth-observation time series. It trains embeddings via Barlow Twins loss on sparse random temporal samples, augmented by global shuffling to decorrelate spatial neighborhoods and mix-based regulation for extreme sparsity. The central claim is that these embeddings achieve state-of-the-art accuracy and high label efficiency across classification, segmentation, and regression tasks, often with only a small task head, while the authors release global 10 m annual int8 embeddings, open weights, code, and lightweight adaptation heads.

Significance. If the performance claims survive geographic separation, the work would be significant for EO foundation modeling by demonstrating that invariance to valid-observation selection can yield label-efficient representations without compositing losses. The public release of planetary-scale embeddings and reproducible code is a clear strength that supports downstream use and verification.

major comments (2)

[§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.
[§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.
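The geographic separation requested in the first major comment can be illustrated with a block-wise split: samples are grouped into coarse latitude/longitude blocks, and whole blocks are assigned to either training or testing. This is a generic sketch of spatial blocking, not a description of the paper's protocol; the function names, the `lon`/`lat` dictionary keys, and the one-degree block size are assumptions.

```python
import math
import random


def block_id(lon, lat, block_deg):
    """Identify the coarse grid cell that contains a coordinate."""
    return (math.floor(lon / block_deg), math.floor(lat / block_deg))


def geographic_split(samples, test_fraction=0.2, block_deg=1.0, seed=0):
    """Hold out whole spatial blocks so train and test pixels never share a block."""
    blocks = sorted({block_id(s["lon"], s["lat"], block_deg) for s in samples})
    random.Random(seed).shuffle(blocks)
    n_test = max(1, round(len(blocks) * test_fraction))
    test_blocks = set(blocks[:n_test])
    train, test = [], []
    for s in samples:
        in_test = block_id(s["lon"], s["lat"], block_deg) in test_blocks
        (test if in_test else train).append(s)
    return train, test


# Toy data: a 5 x 5 grid of one-degree blocks with four sample points in each.
samples = [{"lon": bx + dx, "lat": by + dy}
           for bx in range(5) for by in range(5)
           for dx in (0.25, 0.75) for dy in (0.25, 0.75)]
train, test = geographic_split(samples)
```

A random pixel-level split would instead place neighboring pixels on both sides of the split, which is the leakage path the referee describes. Points near a block edge can still be close to training points in the adjacent block, so a stricter protocol might add a buffer zone or hold out entire continents.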

minor comments (2)

[§3.2] Notation in §3.2: the precise definition of the mix-based regulation term should be written as an explicit equation rather than described in prose to improve reproducibility.
[Figure 3] Figure 3 caption: clarify whether the visualized embeddings are before or after the small task head, and add scale bars for the geographic examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to the manuscript.

Referee: [§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.

Authors: We agree that spatial autocorrelation poses a risk of data leakage in EO tasks and that continent-level or other strict geographic hold-outs provide stronger evidence for generalization. The original experiments relied on random pixel-level splits within the evaluation datasets to focus on label efficiency under the invariance training regime. In the revised manuscript we will add explicit geographic separation experiments (e.g., training on European tiles and testing on African and Asian tiles) and report the corresponding downstream metrics. These results will be incorporated into Section 4 and the supplementary material.
Revision: yes

Referee: [§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.

Authors: We acknowledge that additional quantitative detail would strengthen the claims. The current Table 2 and Section 4.3 present the main baseline comparisons, but we will expand the revision to include (i) standard error bars computed over multiple random seeds, (ii) a complete ablation table isolating the contribution of global shuffling and mix-based regularization, and (iii) further baseline methods. These additions will be placed in Section 4.3 with supporting figures moved to the appendix.
Revision: yes
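The error bars promised in item (i) are usually reported as the mean and standard error of a metric over repeated runs. A minimal sketch using only the Python standard library follows; the function name and the scores are made up for illustration and are not results from the paper.

```python
import math
import statistics


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for one metric measured over several seeds."""
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        # A single run gives no estimate of spread.
        return mean, float("nan")
    return mean, statistics.stdev(scores) / math.sqrt(len(scores))


seed_scores = [0.80, 0.82, 0.84]   # hypothetical accuracies from three seeds
mean, sem = mean_and_standard_error(seed_scores)
print(f"{mean:.3f} ± {sem:.3f}")
```

`statistics.stdev` uses the sample (n - 1) estimator, which is the conventional choice when the seeds are treated as a sample of possible runs.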

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims

The paper trains TESSERA using the established Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regularization to promote invariance to valid observation selection in EO time series. Downstream evaluations on classification, segmentation, and regression tasks use independent task heads and report empirical accuracies without any equations or results reducing by construction to author-fitted parameters or self-referential definitions. No self-citation chains are invoked as load-bearing uniqueness theorems, and the central invariance claim is grounded in the training objective rather than renaming or smuggling prior author ansatzes. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard self-supervised learning assumptions rather than new free parameters or invented physical entities.

axioms (1)

domain assumption: Barlow Twins loss plus the stated regularizers produces embeddings invariant to the selection of valid temporal observations.
Invoked in the training description to justify robustness under irregular sampling.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations... global shuffling to decorrelate spatial neighborhoods and mix-based regulation"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "dual-branch Transformer encoders... fused embedding... 128-dimensional pixel embedding"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Better Together: Evaluating the Complementarity of Earth Embedding Models
cs.CV · 2026-05 · unverdicted · novelty 7.0
Fusing embeddings from four Earth models (AlphaEarth, Tessera, GeoCLIP, SatCLIP) outperforms the best single model on four of six tasks, with gains depending on task and location.

FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes
cs.LG · 2026-05 · unverdicted · novelty 6.0
FLUXtrapolation is a benchmark for domain generalization in ecosystem flux upscaling using temporal, spatial, and temperature-based extrapolation scenarios, with pilot results showing model separation on tail and mult...

Agentic AI for Remote Sensing: Technical Challenges and Research Directions
cs.CV · 2026-04 · unverdicted · novelty 6.0
Agentic AI faces structural challenges in remote sensing due to geospatial data properties and workflow constraints, requiring EO-native agents built around structured state, tool-aware reasoning, and validity-aware e...

Agentic AI for Remote Sensing: Technical Challenges and Research Directions
cs.CV · 2026-04 · unverdicted · novelty 5.0
Agentic AI for remote sensing requires new designs centered on structured geospatial state, tool-aware reasoning, verifier-guided execution, and physical validity rather than generic extensions.

Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
cs.CV · 2026-04 · unverdicted · novelty 5.0
SSDM decouples global geospatial embeddings into structural modulation and semantic injection pathways to improve accuracy and consistency in high-resolution remote sensing land cover mapping.

Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data
cs.CV · 2026-04 · unverdicted · novelty 5.0
LIANet encodes multi-temporal Earth observation data into a coordinate-based neural field that supports label-only fine-tuning for downstream tasks without access to raw imagery.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · cited by 5 Pith papers

[1]

Lightweight temporal self-attention for classifying satellite images time series

S AINTE FARE GARNOT , V., AND LANDRIEU , L. Lightweight temporal self-attention for classifying satellite images time series. In Lecture Notes in Computer Science (12 2020), pp. 171–181

work page 2020
[2]

Esa biomass climate change initiative (biomasscci): Global datasets of forest above-ground biomass for the years 2010, 2015, 2016, 2017, 2018, 2019, 2020 and 2021, v5.01, 2024

S ANTORO , M., AND CARTUS , O. Esa biomass climate change initiative (biomasscci): Global datasets of forest above-ground biomass for the years 2010, 2015, 2016, 2017, 2018, 2019, 2020 and 2021, v5.01, 2024

work page 2010
[3]

Self-supervised vision transformers for land-cover segmentation and classification

S CHEIBENREIF , L., H ANNA , J., M OMMERT , M., AND BORTH , D. Self-supervised vision transformers for land-cover segmentation and classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2022), pp. 1422–1431

work page 2022
[4]

Prithvi wxc: Foundation model for weather and climate.arXiv preprint arXiv:2409.13598, 2024

S CHMUDE , J., ROY, S., T ROJAK , W., JAKUBIK , J., C IVITARESE , D. S., S INGH , S., K UEHN - ERT, J., A NKUR , K., G UPTA, A., P HILLIPS , C. E., K IENZLER , R., S ZWARCMAN , D., GAUR, V., S HINDE , R., L AL, R., S ILVA, A. D., D IAZ , J. L. G., J ONES , A., P FREUND - SCHUH , S., L IN, A., S HESHADRI , A., N AIR , U., A NANTHARAJ , V., H AMANN , H., W ...

work page arXiv 2024
[5]

J., B OYD, D

S HENKIN , A., C HANDLER , C. J., B OYD, D. S., J ACKSON , T., D ISNEY , M., M AJALAP , N., NILUS , R., F OODY, G., BIN JAMI , J., R EYNOLDS , G., W ILKES , P., CUTLER , M. E. J., VAN DER HEIJDEN , G. M. F., B URSLEM , D. F. R. P., C OOMES , D. A., B ENTLEY , L. P., AND MALHI , Y. The World’s Tallest Tropical Tree in Three Dimensions.Front. For. Glob. Cha...

work page 2019
[6]

K., C OOPS , N

S KIDMORE , A. K., C OOPS , N. C., N EINAVAZ, E., A LI, A., S CHAEPMAN , M. E., P A- GANINI , M., K ISSLING , W. D., V IHERVAARA , P., D ARVISHZADEH , R., F EILHAUER , H., FERNANDEZ , M., F ERN ´ANDEZ , N., G ORELICK , N., G EIJZENDORFFER , I., H EIDEN , U., HEURICH , M., H OBERN , D., H OLZWARTH , S., M ULLER -KARGER , F. E., V AN DE KER- CHOVE , R., L A...

work page 2021
[7]

J., F LEMING , L., AND GEACH , J

S MITH , M. J., F LEMING , L., AND GEACH , J. E. Earthpt: a time series foundation model for earth observation, 2024, 2309.07207

work page arXiv 2024
[8]

S., C ARABALLO -V EGA , J

S PRADLIN , C. S., C ARABALLO -V EGA , J. A., L I, J., C ARROLL , M. L., G ONG , J., AND MONTESANO , P. M. Satvision-toa: A geospatial foundation model for coarse-resolution all- sky remote sensing imagery, 2024, 2411.17000

work page arXiv 2024
[9]

Self-supervised learning of remote sensing scene represen- tations using contrastive multiview coding

S TOJNIC , V., AND RISOJEVIC , V. Self-supervised learning of remote sensing scene represen- tations using contrastive multiview coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops(June 2021), pp. 1182–1191. 24

work page 2021
[10]

Ringmo: A remote sens- ing foundation model with masked image modeling

S UN, X., W ANG , P., L U, W., Z HU, Z., L U, X., H E, Q., L I, J., R ONG , X., Y ANG , Z., CHANG , H., H E, Q., Y ANG , G., W ANG , R., L U, J., AND FU, K. Ringmo: A remote sens- ing foundation model with masked image modeling. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–22

work page 2023
[11]

Prithvi-eo-2.0: A versatile multi-temporal foundation model for earth observation applications.arXiv preprint arXiv:2412.02732, 2024

S ZWARCMAN , D., R OY, S., F RACCARO , P., G ISLASON , P. E., B LUMENSTIEL , B., GHOSAL , R., DE OLIVEIRA , P. H., DE SOUSA ALMEIDA , J. L., S EDONA , R., K ANG , Y., C HAKRABORTY , S., W ANG , S., G OMES , C., K UMAR , A., T RUONG , M., G ODWIN , D., L EE, H., H SU, C.-Y., A SANJAN , A. A., M UJECI , B., S HIDHAM , D., K EENAN , T., AREVALO , P., L I, W....

work page arXiv 2025
[12]

Towards privacy-preserved pre-training of re- mote sensing foundation models with federated mutual-guidance learning, 2025, 2503.11051

T AN, J., Z HANG , C., D ANG , B., AND LI, Y. Towards privacy-preserved pre-training of re- mote sensing foundation models with federated mutual-guidance learning, 2025, 2503.11051

work page arXiv 2025
[13]

A., B ELENGUER -PLOMER , M

T ANASE , M. A., B ELENGUER -PLOMER , M. A., R OTETA , E., B ASTARRIKA , A., WHEELER , J., F ERN ´ANDEZ -C ARRILLO , ´A., T ANSEY , K., W IEDEMANN , W., N AVRATIL , P., L OHBERGER , S., S IEGERT , F., AND CHUVIECO , E. Burned Area Detection and Map- ping: Intercomparison of Sentinel-1 and Sentinel-2 Based Algorithms over Tropical Africa. Remote Sensing 12...

work page 2020
[14]

Cross-scale mae: a tale of multi-scale exploitation in remote sensing

T ANG , M., C OZMA , A., G EORGIOU , K., AND QI, H. Cross-scale mae: a tale of multi-scale exploitation in remote sensing. In Proceedings of the 37th International Conference on Neural Information Processing Systems (Red Hook, NY , USA, 2023), NIPS ’23, Curran Associates Inc

work page 2023
[15]

Tov: The original vision model for optical remote sensing image understanding via self-supervised learning

T AO, C., Q I, J., Z HANG , G., Z HU, Q., L U, W., AND LI, H. Tov: The original vision model for optical remote sensing image understanding via self-supervised learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16 (2023), 4916–4930

work page 2023
[16]

Terraclass: Mapeamento de uso e ocupac ¸˜ao da terra

T ERRA CLASS . Terraclass: Mapeamento de uso e ocupac ¸˜ao da terra. Accessed: July 24, 2025

work page 2025
[17]

Swimdiff: Scene-wide matching con- trastive learning with diffusion constraint for remote sensing image, 2024, 2401.05093

T IAN , J., L EI, J., Z HANG , J., X IE, W., AND LI, Y. Swimdiff: Scene-wide matching con- trastive learning with diffusion constraint for remote sensing image, 2024, 2401.05093

work page arXiv 2024
[18]

V., B RANDT , J., S PORE , J., M AJUMDAR , S., H AZIZA , D., V AMARAJU , J., M OUTAKANNI , T., B O- JANOWSKI , P., J OHNS , T., W HITE , B., T IECKE , T., AND COUPRIE , C

T OLAN , J., Y ANG , H.-I., N OSARZEWSKI , B., C OUAIRON , G., V O, H. V., B RANDT , J., S PORE , J., M AJUMDAR , S., H AZIZA , D., V AMARAJU , J., M OUTAKANNI , T., B O- JANOWSKI , P., J OHNS , T., W HITE , B., T IECKE , T., AND COUPRIE , C. Very high reso- lution canopy height maps from RGB imagery using self-supervised vision transformer and convolutio...

work page 2024
[19]

Lightweight, pre-trained transformers for remote sensing timeseries,

T SENG , G., C ARTUYVELS , R., Z VONKOV, I., P UROHIT , M., R OLNICK , D., AND KERNER , H. Lightweight, Pre-trained Transformers for Remote Sensing Timeseries, Feb. 2024. arXiv:2304.14065 [cs]

work page arXiv 2024
[20]

R., S HELHAMER , E., K ERNER , H., AND ROLNICK , D

T SENG , G., F ULLER , A., R EIL , M., H ERZOG , H., B EUKEMA , P., B ASTANI , F., G REEN , J. R., S HELHAMER , E., K ERNER , H., AND ROLNICK , D. Galileo: Learning global & local features of many remote sensing modalities, 2025, 2502.09356

work page arXiv 2025
[21]

Pooch: A friend to fetch your data files.Journal of Open Source Software 5, 45 (Jan

U IEDA , L., S OLER , S., R AMPIN , R., VAN KEMENADE , H., T URK , M., S HAPERO , D., B AN- IHIRWE , A., AND LEEMAN , J. Pooch: A friend to fetch your data files.Journal of Open Source Software 5, 45 (Jan. 2020), 1943

work page 2020
[22]

Ucam-eo project

U NIVERSITY OF CAMBRIDGE CENTRE FOR EARTH OBSERVATION . Ucam-eo project. https://github.com/ucam-eo. Accessed: 2025-07-22

work page 2025
[23]

N., KAISER , L., AND POLOSUKHIN , I

V ASWANI , A., S HAZEER , N., P ARMAR , N., U SZKOREIT , J., J ONES , L., G OMEZ , A. N., KAISER , L., AND POLOSUKHIN , I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Red Hook, NY , USA, 2017), NIPS’17, Curran Associates Inc., p. 6000–6010

work page 2017
[24]

V ERHEGGHEN , A., E VA, H., C ECCHERINI , G., A CHARD , F., G OND , V., G OURLET - FLEURY , S., AND CERUTTI , P. O. The Potential of Sentinel Satellites for Burnt Area Mapping and Monitoring in the Congo Basin Forests. Remote Sensing 8, 12 (Dec. 2016), 986. 25

work page 2016
[25]

Vm0047 afforestation, reforestation, and reveg- etation, v1.1

V ERRA . Vm0047 afforestation, reforestation, and reveg- etation, v1.1. https://verra.org/methodologies/ vm0047-afforestation-reforestation-and-revegetation-v1-1/ , 2025. Verra Verified Carbon Standard (VCS) Program methodology

work page 2025
[26]

H., D ALAGNOL , R., C ARTER , G., H IRYE , M

W AGNER , F. H., D ALAGNOL , R., C ARTER , G., H IRYE , M. C. M., G ILL , S., T AKOUGOUM , L. B. S., F AVRICHON , S., K ELLER , M., O METTO , J. P. H. B., A LVES, L., C REZE , C., GEORGE -C HACON , S. P., L I, S., L IU, Z., M ULLISSA , A., Y ANG , Y., S ANTOS , E. G., WORDEN , S. R., B RANDT , M., C IAIS , P., H AGEN , S. C., AND SAATCHI , S. High reso- l...

work page arXiv 2025
[27]

J., X IONG , Z., Z HU, X

W ALDMANN , L., S HAH , A., WANG , Y., LEHMANN , N., S TEWART, A. J., X IONG , Z., Z HU, X. X., B AUER , S., AND CHUANG , J. Panopticon: Advancing any-sensor foundation models for earth observation, 2025, 2503.10845

work page arXiv 2025
[28]

A., W OODCOCK , C

W ANG , C., S ONG , C., S CHROEDER , T. A., W OODCOCK , C. E., P AVELSKY, T. M., H AN, Q., AND YAO, F. Interpretable Multi-Sensor Fusion of Optical and SAR Data for GEDI-Based Canopy Height Mapping in Southeastern North Carolina. Remote Sensing 17, 9 (Jan. 2025), 1536

work page 2025
[29]

Hypersigma: Hyperspectral intelligence comprehension foundation model, 2025, 2406.11519

W ANG , D., H U, M., J IN, Y., M IAO, Y., Y ANG , J., X U, Y., Q IN, X., M A, J., S UN, L., LI, C., F U, C., C HEN , H., H AN, C., Y OKOYA, N., Z HANG , J., X U, M., L IU, L., Z HANG , L., W U, C., D U, B., T AO, D., AND ZHANG , L. Hypersigma: Hyperspectral intelligence comprehension foundation model, 2025, 2406.11519

work page arXiv 2025
[30]

An empirical study of remote sensing pretraining

W ANG , D., Z HANG , J., D U, B., X IA, G.-S., AND TAO, D. An empirical study of remote sensing pretraining. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–20

work page 2023
[31]

Samrs: Scaling- up remote sensing segmentation dataset with segment anything model

W ANG , D., Z HANG , J., D U, B., X U, M., L IU, L., TAO, D., AND ZHANG , L. Samrs: Scaling- up remote sensing segmentation dataset with segment anything model. In Advances in Neural Information Processing Systems (2023), vol. 36, pp. 8815–8827

work page 2023
[32]

Mtp: Advancing remote sensing foundation model via multi-task pretraining, 2024, 2403.13430

W ANG , D., Z HANG , J., X U, M., L IU, L., WANG , D., G AO, E., H AN, C., G UO, H., D U, B., TAO, D., AND ZHANG , L. Mtp: Advancing remote sensing foundation model via multi-task pretraining, 2024, 2403.13430

work page arXiv 2024
[33]

Advanc- ing plain vision transformer toward remote sensing foundation model

W ANG , D., Z HANG , Q., X U, Y., Z HANG , J., D U, B., T AO, D., AND ZHANG , L. Advanc- ing plain vision transformer toward remote sensing foundation model. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–15

work page 2023
[34]

Harnessing massive satellite imagery with efficient masked image modeling, 2025, 2406.11933

W ANG , F., W ANG , H., W ANG , D., G UO, Z., Z HONG , Z., L AN, L., Y ANG , W., AND ZHANG , J. Harnessing massive satellite imagery with efficient masked image modeling, 2025, 2406.11933

work page arXiv 2025
[35]

Roma: Scaling up mamba-based foundation models for remote sensing, 2025, 2503.10392

W ANG , F., WANG , H., W ANG , Y., WANG , D., C HEN , M., Z HAO, H., S UN, Y., WANG , S., LAN, L., YANG , W., AND ZHANG , J. Roma: Scaling up mamba-based foundation models for remote sensing, 2025, 2503.10392

work page arXiv 2025
[36]

M., B RAHAM , N

W ANG , Y., A LBRECHT , C. M., B RAHAM , N. A. A., L IU, C., X IONG , Z., AND ZHU, X. X. Decoupling common and unique representations for multimodal self-supervised learn- ing, 2024, 2309.05300

work page arXiv 2024
[37]

M., AND ZHU, X

W ANG , Y., A LBRECHT , C. M., AND ZHU, X. X. Self-supervised vision transformers for joint sar-optical representation learning, 2022, 2204.05381

work page arXiv 2022
[38]

M., AND ZHU, X

W ANG , Y., A LBRECHT , C. M., AND ZHU, X. X. Multilabel-guided soft contrastive learning for efficient earth observation pretraining. IEEE Transactions on Geoscience and Remote Sensing 62 (2024), 1–16

work page 2024
[39]

W ANG , Y., B RAHAM , N. A. A., X IONG , Z., L IU, C., A LBRECHT , C. M., AND ZHU, X. X. Ssl4eo-s12: A large-scale multi-modal, multi-temporal dataset for self-supervised learning in earth observation. ArXiv abs/2211.07044 (2022)

work page arXiv 2022
[40]

H., A LBRECHT , C

W ANG , Y., H ERN ´ANDEZ , H. H., A LBRECHT , C. M., AND ZHU, X. X. Feature guided masked autoencoder for self-supervised learning in remote sensing, 2023, 2310.18653

work page arXiv 2023
[41]

J., D UJARDIN , T., B OUNTOS , N

W ANG , Y., X IONG , Z., L IU, C., S TEWART, A. J., D UJARDIN , T., B OUNTOS , N. I., Z A- VRAS , A., G ERKEN , F., P APOUTSIS , I., L EAL -TAIX ´E, L., AND ZHU, X. X. Towards a unified copernicus foundation model for earth vision, 2025, 2503.11849. 26

work page arXiv 2025
[42]

Ringmo-lite: A remote sensing multi-task lightweight network with cnn-transformer hybrid framework, 2023, 2309.09003

W ANG , Y., Z HANG , T., Z HAO, L., H U, L., W ANG , Z., N IU, Z., C HENG , P., C HEN , K., ZENG , X., W ANG , Z., W ANG , H., AND SUN, X. Ringmo-lite: A remote sensing multi-task lightweight network with cnn-transformer hybrid framework, 2023, 2309.09003

work page arXiv 2023
[43]

Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

W ANG , Z., C HENG , P., T IAN , P., W ANG , Y., C HEN , M., D UAN, S., W ANG , Z., L I, X., AND SUN, X. Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

work page arXiv 2024
[44]

Dino-mc: Self-supervised contrastive learn- ing for remote sensing imagery with multi-sized local crops

W ANYAN , X., S ENEVIRATNE , S., S HEN , S., AND KIRLEY , M. Extending global-local view alignment for self-supervised learning with remote sensing imagery, 2024, 2303.06670

work page arXiv 2024
[45]

Remote sensing for agricultural applications: A meta-review

W EISS , M., J ACOB , F., AND DUVEILLER , G. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236 (Jan. 2020), 111402

work page 2020
[46]

S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O

W ILLETT , D. S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O. B., AND SIMONSON , A. Noaa open data dissemination: Petabyte-scale earth system data in the cloud. Science Advances 9, 38 (2023), eadh0032

work page 2023
[47]

Cat- sam: Conditional tuning for few-shot adaptation of segment anything model

X IAO, A., X UAN, W., Q I, H., X ING , Y., REN, R., Z HANG , X., S HAO, L., AND LU, S. Cat- sam: Conditional tuning for few-shot adaptation of segment anything model. arXiv preprint arXiv:2402.03631 (2024)

work page arXiv 2024
[48]

Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

X IAO, A., X UAN, W., WANG , J., H UANG , J., T AO, D., L U, S., AND YOKOYA, N. Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

work page arXiv 2025
[49]

Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 418–434
[50]

Xiong, Z., Wang, Y., Zhang, F., Stewart, A. J., Hanna, J., Borth, D., Papoutsis, I., Saux, B. L., Camps-Valls, G., and Zhu, X. X. Neural plasticity-inspired multimodal foundation model for earth observation, 2024, arXiv:2403.15356
[51]

Xiong, Z., Wang, Y., Zhang, F., and Zhu, X. X. One for all: Toward unified foundation models for earth vision, 2024, arXiv:2401.07527
[52]

Xu, H., Man, Y., Yang, M., Wu, J., Zhang, Q., and Wang, J. Analytical insight of earth: A cloud-platform of intelligent computing for geospatial big data, 2023, arXiv:2312.16385
[53]

Yan, Y., Hong, S., Chen, A., Peñuelas, J., Allen, C. D., Hammond, W. M., Munson, S. M., Myneni, R. B., and Piao, S. Satellite-based evidence of recent decline in global forest recovery rate from tree mortality events. Nature Plants (2025), 1–12
[54]

Yan, Z., Li, J., Li, X., Zhou, R., Zhang, W., Feng, Y., Diao, W., Fu, K., and Sun, X. RingMo-SAM: A foundation model for segment anything in multimodal remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–16
[55]

Yao, F., Lu, W., Yang, H., Xu, L., Liu, C., Hu, L., Yu, H., Liu, N., Deng, C., Tang, D., Chen, C., Yu, J., Sun, X., and Fu, K. RingMo-Sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–21
[56]

Zang, J., Qiu, F., and Zhang, Y. A global dataset of forest regrowth following wildfires. Sci Data 11, 1 (Sept. 2024), 1052
[57]

Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. Barlow twins: Self-supervised learning via redundancy reduction. ArXiv abs/2103.03230 (2021)
[58]

Zhang, L., Zhao, Y., Dong, R., Zhang, J., Yuan, S., Cao, S., Chen, M., Zheng, J., Li, W., Liu, W., Zhang, W., Feng, L., and Fu, H. A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder, 2024, arXiv:2406.08079
[59]

Zhang, M., Liu, Q., and Wang, Y. CtxMIM: Context-enhanced masked image modeling for remote sensing image understanding, 2024, arXiv:2310.00022
[60]

Zhang, T., Gao, P., Dong, H., Zhuang, Y., Wang, G., Zhang, W., and Chen, H. Consecutive pre-training: A knowledge transfer learning strategy with relevant unlabeled data for remote sensing domain. Remote Sensing 14, 22 (2022)
[61]

Zhang, X., Liu, Y., Lin, Y., Liao, Q., and Li, Y. UV-SAM: adapting segment anything model for urban village identification. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial In...
[62]

Zhao, Z., Dong, L., Wu, S., Xiao, X., et al. Review of remote sensing-based methods for forest aboveground biomass estimation: Progress, challenges, and prospects. Forests 14, 6 (2023), 1086
[63]

Zheng, Z., Ermon, S., Kim, D., Zhang, L., and Zhong, Y. Changen2: Multi-temporal remote sensing generative change foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 2 (2025), 725–741
[64]

Zhu, Z. Change detection using Landsat time series: A review of frequencies, preprocessing, algorithms, and applications. ISPRS Journal of Photogrammetry and Remote Sensing 130 (2017), 370–384.

Acknowledgments

We gratefully acknowledge help from AMD Inc., Tarides, Jane Street, the Dawn supercomputing team at Cambridge, the Aalto University Science-IT pro...
[65]

at a given point over time. Note that d-pixels can be sparse and are accompanied by a mask vector m_{i,j} of size T that indicates the timesteps for which there are valid data, with a value of 1 indicating that the corresponding row in P_{i,j} is valid.

S1 The d-pixel Representation

We represent each 10 m pixel in the time series of images from annual multispect...
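The mask convention described above can be sketched in a few lines. This is only an illustration under stated assumptions: the d-pixel is held as a plain list of T rows (one row of band values per timestep), and the helper name `valid_observations` is hypothetical rather than taken from the TESSERA code.

```python
def valid_observations(d_pixel, mask):
    """Return the rows of a d-pixel whose mask entry is 1.

    d_pixel: list of T rows, one row of band values per timestep.
    mask:    list of T integers, 1 = valid row, 0 = missing or invalid.
    """
    if len(d_pixel) != len(mask):
        raise ValueError("mask must have one entry per timestep")
    return [row for row, m in zip(d_pixel, mask) if m == 1]


# Toy d-pixel with T = 4 timesteps and 3 bands; timesteps 1 and 3 are missing.
P = [[0.1, 0.2, 0.3], [0.0, 0.0, 0.0], [0.4, 0.5, 0.6], [0.0, 0.0, 0.0]]
m = [1, 0, 1, 0]
valid = valid_observations(P, m)
```

Keeping the mask separate from the data lets a sparse pixel stay in a fixed-size array while downstream code only ever reads the valid rows.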
[66]

For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-2 time series (10 spectral bands)

[67]

For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-1 time series (2 polarizations). These views represent different, valid, but inherently incomplete glimpses of the pixel’s true temporal-spectral evolution, akin to observing the same location through intermittent cloud cover or from different satelli...
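A minimal sketch of drawing two independent views from one pixel's valid timesteps follows. The function names and the fallback of sampling with replacement when fewer than `k` valid dates exist are assumptions made for illustration; the text above only states that a fixed number of valid dates is sampled independently per view.

```python
import random


def sample_view(valid_steps, k, rng):
    """Pick k timestep indices from the valid ones, returned in temporal order."""
    if len(valid_steps) >= k:
        chosen = rng.sample(valid_steps, k)
    else:
        # Assumption: fall back to sampling with replacement for very sparse pixels.
        chosen = [rng.choice(valid_steps) for _ in range(k)]
    return sorted(chosen)


def two_views(mask, k, seed=None):
    """Draw two independent views of the same pixel from its validity mask."""
    rng = random.Random(seed)
    valid_steps = [t for t, m in enumerate(mask) if m == 1]
    if not valid_steps:
        raise ValueError("pixel has no valid observations")
    return sample_view(valid_steps, k, rng), sample_view(valid_steps, k, rng)


# Toy mask: every second timestep out of 20 is valid.
toy_mask = [1 if t % 2 == 0 else 0 for t in range(20)]
view_a, view_b = two_views(toy_mask, k=4, seed=0)
```

The same routine would be applied separately to the Sentinel-2 and Sentinel-1 series, since each modality has its own set of valid dates.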

[68]

The full Sentinel-1 and Sentinel-2 time series data at 10-meter resolution are acquired and pre-processed to form d-pixels. Unlike pre-training, no spatial downsampling is performed at this stage
[69]

A fixed number of 40 timesteps is sampled from the valid observations within the year for both Sentinel-1 and Sentinel-2 data, along with their DOY positional encodings
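The step above can be illustrated as follows. The text does not say what form the DOY positional encoding takes, so the sine/cosine pair used here is an assumption, as are the function names and the with-replacement fallback for pixels with fewer than 40 valid dates.

```python
import math
import random


def doy_encoding(doy, period=365.0):
    """Map a day of year (1-based) to a point on the unit circle (assumed form)."""
    angle = 2.0 * math.pi * (doy - 1) / period
    return math.sin(angle), math.cos(angle)


def sample_with_doy(valid_doys, n_steps=40, seed=None):
    """Sample n_steps valid days of year and attach their encodings."""
    rng = random.Random(seed)
    doys = list(valid_doys)
    if not doys:
        raise ValueError("no valid observations to sample from")
    if len(doys) >= n_steps:
        picked = rng.sample(doys, n_steps)
    else:
        # Assumption: repeat observations when fewer than n_steps are valid.
        picked = [rng.choice(doys) for _ in range(n_steps)]
    picked.sort()
    return [(d, *doy_encoding(d)) for d in picked]


# Toy series with one valid observation every 5 days.
steps = sample_with_doy(range(1, 366, 5), n_steps=40, seed=0)
```

A cyclic encoding has the convenient property that late December and early January land close together, which a raw integer day index would not give.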

[70]

These sampled time series are fed into their respective frozen TESSERA encoders

[71]

The outputs from the S1 and S2 encoders are fused by the MLP, producing a 128-dimensional embedding vector for that pixel for that year. This process is repeated for all land pixels globally to create an annual embeddings map of shape (H, W, 128), where H and W are the dimensions of the global 10-meter grid.

Scaling with data and network size

To identify...
[72]

Download Embeddings for the Region of Interest: The GeoTessera Python library (154) allows users to download embeddings for a desired region and year in the form of a NumPy array
[73]

Prepare labelled Downstream Data: The labelled dataset for the target task (e.g., pixel-level crop-type labels, canopy height measurements, or land use change polygons) is prepared
[74]

Design Task-Specific Head: A lightweight, task-specific neural network module (the "head") is designed. This head takes the extracted TESSERA embeddings as input.
• For pixel-wise classification (e.g., crop classification), the head is typically a shallow MLP (1-3 layers) ending in a softmax output layer.
• For pixel-wise regression (e.g., canopy height ...
[75]

Train Downstream Head: Only the parameters of this newly defined task head are trained using the extracted TESSERA embeddings as input features and the corresponding labels. Standard supervised learning techniques, optimizers (e.g., Adam), and task-appropriate loss functions (e.g., Cross-Entropy for classification, Mean Squared Error for regression) are u...

[76]

Evaluation: Once the head is trained, inference is performed on a test set by extracting TESSERA embeddings for the test samples and passing them through the trained head. Performance is evaluated using standard metrics relevant to the task. This workflow allows the use of TESSERA embeddings in a range of diverse applications, demonstrating its role as ...
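The head-training and evaluation steps of the workflow above can be sketched end to end on synthetic data. Several things here are assumptions rather than the authors' setup: the embeddings are random stand-ins for real TESSERA vectors, the head is a single linear layer with softmax (the shallowest case of the "1-3 layer" MLP), and plain stochastic gradient descent replaces Adam to keep the example dependency-free.

```python
import math
import random

EMBED_DIM = 128  # embedding size stated in the text


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    logits = [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def train_head(X, y, n_classes, lr=0.01, epochs=10, seed=0):
    """Train a linear softmax head with cross-entropy loss; embeddings stay fixed."""
    rng = random.Random(seed)
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = predict_proba(W, b, X[i])
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == y[i] else 0.0)  # dLoss/dlogit_k
                b[k] -= lr * grad
                W[k] = [w - lr * grad * v for w, v in zip(W[k], X[i])]
    return W, b


def accuracy(W, b, X, y):
    hits = 0
    for x, target in zip(X, y):
        p = predict_proba(W, b, x)
        hits += int(p.index(max(p)) == target)
    return hits / len(X)


def synthetic_embeddings(n, seed):
    """Stand-in for downloaded embeddings: two classes separated in 8 dimensions."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        shift = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(shift if j < 8 else 0.0, 1.0) for j in range(EMBED_DIM)])
        y.append(label)
    return X, y


X_train, y_train = synthetic_embeddings(200, seed=1)
X_test, y_test = synthetic_embeddings(100, seed=2)
W, b = train_head(X_train, y_train, n_classes=2)
test_acc = accuracy(W, b, X_test, y_test)
```

Because only `W` and `b` are updated, the cost of adapting to a new task scales with the size of the head and the number of labelled pixels, not with the size of the encoder.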
[77]

TESSERA: difference in relative heights of GEDI estimates of RH90 and RH10, converted to AGB using 15.502x + 160.5
[78]

ETH canopy height, converted using 3.4x
[79]

CTrees canopy height, converted using 1.806x + 44.9
[80]

ESA AGB, converted using 0.407x + 34.0
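The four conversions listed above can be collected into one lookup. This is a sketch under assumptions: each "×" in the extracted text is read as "times the input value", the ETH entry is taken to have no intercept because none is shown, and the dictionary keys and function name are hypothetical. Units are not stated in this fragment and are not assumed here.

```python
# (slope, intercept) pairs copied from the list above.
# Assumption: each conversion is a linear map y = slope * x + intercept.
LINEAR_CONVERSIONS = {
    "tessera_rh90_minus_rh10": (15.502, 160.5),
    "eth_canopy_height": (3.4, 0.0),  # no intercept shown in the source text
    "ctrees_canopy_height": (1.806, 44.9),
    "esa_agb": (0.407, 34.0),
}


def apply_linear_conversion(product, x):
    """Apply the linear conversion listed for the named product."""
    slope, intercept = LINEAR_CONVERSIONS[product]
    return slope * x + intercept


# An input of 10 for each product.
converted = {name: apply_linear_conversion(name, 10.0) for name in LINEAR_CONVERSIONS}
```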

[43] [43]

Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

W ANG , Z., C HENG , P., T IAN , P., W ANG , Y., C HEN , M., D UAN, S., W ANG , Z., L I, X., AND SUN, X. Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

work page arXiv 2024

[44] [44]

Dino-mc: Self-supervised contrastive learn- ing for remote sensing imagery with multi-sized local crops

W ANYAN , X., S ENEVIRATNE , S., S HEN , S., AND KIRLEY , M. Extending global-local view alignment for self-supervised learning with remote sensing imagery, 2024, 2303.06670

work page arXiv 2024

[45] [45]

Remote sensing for agricultural applications: A meta-review

W EISS , M., J ACOB , F., AND DUVEILLER , G. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236 (Jan. 2020), 111402

work page 2020

[46] [46]

S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O

W ILLETT , D. S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O. B., AND SIMONSON , A. Noaa open data dissemination: Petabyte-scale earth system data in the cloud. Science Advances 9, 38 (2023), eadh0032

work page 2023

[47] [47]

Cat- sam: Conditional tuning for few-shot adaptation of segment anything model

X IAO, A., X UAN, W., Q I, H., X ING , Y., REN, R., Z HANG , X., S HAO, L., AND LU, S. Cat- sam: Conditional tuning for few-shot adaptation of segment anything model. arXiv preprint arXiv:2402.03631 (2024)

work page arXiv 2024

[48] [48]

Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

X IAO, A., X UAN, W., WANG , J., H UANG , J., T AO, D., L U, S., AND YOKOYA, N. Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

work page arXiv 2025

[49] [49]

Unified perceptual parsing for scene understanding

X IAO, T., L IU, Y., Z HOU , B., J IANG , Y., AND SUN, J. Unified perceptual parsing for scene understanding. In Proceedings of the European conference on computer vision (ECCV)(2018), pp. 418–434

work page 2018

[50] [50]

Xiong, Y

X IONG , Z., W ANG , Y., Z HANG , F., S TEWART, A. J., H ANNA , J., B ORTH , D., P APOUTSIS , I., S AUX, B. L., C AMPS -VALLS , G., AND ZHU, X. X. Neural plasticity-inspired multimodal foundation model for earth observation, 2024, 2403.15356

work page arXiv 2024

[51] [51]

X IONG , Z., W ANG , Y., ZHANG , F., AND ZHU, X. X. One for all: Toward unified foundation models for earth vision, 2024, 2401.07527

work page arXiv 2024

[52] [52]

Analytical insight of earth: A cloud-platform of intelligent computing for geospatial big data, 2023, 2312.16385

X U, H., M AN, Y., Y ANG , M., W U, J., Z HANG , Q., AND WANG , J. Analytical insight of earth: A cloud-platform of intelligent computing for geospatial big data, 2023, 2312.16385

work page arXiv 2023

[53] [53]

D., H AMMOND , W

Y AN, Y., H ONG , S., C HEN , A., P E ˜NUELAS , J., A LLEN , C. D., H AMMOND , W. M., M UN- SON , S. M., M YNENI , R. B., AND PIAO, S. Satellite-based evidence of recent decline in global forest recovery rate from tree mortality events. Nature Plants (2025), 1–12

work page 2025

[54] [54]

Ringmo-sam: A foundation model for segment anything in multimodal remote-sensing images

Y AN, Z., L I, J., L I, X., Z HOU , R., Z HANG , W., F ENG , Y., D IAO, W., F U, K., AND SUN, X. Ringmo-sam: A foundation model for segment anything in multimodal remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–16

work page 2023

[55] [55]

Ringmo-sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling

Y AO, F., L U, W., YANG , H., X U, L., L IU, C., H U, L., Y U, H., L IU, N., D ENG , C., T ANG , D., C HEN , C., Y U, J., S UN, X., AND FU, K. Ringmo-sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling. IEEE Trans- actions on Geoscience and Remote Sensing 61 (2023), 1–21

work page 2023

[56] [56]

A global dataset of forest regrowth following wildfires

Z ANG , J., Q IU, F., AND ZHANG , Y. A global dataset of forest regrowth following wildfires. Sci Data 11, 1 (Sept. 2024), 1052

work page 2024

[57] [57]

& Deny, S

Z BONTAR , J., J ING , L., M ISRA , I., L ECUN, Y., AND DENY, S. Barlow twins: Self- supervised learning via redundancy reduction. ArXiv abs/2103.03230 (2021)

work page arXiv 2021

[58] [58]

A 2-mae: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder, 2024, 2406.08079

Z HANG , L., Z HAO, Y., D ONG , R., Z HANG , J., Y UAN, S., C AO, S., C HEN , M., Z HENG , J., LI, W., L IU, W., Z HANG , W., F ENG , L., AND FU, H. A 2-mae: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder, 2024, 2406.08079

work page arXiv 2024

[59] [59]

Ctxmim: Context-enhanced masked image modeling for remote sensing image understanding, 2024, 2310.00022

Z HANG , M., L IU, Q., AND WANG , Y. Ctxmim: Context-enhanced masked image modeling for remote sensing image understanding, 2024, 2310.00022

work page arXiv 2024

[60] [60]

Consecutive pre-training: A knowledge transfer learning strategy with relevant unlabeled data for remote sensing domain

Z HANG , T., G AO, P., D ONG , H., Z HUANG , Y., W ANG , G., Z HANG , W., AND CHEN , H. Consecutive pre-training: A knowledge transfer learning strategy with relevant unlabeled data for remote sensing domain. Remote Sensing 14, 22 (2022). 27

work page 2022

[61] [61]

Uv-sam: adapting segment anything model for urban village identification

Z HANG , X., L IU, Y., L IN, Y., L IAO, Q., AND LI, Y. Uv-sam: adapting segment anything model for urban village identification. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artifi- cial Intelligence and Fourteenth Symposium on Educational Advances in Artificial In...

work page 2024

[62] [62]

Review of remote sensing-based methods for forest aboveground biomass estimation: Progress, challenges, and prospects

Z HAO, Z., D ONG , L., W U, S., X IAO, X., ET AL . Review of remote sensing-based methods for forest aboveground biomass estimation: Progress, challenges, and prospects. Forests 14, 6 (2023), 1086

work page 2023

[63] [63]

Changen2: Multi-temporal remote sensing generative change foundation model

Z HENG , Z., E RMON , S., K IM, D., Z HANG , L., AND ZHONG , Y. Changen2: Multi-temporal remote sensing generative change foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 2 (2025), 725–741

work page 2025

[64] [64]

Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications

Z HU, Z. Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications. ISPRS Journal of Photogrammetry and Remote Sensing 130 (2017), 370–384. Acknowledgments We gratefully acknowledge help from AMD Inc., Tarides, Jane Street, the Dawn supercomputing team at Cambridge, the Aalto University Science-IT pro...

work page 2017

[65] [65]

at a given point over time. Note that d-pixels can be sparse and are accompanied by a mask vector mi,j of size T that indicates the timesteps for which there are valid data, with a value 1 indicating that the corresponding row in Pi,j is valid. S1 The d-pixel Representation We represent each 1- m pixel in the time series of images from an- nual multispect...

work page

[66] [66]

For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-2 time series (10 spectral bands)

work page

[67] [67]

For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-1 time series (2 polarizations). These views represent different, valid, but inherently incomplete glimpses of the pixel’s true temporal-spectral evolution, akin to observing the same location through intermittent cloud cover or from different satelli...

work page 2017

[68] [68]

Unlike pre-training, no spatial downsampling is performed at this stage

The full Sentinel-1 and Sentinel-2 time series data at 1- meter resolution are acquired and pre-processed to form d-pixels. Unlike pre-training, no spatial downsampling is performed at this stage

work page

[69] [69]

A fixed number of 40 timesteps is sampled from the valid observations within the year for both Sentinel-1 and Sentinel-2 data, along with their DOY positional encodings

work page

[70] [70]

These sampled time series are fed into their respective frozen TESSERA encoders

[71]

The outputs from the S1 and S2 encoders are fused by the MLP, producing a 128-dimensional embedding vector for that pixel for that year. This process is repeated for all land pixels globally to create an annual embeddings map of shape (H, W, 128), where H and W are the dimensions of the global 10-meter grid.

Scaling with data and network size

To identify...

[72]

Download Embeddings for the Region of Interest: The GEOTESSERA Python library (154) allows users to download embeddings for a desired region and year in the form of a NumPy array

[73]

Prepare Labelled Downstream Data: The labelled dataset for the target task (e.g., pixel-level crop-type labels, canopy height measurements, or land use change polygons) is prepared

[74]

Design Task-Specific Head: A lightweight, task-specific neural network module (the "head") is designed. This head takes the extracted TESSERA embeddings as input.
• For pixel-wise classification (e.g., crop classification), the head is typically a shallow MLP (1-3 layers) ending in a softmax output layer.
• For pixel-wise regression (e.g., canopy height ...

[75]

Train Downstream Head: Only the parameters of this newly defined task head are trained using the extracted TESSERA embeddings as input features and the corresponding labels. Standard supervised learning techniques, optimizers (e.g., Adam), and task-appropriate loss functions (e.g., Cross-Entropy for classification, Mean Squared Error for regression) are u...
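To make the head-training step concrete, here is a minimal sketch under stated assumptions. It trains a single linear layer with a softmax output and cross-entropy loss on frozen feature vectors. It departs from the text in two labelled ways: it uses plain stochastic gradient descent rather than Adam, and it uses synthetic 8-dimensional vectors as stand-ins for real 128-dimensional TESSERA embeddings. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def logits_for(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_head(X: List[List[float]], y: List[int], n_classes: int,
               lr: float = 0.1, epochs: int = 20) -> Tuple[List[List[float]], List[float]]:
    """Fit a linear softmax head; the embeddings X are never modified."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(logits_for(W, b, x))
            for c in range(n_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                b[c] -= lr * grad
                for k in range(dim):
                    W[c][k] -= lr * grad * x[k]
    return W, b


def predict(W: List[List[float]], b: List[float], x: List[float]) -> int:
    scores = logits_for(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic stand-in embeddings: two well-separated classes (made-up data).
rng = random.Random(0)
DIM = 8
y = [i % 2 for i in range(200)]
X = [[rng.gauss(1.0 if label else -1.0, 0.5) for _ in range(DIM)] for label in y]

W, b = train_head(X, y, n_classes=2)
train_accuracy = sum(predict(W, b, x) == t for x, t in zip(X, y)) / len(y)
print(train_accuracy)
```

Because only `W` and `b` are updated, the cost of training scales with the size of the head rather than with the encoder, which is the point of the frozen-embedding workflow.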

[76]

Evaluation: Once the head is trained, inference is performed on a test set by extracting TESSERA embeddings for the test samples and passing them through the trained head. Performance is evaluated using standard metrics relevant to the task. This workflow allows the use of TESSERA embeddings in a range of diverse applications, demonstrating its role as ...
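The excerpt does not name the metrics used, so the following sketch assumes three common choices: accuracy and macro-averaged F1 for classification, and RMSE for regression. The labels and predictions are made-up toy values, and the function names are illustrative only.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class that appears."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # c occurs in y_true or y_pred, so the denominator is never zero.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy classification results (e.g. three crop classes).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print(round(accuracy(labels, preds), 4))  # 0.6667
print(round(macro_f1(labels, preds), 4))  # 0.6556

# Toy regression results (e.g. canopy height in metres).
heights_true = [10.0, 20.0, 30.0]
heights_pred = [12.0, 18.0, 33.0]
print(round(rmse(heights_true, heights_pred), 4))  # 2.3805
```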

[77]

TESSERA: difference in relative heights of GEDI estimates of RH90 and RH10 converted to AGB using 15.502 × + 160.5

[78]

ETH canopy height converted using 3.4 ×

[79]

CTrees canopy height converted using 1.806 × + 44.9

[80]

ESA AGB converted using 0.407 × + 34.0
