
arxiv: 2506.20380 · v7 · submitted 2025-06-25 · 💻 cs.LG

TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis

Pith reviewed 2026-05-19 07:30 UTC · model grok-4.3

classification 💻 cs.LG
keywords satellite time series · earth observation · self-supervised embeddings · Barlow Twins · Sentinel-1 · Sentinel-2 · label efficiency · foundation model

The pith

TESSERA learns invariant embeddings from irregular multi-modal satellite time series to deliver state-of-the-art accuracy with high label efficiency on Earth observation tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TESSERA as a pixel-wise foundation model that processes Sentinel-1 and Sentinel-2 time series by enforcing invariance to which valid observations are chosen. It combines Barlow Twins loss with sparse random temporal sampling plus two regularizers: global shuffling to break spatial correlations and mix-based regulation to handle extreme sparsity. This produces embeddings that support diverse classification, segmentation, and regression tasks using only small task heads and little computation. The model comes with released global 10 m annual int8 embeddings, open weights, and code to enable large-scale use. A sympathetic reader would care because compositing irregular satellite data often discards phenology information, and label-efficient embeddings could make planetary-scale analysis more practical.
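The data side of this recipe can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the authors' pipeline: the function names, the single-band toy series, and the reading of "global shuffling" as shuffling pixels pooled from all tiles before batching are all hypothetical.

```python
import random

def sample_view(series, valid_mask, k, rng):
    """Return up to k (time_index, observation) pairs drawn at random from valid steps."""
    valid = [t for t, ok in enumerate(valid_mask) if ok]
    chosen = sorted(rng.sample(valid, min(k, len(valid))))
    return [(t, series[t]) for t in chosen]

def two_views(series, valid_mask, k, rng):
    """Two independent sparse samples of the same pixel, one per twin branch."""
    return sample_view(series, valid_mask, k, rng), sample_view(series, valid_mask, k, rng)

def globally_shuffled_batches(pixel_ids, batch_size, rng):
    """Shuffle pooled pixel ids so a batch rarely holds spatial neighbours (assumed reading)."""
    order = list(pixel_ids)
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

rng = random.Random(0)
series = [0.21, 0.25, 0.40, 0.62, 0.58, 0.33]        # toy single-band values, one per date
valid_mask = [True, False, True, True, False, True]  # False marks e.g. a cloudy date
view_a, view_b = two_views(series, valid_mask, k=3, rng=rng)
batches = globally_shuffled_batches(range(10), batch_size=4, rng=rng)
```

Each view keeps only valid dates, so an encoder trained to agree across `view_a` and `view_b` is pushed toward representations that do not depend on which valid dates happened to be drawn.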

Core claim

TESSERA is a pixel-wise foundation model for multi-modal Sentinel-1/2 Earth-observation time series that learns robust, label-efficient embeddings. It applies Barlow Twins loss together with sparse random temporal sampling to enforce invariance to valid-observation selection, augmented by global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve behavior under extreme sparsity. These embeddings achieve state-of-the-art accuracy across classification, segmentation, and regression tasks while requiring only small task heads and minimal computation.

What carries the argument

Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regulation to enforce invariance to the selection of valid observations.
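For readers unfamiliar with the objective, the following is a minimal pure-Python sketch of the standard Barlow Twins loss (Zbontar et al.) applied to two batches of embeddings. It is illustrative only: the weight `lam`, the function names, and the toy batch are assumptions, and the paper's mix-based regulation term is not reproduced here because its definition is not given in this summary.

```python
import math

def _standardize(batch):
    """Normalise each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0  # guard constant dims
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Pull the cross-correlation matrix of the two views toward the identity."""
    n, d = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[s][i] * b[s][j] for s in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance term: matching dims should correlate
            else:
                loss += lam * c_ij ** 2     # redundancy term: other dims should not
    return loss

# Toy batch of four 2-D embeddings whose dimensions are uncorrelated (assumed data).
batch = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
agreeing = barlow_twins_loss(batch, batch)
opposed = barlow_twins_loss(batch, [[-x for x in row] for row in batch])
```

With identical, decorrelated views the loss is approximately zero, while sign-flipped views are penalised heavily by the invariance term. In the setting described above, `z_a` and `z_b` would come from encoding two different sparse temporal samples of the same pixels.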

If this is right

  • Diverse Earth observation tasks can be solved with only a small task head and minimal additional computation.
  • Global annual 10 m int8 embeddings become available for large-scale retrieval and inference.
  • Open weights and lightweight adaptation heads simplify use for planetary-scale applications.
  • Irregular time series from orbital patterns and clouds can be handled without losing vegetation phenology information.
  • The same embeddings support classification, segmentation, and regression with high label efficiency.
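To make "a small task head" concrete, the sketch below dequantizes int8 embeddings and fits a binary logistic-regression head in plain Python. Everything specific here is an assumption for illustration: the `scale` factor, the two-dimensional toy embeddings, and the labels are invented, the released embeddings may use a different quantization scheme, and the authors' adaptation heads may differ.

```python
import math

def dequantize(q, scale):
    """Map int8 codes back to floats; `scale` is an assumed per-vector factor."""
    return [v * scale for v in q]

def train_logistic_head(xs, ys, lr=0.5, epochs=200):
    """Fit weights and bias of a binary logistic head by batch gradient descent."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            for j in range(d):
                grad_w[j] += (p - y) * x[j] / n
            grad_b += (p - y) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return class 1 when the logit is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical int8 embeddings for four labelled pixels (two per class).
quantized = [[100, -20], [90, 10], [-95, 5], [-110, -15]]
ys = [1, 1, 0, 0]
xs = [dequantize(q, scale=1.0 / 127.0) for q in quantized]
w, b = train_logistic_head(xs, ys)
predictions = [predict(w, b, x) for x in xs]
```

The point of the design is that the encoder stays frozen: only the handful of head parameters are trained, which is why few labels and little computation can suffice.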

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Consistent embeddings over time could support long-term tracking of land-cover change without repeated model retraining.
  • The approach might reduce data requirements for monitoring applications in data-scarce regions.
  • Invariance to sparsity could prove useful for fusing additional irregular data sources in future extensions.

Load-bearing premise

That enforcing invariance to valid-observation selection via Barlow Twins, global shuffling, and mix-based regulation produces embeddings that actually improve downstream performance across unseen geographic regions and task distributions.

What would settle it

A benchmark on a held-out geographic region or new task distribution where TESSERA embeddings require substantially more labels or fail to exceed the accuracy of existing methods.
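One common way to build such a test is a spatial block hold-out, where whole grid cells rather than individual pixels are assigned to the test set. The sketch below is a hedged illustration: the helper names, the `lon`/`lat` field names, the 1-degree block size, and the sample coordinates are hypothetical and not taken from the paper.

```python
import math

def block_id(lon, lat, block_deg=1.0):
    """Index of the grid cell (in degrees) that contains the point."""
    return (math.floor(lon / block_deg), math.floor(lat / block_deg))

def spatial_holdout(samples, test_blocks, block_deg=1.0):
    """Send every sample in a held-out block to test; all others go to train."""
    train, test = [], []
    for s in samples:
        in_test = block_id(s["lon"], s["lat"], block_deg) in test_blocks
        (test if in_test else train).append(s)
    return train, test

# Hypothetical labelled pixels; only the coordinates matter for the split.
samples = [
    {"lon": 10.2, "lat": 50.7},
    {"lon": 10.9, "lat": 50.1},
    {"lon": 11.3, "lat": 50.4},
    {"lon": -0.5, "lat": 51.5},
]
train, test = spatial_holdout(samples, test_blocks={(10, 50)})
train_blocks = {block_id(s["lon"], s["lat"]) for s in train}
test_block_set = {block_id(s["lon"], s["lat"]) for s in test}
```

Because no block appears on both sides, neighbouring pixels cannot straddle the split, which is the leakage path that random pixel-level splits leave open. Larger blocks, or whole continents, give a stricter test.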

Original abstract

Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. All code and data are available at: https://github.com/ucam-eo/tessera.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) Earth-observation time series. It trains embeddings via Barlow Twins loss on sparse random temporal samples, augmented by global shuffling to decorrelate spatial neighborhoods and mix-based regulation for extreme sparsity. The central claim is that these embeddings achieve state-of-the-art accuracy and high label efficiency across classification, segmentation, and regression tasks, often with only a small task head, while the authors release global 10 m annual int8 embeddings, open weights, code, and lightweight adaptation heads.

Significance. If the performance claims survive geographic separation, the work would be significant for EO foundation modeling by demonstrating that invariance to valid-observation selection can yield label-efficient representations without compositing losses. The public release of planetary-scale embeddings and reproducible code is a clear strength that supports downstream use and verification.

major comments (2)
  1. [§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.
  2. [§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.
minor comments (2)
  1. [§3.2] Notation in §3.2: the precise definition of the mix-based regulation term should be written as an explicit equation rather than described in prose to improve reproducibility.
  2. [Figure 3] Figure 3 caption: clarify whether the visualized embeddings are before or after the small task head, and add scale bars for the geographic examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.

    Authors: We agree that spatial autocorrelation poses a risk of data leakage in EO tasks and that continent-level or other strict geographic hold-outs provide stronger evidence for generalization. The original experiments relied on random pixel-level splits within the evaluation datasets to focus on label efficiency under the invariance training regime. In the revised manuscript we will add explicit geographic separation experiments (e.g., training on European tiles and testing on African and Asian tiles) and report the corresponding downstream metrics. These results will be incorporated into Section 4 and the supplementary material. revision: yes

  2. Referee: [§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.

    Authors: We acknowledge that additional quantitative detail would strengthen the claims. The current Table 2 and Section 4.3 present the main baseline comparisons, but we will expand the revision to include (i) standard error bars computed over multiple random seeds, (ii) a complete ablation table isolating the contribution of global shuffling and mix-based regularization, and (iii) further baseline methods. These additions will be placed in Section 4.3 with supporting figures moved to the appendix. revision: yes
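The error bars described in (i) reduce to a mean and a standard error over per-seed scores. A minimal sketch follows; the three accuracy values are placeholders invented for the example and are not results from the paper.

```python
import math
import statistics

def mean_and_stderr(scores):
    """Mean and standard error of the mean over independent runs (needs >= 2 scores)."""
    mean = statistics.fmean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr

# Placeholder accuracies from three hypothetical random seeds.
seed_scores = [0.80, 0.82, 0.84]
mean, stderr = mean_and_stderr(seed_scores)
```

`statistics.stdev` uses the sample (n - 1) estimator, which is the usual choice when only a handful of seeds is available.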

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims

Full rationale

The paper trains TESSERA using the established Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regularization to promote invariance to valid observation selection in EO time series. Downstream evaluations on classification, segmentation, and regression tasks use independent task heads and report empirical accuracies without any equations or results reducing by construction to author-fitted parameters or self-referential definitions. No self-citation chains are invoked as load-bearing uniqueness theorems, and the central invariance claim is grounded in the training objective rather than renaming or smuggling prior author ansatzes. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard self-supervised learning assumptions rather than new free parameters or invented physical entities.

axioms (1)
  • domain assumption: Barlow Twins loss plus the stated regularizers produces embeddings invariant to the selection of valid temporal observations
    Invoked in the training description to justify robustness under irregular sampling.

pith-pipeline@v0.9.0 · 5812 in / 1274 out tokens · 39078 ms · 2026-05-19T07:30:20.392751+00:00 · methodology



Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Better Together: Evaluating the Complementarity of Earth Embedding Models

    cs.CV 2026-05 unverdicted novelty 7.0

    Fusing embeddings from four Earth models (AlphaEarth, Tessera, GeoCLIP, SatCLIP) outperforms the best single model on four of six tasks, with gains depending on task and location.

  2. FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes

    cs.LG 2026-05 unverdicted novelty 6.0

    FLUXtrapolation is a benchmark for domain generalization in ecosystem flux upscaling using temporal, spatial, and temperature-based extrapolation scenarios, with pilot results showing model separation on tail and mult...

  3. Agentic AI for Remote Sensing: Technical Challenges and Research Directions

    cs.CV 2026-04 unverdicted novelty 6.0

    Agentic AI faces structural challenges in remote sensing due to geospatial data properties and workflow constraints, requiring EO-native agents built around structured state, tool-aware reasoning, and validity-aware e...

  4. Agentic AI for Remote Sensing: Technical Challenges and Research Directions

    cs.CV 2026-04 unverdicted novelty 5.0

    Agentic AI for remote sensing requires new designs centered on structured geospatial state, tool-aware reasoning, verifier-guided execution, and physical validity rather than generic extensions.

  5. Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping

    cs.CV 2026-04 unverdicted novelty 5.0

    SSDM decouples global geospatial embeddings into structural modulation and semantic injection pathways to improve accuracy and consistency in high-resolution remote sensing land cover mapping.

  6. Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data

    cs.CV 2026-04 unverdicted novelty 5.0

    LIANet encodes multi-temporal Earth observation data into a coordinate-based neural field that supports label-only fine-tuning for downstream tasks without access to raw imagery.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · cited by 5 Pith papers

  1. [1]

    Lightweight temporal self-attention for classifying satellite images time series

    S AINTE FARE GARNOT , V., AND LANDRIEU , L. Lightweight temporal self-attention for classifying satellite images time series. In Lecture Notes in Computer Science (12 2020), pp. 171–181

  2. [2]

    Esa biomass climate change initiative (biomasscci): Global datasets of forest above-ground biomass for the years 2010, 2015, 2016, 2017, 2018, 2019, 2020 and 2021, v5.01, 2024

    S ANTORO , M., AND CARTUS , O. Esa biomass climate change initiative (biomasscci): Global datasets of forest above-ground biomass for the years 2010, 2015, 2016, 2017, 2018, 2019, 2020 and 2021, v5.01, 2024

  3. [3]

    Self-supervised vision transformers for land-cover segmentation and classification

    S CHEIBENREIF , L., H ANNA , J., M OMMERT , M., AND BORTH , D. Self-supervised vision transformers for land-cover segmentation and classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2022), pp. 1422–1431

  4. [4]

    Prithvi wxc: Foundation model for weather and climate.arXiv preprint arXiv:2409.13598, 2024

    S CHMUDE , J., ROY, S., T ROJAK , W., JAKUBIK , J., C IVITARESE , D. S., S INGH , S., K UEHN - ERT, J., A NKUR , K., G UPTA, A., P HILLIPS , C. E., K IENZLER , R., S ZWARCMAN , D., GAUR, V., S HINDE , R., L AL, R., S ILVA, A. D., D IAZ , J. L. G., J ONES , A., P FREUND - SCHUH , S., L IN, A., S HESHADRI , A., N AIR , U., A NANTHARAJ , V., H AMANN , H., W ...

  5. [5]

    J., B OYD, D

    S HENKIN , A., C HANDLER , C. J., B OYD, D. S., J ACKSON , T., D ISNEY , M., M AJALAP , N., NILUS , R., F OODY, G., BIN JAMI , J., R EYNOLDS , G., W ILKES , P., CUTLER , M. E. J., VAN DER HEIJDEN , G. M. F., B URSLEM , D. F. R. P., C OOMES , D. A., B ENTLEY , L. P., AND MALHI , Y. The World’s Tallest Tropical Tree in Three Dimensions.Front. For. Glob. Cha...

  6. [6]

    K., C OOPS , N

    S KIDMORE , A. K., C OOPS , N. C., N EINAVAZ, E., A LI, A., S CHAEPMAN , M. E., P A- GANINI , M., K ISSLING , W. D., V IHERVAARA , P., D ARVISHZADEH , R., F EILHAUER , H., FERNANDEZ , M., F ERN ´ANDEZ , N., G ORELICK , N., G EIJZENDORFFER , I., H EIDEN , U., HEURICH , M., H OBERN , D., H OLZWARTH , S., M ULLER -KARGER , F. E., V AN DE KER- CHOVE , R., L A...

  7. [7]

    J., F LEMING , L., AND GEACH , J

    S MITH , M. J., F LEMING , L., AND GEACH , J. E. Earthpt: a time series foundation model for earth observation, 2024, 2309.07207

  8. [8]

    S., C ARABALLO -V EGA , J

    S PRADLIN , C. S., C ARABALLO -V EGA , J. A., L I, J., C ARROLL , M. L., G ONG , J., AND MONTESANO , P. M. Satvision-toa: A geospatial foundation model for coarse-resolution all- sky remote sensing imagery, 2024, 2411.17000

  9. [9]

    Self-supervised learning of remote sensing scene represen- tations using contrastive multiview coding

    S TOJNIC , V., AND RISOJEVIC , V. Self-supervised learning of remote sensing scene represen- tations using contrastive multiview coding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops(June 2021), pp. 1182–1191. 24

  10. [10]

    Ringmo: A remote sens- ing foundation model with masked image modeling

    S UN, X., W ANG , P., L U, W., Z HU, Z., L U, X., H E, Q., L I, J., R ONG , X., Y ANG , Z., CHANG , H., H E, Q., Y ANG , G., W ANG , R., L U, J., AND FU, K. Ringmo: A remote sens- ing foundation model with masked image modeling. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–22

  11. [11]

    Prithvi-eo-2.0: A versatile multi-temporal foundation model for earth observation applications.arXiv preprint arXiv:2412.02732, 2024

    S ZWARCMAN , D., R OY, S., F RACCARO , P., G ISLASON , P. E., B LUMENSTIEL , B., GHOSAL , R., DE OLIVEIRA , P. H., DE SOUSA ALMEIDA , J. L., S EDONA , R., K ANG , Y., C HAKRABORTY , S., W ANG , S., G OMES , C., K UMAR , A., T RUONG , M., G ODWIN , D., L EE, H., H SU, C.-Y., A SANJAN , A. A., M UJECI , B., S HIDHAM , D., K EENAN , T., AREVALO , P., L I, W....

  12. [12]

    Towards privacy-preserved pre-training of re- mote sensing foundation models with federated mutual-guidance learning, 2025, 2503.11051

    T AN, J., Z HANG , C., D ANG , B., AND LI, Y. Towards privacy-preserved pre-training of re- mote sensing foundation models with federated mutual-guidance learning, 2025, 2503.11051

  13. [13]

    A., B ELENGUER -PLOMER , M

    T ANASE , M. A., B ELENGUER -PLOMER , M. A., R OTETA , E., B ASTARRIKA , A., WHEELER , J., F ERN ´ANDEZ -C ARRILLO , ´A., T ANSEY , K., W IEDEMANN , W., N AVRATIL , P., L OHBERGER , S., S IEGERT , F., AND CHUVIECO , E. Burned Area Detection and Map- ping: Intercomparison of Sentinel-1 and Sentinel-2 Based Algorithms over Tropical Africa. Remote Sensing 12...

  14. [14]

    Cross-scale mae: a tale of multi-scale exploitation in remote sensing

    T ANG , M., C OZMA , A., G EORGIOU , K., AND QI, H. Cross-scale mae: a tale of multi-scale exploitation in remote sensing. In Proceedings of the 37th International Conference on Neural Information Processing Systems (Red Hook, NY , USA, 2023), NIPS ’23, Curran Associates Inc

  15. [15]

    Tov: The original vision model for optical remote sensing image understanding via self-supervised learning

    T AO, C., Q I, J., Z HANG , G., Z HU, Q., L U, W., AND LI, H. Tov: The original vision model for optical remote sensing image understanding via self-supervised learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16 (2023), 4916–4930

  16. [16]

    Terraclass: Mapeamento de uso e ocupac ¸˜ao da terra

    T ERRA CLASS . Terraclass: Mapeamento de uso e ocupac ¸˜ao da terra. Accessed: July 24, 2025

  17. [17]

    Swimdiff: Scene-wide matching con- trastive learning with diffusion constraint for remote sensing image, 2024, 2401.05093

    T IAN , J., L EI, J., Z HANG , J., X IE, W., AND LI, Y. Swimdiff: Scene-wide matching con- trastive learning with diffusion constraint for remote sensing image, 2024, 2401.05093

  18. [18]

    V., B RANDT , J., S PORE , J., M AJUMDAR , S., H AZIZA , D., V AMARAJU , J., M OUTAKANNI , T., B O- JANOWSKI , P., J OHNS , T., W HITE , B., T IECKE , T., AND COUPRIE , C

    T OLAN , J., Y ANG , H.-I., N OSARZEWSKI , B., C OUAIRON , G., V O, H. V., B RANDT , J., S PORE , J., M AJUMDAR , S., H AZIZA , D., V AMARAJU , J., M OUTAKANNI , T., B O- JANOWSKI , P., J OHNS , T., W HITE , B., T IECKE , T., AND COUPRIE , C. Very high reso- lution canopy height maps from RGB imagery using self-supervised vision transformer and convolutio...

  19. [19]

    Lightweight, pre-trained transformers for remote sensing timeseries,

    T SENG , G., C ARTUYVELS , R., Z VONKOV, I., P UROHIT , M., R OLNICK , D., AND KERNER , H. Lightweight, Pre-trained Transformers for Remote Sensing Timeseries, Feb. 2024. arXiv:2304.14065 [cs]

  20. [20]

    R., S HELHAMER , E., K ERNER , H., AND ROLNICK , D

    T SENG , G., F ULLER , A., R EIL , M., H ERZOG , H., B EUKEMA , P., B ASTANI , F., G REEN , J. R., S HELHAMER , E., K ERNER , H., AND ROLNICK , D. Galileo: Learning global & local features of many remote sensing modalities, 2025, 2502.09356

  21. [21]

    Pooch: A friend to fetch your data files.Journal of Open Source Software 5, 45 (Jan

    U IEDA , L., S OLER , S., R AMPIN , R., VAN KEMENADE , H., T URK , M., S HAPERO , D., B AN- IHIRWE , A., AND LEEMAN , J. Pooch: A friend to fetch your data files.Journal of Open Source Software 5, 45 (Jan. 2020), 1943

  22. [22]

    Ucam-eo project

    U NIVERSITY OF CAMBRIDGE CENTRE FOR EARTH OBSERVATION . Ucam-eo project. https://github.com/ucam-eo. Accessed: 2025-07-22

  23. [23]

    N., KAISER , L., AND POLOSUKHIN , I

    V ASWANI , A., S HAZEER , N., P ARMAR , N., U SZKOREIT , J., J ONES , L., G OMEZ , A. N., KAISER , L., AND POLOSUKHIN , I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Red Hook, NY , USA, 2017), NIPS’17, Curran Associates Inc., p. 6000–6010

  24. [24]

    V ERHEGGHEN , A., E VA, H., C ECCHERINI , G., A CHARD , F., G OND , V., G OURLET - FLEURY , S., AND CERUTTI , P. O. The Potential of Sentinel Satellites for Burnt Area Mapping and Monitoring in the Congo Basin Forests. Remote Sensing 8, 12 (Dec. 2016), 986. 25

  25. [25]

    Vm0047 afforestation, reforestation, and reveg- etation, v1.1

    V ERRA . Vm0047 afforestation, reforestation, and reveg- etation, v1.1. https://verra.org/methodologies/ vm0047-afforestation-reforestation-and-revegetation-v1-1/ , 2025. Verra Verified Carbon Standard (VCS) Program methodology

  26. [26]

    H., D ALAGNOL , R., C ARTER , G., H IRYE , M

    W AGNER , F. H., D ALAGNOL , R., C ARTER , G., H IRYE , M. C. M., G ILL , S., T AKOUGOUM , L. B. S., F AVRICHON , S., K ELLER , M., O METTO , J. P. H. B., A LVES, L., C REZE , C., GEORGE -C HACON , S. P., L I, S., L IU, Z., M ULLISSA , A., Y ANG , Y., S ANTOS , E. G., WORDEN , S. R., B RANDT , M., C IAIS , P., H AGEN , S. C., AND SAATCHI , S. High reso- l...

  27. [27]

    J., X IONG , Z., Z HU, X

    W ALDMANN , L., S HAH , A., WANG , Y., LEHMANN , N., S TEWART, A. J., X IONG , Z., Z HU, X. X., B AUER , S., AND CHUANG , J. Panopticon: Advancing any-sensor foundation models for earth observation, 2025, 2503.10845

  28. [28]

    A., W OODCOCK , C

    W ANG , C., S ONG , C., S CHROEDER , T. A., W OODCOCK , C. E., P AVELSKY, T. M., H AN, Q., AND YAO, F. Interpretable Multi-Sensor Fusion of Optical and SAR Data for GEDI-Based Canopy Height Mapping in Southeastern North Carolina. Remote Sensing 17, 9 (Jan. 2025), 1536

  29. [29]

    Hypersigma: Hyperspectral intelligence comprehension foundation model, 2025, 2406.11519

    W ANG , D., H U, M., J IN, Y., M IAO, Y., Y ANG , J., X U, Y., Q IN, X., M A, J., S UN, L., LI, C., F U, C., C HEN , H., H AN, C., Y OKOYA, N., Z HANG , J., X U, M., L IU, L., Z HANG , L., W U, C., D U, B., T AO, D., AND ZHANG , L. Hypersigma: Hyperspectral intelligence comprehension foundation model, 2025, 2406.11519

  30. [30]

    An empirical study of remote sensing pretraining

    W ANG , D., Z HANG , J., D U, B., X IA, G.-S., AND TAO, D. An empirical study of remote sensing pretraining. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–20

  31. [31]

    Samrs: Scaling- up remote sensing segmentation dataset with segment anything model

    W ANG , D., Z HANG , J., D U, B., X U, M., L IU, L., TAO, D., AND ZHANG , L. Samrs: Scaling- up remote sensing segmentation dataset with segment anything model. In Advances in Neural Information Processing Systems (2023), vol. 36, pp. 8815–8827

  32. [32]

    Mtp: Advancing remote sensing foundation model via multi-task pretraining, 2024, 2403.13430

    W ANG , D., Z HANG , J., X U, M., L IU, L., WANG , D., G AO, E., H AN, C., G UO, H., D U, B., TAO, D., AND ZHANG , L. Mtp: Advancing remote sensing foundation model via multi-task pretraining, 2024, 2403.13430

  33. [33]

    Advanc- ing plain vision transformer toward remote sensing foundation model

    W ANG , D., Z HANG , Q., X U, Y., Z HANG , J., D U, B., T AO, D., AND ZHANG , L. Advanc- ing plain vision transformer toward remote sensing foundation model. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–15

  34. [34]

    Harnessing massive satellite imagery with efficient masked image modeling, 2025, 2406.11933

    W ANG , F., W ANG , H., W ANG , D., G UO, Z., Z HONG , Z., L AN, L., Y ANG , W., AND ZHANG , J. Harnessing massive satellite imagery with efficient masked image modeling, 2025, 2406.11933

  35. [35]

    Roma: Scaling up mamba-based foundation models for remote sensing, 2025, 2503.10392

    W ANG , F., WANG , H., W ANG , Y., WANG , D., C HEN , M., Z HAO, H., S UN, Y., WANG , S., LAN, L., YANG , W., AND ZHANG , J. Roma: Scaling up mamba-based foundation models for remote sensing, 2025, 2503.10392

  36. [36]

    M., B RAHAM , N

    W ANG , Y., A LBRECHT , C. M., B RAHAM , N. A. A., L IU, C., X IONG , Z., AND ZHU, X. X. Decoupling common and unique representations for multimodal self-supervised learn- ing, 2024, 2309.05300

  37. [37]

    M., AND ZHU, X

    W ANG , Y., A LBRECHT , C. M., AND ZHU, X. X. Self-supervised vision transformers for joint sar-optical representation learning, 2022, 2204.05381

  38. [38]

    M., AND ZHU, X

    W ANG , Y., A LBRECHT , C. M., AND ZHU, X. X. Multilabel-guided soft contrastive learning for efficient earth observation pretraining. IEEE Transactions on Geoscience and Remote Sensing 62 (2024), 1–16

  39. [39]

    W ANG , Y., B RAHAM , N. A. A., X IONG , Z., L IU, C., A LBRECHT , C. M., AND ZHU, X. X. Ssl4eo-s12: A large-scale multi-modal, multi-temporal dataset for self-supervised learning in earth observation. ArXiv abs/2211.07044 (2022)

  40. [40]

    H., A LBRECHT , C

    W ANG , Y., H ERN ´ANDEZ , H. H., A LBRECHT , C. M., AND ZHU, X. X. Feature guided masked autoencoder for self-supervised learning in remote sensing, 2023, 2310.18653

  41. [41]

    J., D UJARDIN , T., B OUNTOS , N

    W ANG , Y., X IONG , Z., L IU, C., S TEWART, A. J., D UJARDIN , T., B OUNTOS , N. I., Z A- VRAS , A., G ERKEN , F., P APOUTSIS , I., L EAL -TAIX ´E, L., AND ZHU, X. X. Towards a unified copernicus foundation model for earth vision, 2025, 2503.11849. 26

  42. [42]

    Ringmo-lite: A remote sensing multi-task lightweight network with cnn-transformer hybrid framework, 2023, 2309.09003

    W ANG , Y., Z HANG , T., Z HAO, L., H U, L., W ANG , Z., N IU, Z., C HENG , P., C HEN , K., ZENG , X., W ANG , Z., W ANG , H., AND SUN, X. Ringmo-lite: A remote sensing multi-task lightweight network with cnn-transformer hybrid framework, 2023, 2309.09003

  43. [43]

    Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

    W ANG , Z., C HENG , P., T IAN , P., W ANG , Y., C HEN , M., D UAN, S., W ANG , Z., L I, X., AND SUN, X. Rs-dfm: A remote sensing distributed foundation model for diverse downstream tasks, 2024, 2406.07032

  44. [44]

    Dino-mc: Self-supervised contrastive learn- ing for remote sensing imagery with multi-sized local crops

    W ANYAN , X., S ENEVIRATNE , S., S HEN , S., AND KIRLEY , M. Extending global-local view alignment for self-supervised learning with remote sensing imagery, 2024, 2303.06670

  45. [45]

    Remote sensing for agricultural applications: A meta-review

    W EISS , M., J ACOB , F., AND DUVEILLER , G. Remote sensing for agricultural applications: A meta-review. Remote Sensing of Environment 236 (Jan. 2020), 111402

  46. [46]

    S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O

    W ILLETT , D. S., B RANNOCK , J., D ISSEN , J., K EOWN , P., S ZURA , K., B ROWN , O. B., AND SIMONSON , A. Noaa open data dissemination: Petabyte-scale earth system data in the cloud. Science Advances 9, 38 (2023), eadh0032

  47. [47]

    Cat- sam: Conditional tuning for few-shot adaptation of segment anything model

    X IAO, A., X UAN, W., Q I, H., X ING , Y., REN, R., Z HANG , X., S HAO, L., AND LU, S. Cat- sam: Conditional tuning for few-shot adaptation of segment anything model. arXiv preprint arXiv:2402.03631 (2024)

  48. [48]

    Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

    X IAO, A., X UAN, W., WANG , J., H UANG , J., T AO, D., L U, S., AND YOKOYA, N. Founda- tion models for remote sensing and earth observation: A survey, 2025, 2410.16602

  49. [49]

    Unified perceptual parsing for scene understanding

    X IAO, T., L IU, Y., Z HOU , B., J IANG , Y., AND SUN, J. Unified perceptual parsing for scene understanding. In Proceedings of the European conference on computer vision (ECCV)(2018), pp. 418–434

  50. [50]

    Xiong, Y

    X IONG , Z., W ANG , Y., Z HANG , F., S TEWART, A. J., H ANNA , J., B ORTH , D., P APOUTSIS , I., S AUX, B. L., C AMPS -VALLS , G., AND ZHU, X. X. Neural plasticity-inspired multimodal foundation model for earth observation, 2024, 2403.15356

  51. [51]

    X IONG , Z., W ANG , Y., ZHANG , F., AND ZHU, X. X. One for all: Toward unified foundation models for earth vision, 2024, 2401.07527

  52. [52]

    Analytical insight of earth: A cloud-platform of intelligent computing for geospatial big data, 2023, 2312.16385

    X U, H., M AN, Y., Y ANG , M., W U, J., Z HANG , Q., AND WANG , J. Analytical insight of earth: A cloud-platform of intelligent computing for geospatial big data, 2023, 2312.16385

  53. [53]

    D., H AMMOND , W

    Y AN, Y., H ONG , S., C HEN , A., P E ˜NUELAS , J., A LLEN , C. D., H AMMOND , W. M., M UN- SON , S. M., M YNENI , R. B., AND PIAO, S. Satellite-based evidence of recent decline in global forest recovery rate from tree mortality events. Nature Plants (2025), 1–12

  54. [54]

    Ringmo-sam: A foundation model for segment anything in multimodal remote-sensing images

    Y AN, Z., L I, J., L I, X., Z HOU , R., Z HANG , W., F ENG , Y., D IAO, W., F U, K., AND SUN, X. Ringmo-sam: A foundation model for segment anything in multimodal remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing 61 (2023), 1–16

  55. [55]

    Ringmo-sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling

    Y AO, F., L U, W., YANG , H., X U, L., L IU, C., H U, L., Y U, H., L IU, N., D ENG , C., T ANG , D., C HEN , C., Y U, J., S UN, X., AND FU, K. Ringmo-sense: Remote sensing foundation model for spatiotemporal prediction via spatiotemporal evolution disentangling. IEEE Trans- actions on Geoscience and Remote Sensing 61 (2023), 1–21

  56. [56]

    A global dataset of forest regrowth following wildfires

    Zang, J., Qiu, F., and Zhang, Y. A global dataset of forest regrowth following wildfires. Sci Data 11, 1 (Sept. 2024), 1052

  57. [57]

    Barlow twins: Self-supervised learning via redundancy reduction

    Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. Barlow twins: Self-supervised learning via redundancy reduction. ArXiv abs/2103.03230 (2021)

  58. [58]

    A2-mae: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    Zhang, L., Zhao, Y., Dong, R., Zhang, J., Yuan, S., Cao, S., Chen, M., Zheng, J., Li, W., Liu, W., Zhang, W., Feng, L., and Fu, H. A2-mae: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder, 2024, arXiv:2406.08079

  59. [59]

    Ctxmim: Context-enhanced masked image modeling for remote sensing image understanding

    Zhang, M., Liu, Q., and Wang, Y. Ctxmim: Context-enhanced masked image modeling for remote sensing image understanding, 2024, arXiv:2310.00022

  60. [60]

    Consecutive pre-training: A knowledge transfer learning strategy with relevant unlabeled data for remote sensing domain

    Zhang, T., Gao, P., Dong, H., Zhuang, Y., Wang, G., Zhang, W., and Chen, H. Consecutive pre-training: A knowledge transfer learning strategy with relevant unlabeled data for remote sensing domain. Remote Sensing 14, 22 (2022)

  61. [61]

    Uv-sam: adapting segment anything model for urban village identification

    Zhang, X., Liu, Y., Lin, Y., Liao, Q., and Li, Y. Uv-sam: adapting segment anything model for urban village identification. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial In...

  62. [62]

    Review of remote sensing-based methods for forest aboveground biomass estimation: Progress, challenges, and prospects

    Zhao, Z., Dong, L., Wu, S., Xiao, X., et al. Review of remote sensing-based methods for forest aboveground biomass estimation: Progress, challenges, and prospects. Forests 14, 6 (2023), 1086

  63. [63]

    Changen2: Multi-temporal remote sensing generative change foundation model

    Zheng, Z., Ermon, S., Kim, D., Zhang, L., and Zhong, Y. Changen2: Multi-temporal remote sensing generative change foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 2 (2025), 725–741

  64. [64]

    Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications

    Zhu, Z. Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications. ISPRS Journal of Photogrammetry and Remote Sensing 130 (2017), 370–384.

    Acknowledgments: We gratefully acknowledge help from AMD Inc., Tarides, Jane Street, the Dawn supercomputing team at Cambridge, the Aalto University Science-IT pro...

  65. [65]

    at a given point over time. Note that d-pixels can be sparse and are accompanied by a mask vector m_{i,j} of size T that indicates the timesteps for which there are valid data, with a value 1 indicating that the corresponding row in P_{i,j} is valid.

    S1 The d-pixel Representation

    We represent each 10 m pixel in the time series of images from annual multispect...

  66. [66]

    For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-2 time series (10 spectral bands)

  67. [67]

    For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-1 time series (2 polarizations). These views represent different, valid, but inherently incomplete glimpses of the pixel’s true temporal-spectral evolution, akin to observing the same location through intermittent cloud cover or from different satelli...
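    The two sampling steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the toy array sizes, the value `n_samples=8`, and the fallback to sampling with replacement when a pixel has too few valid dates are all assumptions.

```python
import numpy as np

def sample_view(mask, n_samples, rng):
    """Return n_samples indices of valid timesteps (mask == 1), in temporal order."""
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("d-pixel has no valid observations")
    # Assumption: sample with replacement when fewer valid dates exist than requested.
    idx = rng.choice(valid, size=n_samples, replace=valid.size < n_samples)
    return np.sort(idx)

def make_two_views(s2, s2_mask, s1, s1_mask, n_samples, rng):
    """Draw two independent (Sentinel-2, Sentinel-1) views of the same pixel."""
    views = []
    for _ in range(2):
        s2_idx = sample_view(s2_mask, n_samples, rng)
        s1_idx = sample_view(s1_mask, n_samples, rng)
        views.append({"s2": s2[s2_idx], "s2_idx": s2_idx,
                      "s1": s1[s1_idx], "s1_idx": s1_idx})
    return views

# Toy d-pixel: 60 candidate dates, 10 S2 bands, 2 S1 polarizations (synthetic values).
rng = np.random.default_rng(0)
s2 = rng.random((60, 10))
s1 = rng.random((60, 2))
s2_mask = (rng.random(60) > 0.5).astype(int)   # stand-in for cloud-free dates
s1_mask = (rng.random(60) > 0.2).astype(int)
view_a, view_b = make_two_views(s2, s2_mask, s1, s1_mask, n_samples=8, rng=rng)
print(view_a["s2"].shape, view_a["s1"].shape)
```

    Because each view draws its dates independently, the two views of a pixel generally cover different subsets of the year, which is the property the invariance objective relies on.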

  68. [68]

    The full Sentinel-1 and Sentinel-2 time series data at 10-meter resolution are acquired and pre-processed to form d-pixels. Unlike pre-training, no spatial downsampling is performed at this stage

  69. [69]

    A fixed number of 40 timesteps is sampled from the valid observations within the year for both Sentinel-1 and Sentinel-2 data, along with their DOY positional encodings

  70. [70]

    These sampled time series are fed into their respective frozen TESSERA encoders

  71. [71]

    The outputs from the S1 and S2 encoders are fused by the MLP, producing a 128-dimensional embedding vector for that pixel for that year. This process is repeated for all land pixels globally to create an annual embeddings map of shape (H, W, 128), where H and W are the dimensions of the global 10-meter grid.

    Scaling with data and network size

    To identify...
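    The data flow of these inference steps can be sketched as below, purely to show the shapes involved. The 40 timesteps and the 128-dimensional output come from the text; everything else is hypothetical: the encoders are replaced by a random linear map with mean pooling, the DOY encoding is assumed to be a sin/cos pair, the fusion MLP is reduced to one linear layer, and the tile is synthetic.

```python
import numpy as np

EMBED_DIM = 128    # from the text: one 128-dimensional vector per pixel per year
N_TIMESTEPS = 40   # from the text: timesteps sampled per sensor

def sample_valid(series, doy, mask, rng, n=N_TIMESTEPS):
    """Sample n valid timesteps; with replacement if fewer exist (assumption)."""
    valid = np.flatnonzero(mask)
    idx = np.sort(rng.choice(valid, size=n, replace=valid.size < n))
    return series[idx], doy[idx]

def stand_in_encoder(series, doy, weights):
    """Hypothetical frozen encoder: sin/cos DOY features, linear map, mean over time."""
    angle = 2.0 * np.pi * doy / 365.0
    feats = np.column_stack([series, np.sin(angle), np.cos(angle)])
    return (feats @ weights).mean(axis=0)

def embed_pixel(s2, s2_mask, s1, s1_mask, doy, params, rng):
    s2_x, s2_d = sample_valid(s2, doy, s2_mask, rng)
    s1_x, s1_d = sample_valid(s1, doy, s1_mask, rng)
    z2 = stand_in_encoder(s2_x, s2_d, params["s2"])
    z1 = stand_in_encoder(s1_x, s1_d, params["s1"])
    # Stand-in for the fusion MLP: a single linear layer followed by tanh.
    return np.tanh(np.concatenate([z2, z1]) @ params["fuse"])

rng = np.random.default_rng(0)
params = {
    "s2": rng.normal(size=(10 + 2, 64)),   # 10 bands + 2 DOY features
    "s1": rng.normal(size=(2 + 2, 64)),    # 2 polarizations + 2 DOY features
    "fuse": 0.1 * rng.normal(size=(128, EMBED_DIM)),
}
H, W, T = 2, 3, 60                          # tiny synthetic tile
doy = np.linspace(1, 365, T)
emb_map = np.zeros((H, W, EMBED_DIM))
for i in range(H):
    for j in range(W):
        s2, s1 = rng.random((T, 10)), rng.random((T, 2))
        s2_mask, s1_mask = rng.random(T) > 0.5, rng.random(T) > 0.2
        emb_map[i, j] = embed_pixel(s2, s2_mask, s1, s1_mask, doy, params, rng)
print(emb_map.shape)
```

    In a real run the stand-ins would be replaced by the released encoder and fusion weights; only the per-pixel loop and the (H, W, 128) output layout carry over from this sketch.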

  72. [72]

    Download Embeddings for the Region of Interest: The GEOTESSERA Python library (154) allows users to download embeddings for a desired region and year in the form of a NumPy array

  73. [73]

    Prepare Labelled Downstream Data: The labelled dataset for the target task (e.g., pixel-level crop-type labels, canopy height measurements, or land use change polygons) is prepared

  74. [74]

    Design Task-Specific Head: A lightweight, task-specific neural network module (the "head") is designed. This head takes the extracted TESSERA embeddings as input.
    • For pixel-wise classification (e.g., crop classification), the head is typically a shallow MLP (1-3 layers) ending in a softmax output layer.
    • For pixel-wise regression (e.g., canopy height ...

  75. [75]

    Train Downstream Head: Only the parameters of this newly defined task head are trained using the extracted TESSERA embeddings as input features and the corresponding labels. Standard supervised learning techniques, optimizers (e.g., Adam), and task-appropriate loss functions (e.g., Cross-Entropy for classification, Mean Squared Error for regression) are u...

  76. [76]

    Evaluation: Once the head is trained, inference is performed on a test set by extracting TESSERA embeddings for the test samples and passing them through the trained head. Performance is evaluated using standard metrics relevant to the task. This workflow allows the use of TESSERA embeddings in a range of diverse applications, demonstrating its role as ...
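    The head-design, training, and evaluation steps can be illustrated with a small pixel-wise classifier. This is a sketch under stated assumptions, not the paper's code: the 128-dimensional "embeddings" are synthetic stand-ins rather than arrays downloaded with GEOTESSERA (whose call signatures are not given here), the head is a one-hidden-layer MLP, and plain full-batch gradient descent replaces Adam for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(x, y, n_classes, hidden=32, lr=0.1, epochs=200, seed=0):
    """Train a shallow MLP head with cross-entropy; the embeddings stay fixed."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(scale=np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=np.sqrt(2.0 / hidden), size=(hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        h = np.maximum(x @ w1 + b1, 0.0)       # hidden layer with ReLU
        p = softmax(h @ w2 + b2)               # class probabilities
        g_logits = (p - onehot) / n            # gradient of mean cross-entropy
        g_h = g_logits @ w2.T
        g_h[h <= 0.0] = 0.0                    # backpropagate through ReLU
        w2 -= lr * (h.T @ g_logits)
        b2 -= lr * g_logits.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return w1, b1, w2, b2

def predict(params, x):
    w1, b1, w2, b2 = params
    return softmax(np.maximum(x @ w1 + b1, 0.0) @ w2 + b2).argmax(axis=1)

# Synthetic stand-in for extracted embeddings: three well-separated classes.
rng = np.random.default_rng(0)
n_classes, dim = 3, 128
centers = 0.3 * rng.normal(size=(n_classes, dim))

def make_split(n):
    y = rng.integers(0, n_classes, size=n)
    x = centers[y] + 0.1 * rng.normal(size=(n, dim))
    return x, y

x_train, y_train = make_split(300)
x_test, y_test = make_split(100)
params = train_head(x_train, y_train, n_classes)
acc = float((predict(params, x_test) == y_test).mean())
print(f"held-out accuracy: {acc:.2f}")
```

    Only the head parameters are updated, which is why the workflow needs little computation: the embeddings act as fixed input features. For a regression task the softmax and cross-entropy would be swapped for a linear output and mean squared error.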

  77. [77]

    TESSERA: difference in relative heights of GEDI estimates of RH90 and RH10 converted to AGB using 15.502x + 160.5

  78. [78]

    ETH canopy height converted using 3.4x

  79. [79]

    CTrees canopy height converted using 1.806x + 44.9

  80. [80]

    ESA AGB converted using 0.407x + 34.0
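    The four conversions listed above can be applied with a small lookup table, as sketched here. The coefficients are copied from the list; treating each expression as a linear function of the input quantity x is an assumption about the extracted text, and the dictionary keys and function name are hypothetical. Units are not stated in this excerpt and are left unspecified.

```python
# (slope, intercept) pairs copied from the list above.
# Assumption: each expression is linear in the input quantity x.
AGB_CONVERSIONS = {
    "tessera_rh90_minus_rh10": (15.502, 160.5),
    "eth_canopy_height": (3.4, 0.0),
    "ctrees_canopy_height": (1.806, 44.9),
    "esa_agb": (0.407, 34.0),
}

def convert(source, x):
    """Apply the linear conversion listed for the given data source."""
    slope, intercept = AGB_CONVERSIONS[source]
    return slope * x + intercept

# Example with an arbitrary input value for each source.
for name in AGB_CONVERSIONS:
    print(name, convert(name, 10.0))
```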

Showing first 80 references.