TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis
Pith reviewed 2026-05-19 07:30 UTC · model grok-4.3
The pith
TESSERA learns invariant embeddings from irregular multi-modal satellite time series to deliver state-of-the-art accuracy with high label efficiency on Earth observation tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TESSERA is a pixel-wise foundation model for multi-modal Sentinel-1/2 Earth-observation time series that learns robust, label-efficient embeddings. It applies the Barlow Twins loss together with sparse random temporal sampling to enforce invariance to the selection of valid observations, with global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve behavior under extreme sparsity. The resulting embeddings achieve state-of-the-art accuracy across classification, segmentation, and regression tasks while often requiring only a small task head and minimal computation.
What carries the argument
Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regulation to enforce invariance to the selection of valid observations.
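As a rough illustration of the objective named here, the following plain-Python sketch computes a Barlow Twins style loss for two batches of embeddings of the same pixels. The function names, the list-of-lists layout, and the off-diagonal weight of 5e-3 (a value commonly used with Barlow Twins, not one confirmed for TESSERA) are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random


def _standardize(z, eps=1e-9):
    """Standardize each embedding dimension across the batch (zero mean, unit variance)."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(d):
        col = [row[k] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][k] = (col[i] - mean) / (std + eps)
    return out


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Barlow Twins style loss for two views of the same batch.

    z_a, z_b: lists of N embedding vectors (each of length D), where row i of
    both lists comes from the same pixel under a different sampled view.
    lambda_offdiag is a hypothetical default, not a value taken from TESSERA.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = _standardize(z_a), _standardize(z_b)
    loss = 0.0
    for p in range(d):
        for q in range(d):
            # Entry (p, q) of the cross-correlation matrix between the views.
            c_pq = sum(a[i][p] * b[i][q] for i in range(n)) / n
            if p == q:
                loss += (1.0 - c_pq) ** 2           # invariance term
            else:
                loss += lambda_offdiag * c_pq ** 2  # redundancy-reduction term
    return loss
```

The diagonal term pulls the two views of a pixel toward the same embedding, which is how invariance to the choice of valid observations would be encouraged, while the off-diagonal term discourages redundant embedding dimensions.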
If this is right
- Diverse Earth observation tasks can be solved with only a small task head and minimal additional computation.
- Global annual 10 m int8 embeddings become available for large-scale retrieval and inference.
- Open weights and lightweight adaptation heads simplify use for planetary-scale applications.
- Irregular time series from orbital patterns and clouds can be handled without losing vegetation phenology information.
- The same embeddings support classification, segmentation, and regression with high label efficiency.
Where Pith is reading between the lines
- Consistent embeddings over time could support long-term tracking of land-cover change without repeated model retraining.
- The approach might reduce data requirements for monitoring applications in data-scarce regions.
- Invariance to sparsity could prove useful for fusing additional irregular data sources in future extensions.
Load-bearing premise
Enforcing invariance to valid-observation selection via Barlow Twins, global shuffling, and mix-based regulation produces embeddings that actually improve downstream performance across unseen geographic regions and task distributions.
What would settle it
A benchmark on a held-out geographic region or new task distribution where TESSERA embeddings require substantially more labels or fail to exceed the accuracy of existing methods.
Original abstract
Satellite Earth-observation (EO) time series in the optical and microwave ranges of the electromagnetic spectrum are often irregular due to orbital patterns and cloud obstruction. Compositing addresses these issues but loses information with respect to vegetation phenology, which is critical for many downstream tasks. Instead, we present TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) EO time series that learns robust, label-efficient embeddings. During model training, TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations. We employ two key regularizers: global shuffling to decorrelate spatial neighborhoods and mix-based regulation to improve invariance under extreme sparsity. We find that for diverse classification, segmentation, and regression tasks, TESSERA embeddings deliver state-of-the-art accuracy with high label efficiency, often requiring only a small task head and minimal computation. To democratize access, adhere to FAIR principles, and simplify use, we release global, annual, 10 m, pixel-wise int8 embeddings together with open weights/code and lightweight adaptation heads, thus providing practical tooling for large-scale retrieval and inference at planetary scale. All code and data are available at: https://github.com/ucam-eo/tessera.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TESSERA, a pixel-wise foundation model for multi-modal (Sentinel-1/2) Earth-observation time series. It trains embeddings via Barlow Twins loss on sparse random temporal samples, augmented by global shuffling to decorrelate spatial neighborhoods and mix-based regulation for extreme sparsity. The central claim is that these embeddings achieve state-of-the-art accuracy and high label efficiency across classification, segmentation, and regression tasks, often with only a small task head, while the authors release global 10 m annual int8 embeddings, open weights, code, and lightweight adaptation heads.
Significance. If the performance claims survive geographic separation, the work would be significant for EO foundation modeling by demonstrating that invariance to valid-observation selection can yield label-efficient representations without compositing losses. The public release of planetary-scale embeddings and reproducible code is a clear strength that supports downstream use and verification.
major comments (2)
- [§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.
- [§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.
minor comments (2)
- [§3.2] Notation in §3.2: the precise definition of the mix-based regulation term should be written as an explicit equation rather than described in prose to improve reproducibility.
- [Figure 3] Figure 3 caption: clarify whether the visualized embeddings are before or after the small task head, and add scale bars for the geographic examples.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to the manuscript.
-
Referee: [§4] §4 (Experimental Setup): the manuscript does not describe continent-level or other strict geographic hold-out splits between training and test pixels. Given documented spatial autocorrelation in EO data, the reported SOTA numbers on downstream tasks could reflect leakage rather than the claimed invariance; explicit geographic separation is required to substantiate the central generalization claim.
Authors: We agree that spatial autocorrelation poses a risk of data leakage in EO tasks and that continent-level or other strict geographic hold-outs provide stronger evidence for generalization. The original experiments relied on random pixel-level splits within the evaluation datasets to focus on label efficiency under the invariance training regime. In the revised manuscript we will add explicit geographic separation experiments (e.g., training on European tiles and testing on African and Asian tiles) and report the corresponding downstream metrics. These results will be incorporated into Section 4 and the supplementary material.
revision: yes
-
Referee: [§4.3] §4.3 and Table 2: the quantitative baselines, error bars, and full ablation results for the Barlow Twins + shuffling + mix combination are not presented with sufficient detail to verify the label-efficiency and accuracy assertions; without these, the translation from training-time invariance to out-of-region utility remains unproven.
Authors: We acknowledge that additional quantitative detail would strengthen the claims. The current Table 2 and Section 4.3 present the main baseline comparisons, but we will expand the revision to include (i) standard error bars computed over multiple random seeds, (ii) a complete ablation table isolating the contribution of global shuffling and mix-based regularization, and (iii) further baseline methods. These additions will be placed in Section 4.3 with supporting figures moved to the appendix.
revision: yes
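The difference between the random pixel-level splits and the geographic hold-outs discussed in this exchange can be sketched in a few lines. The example below assigns pixels to square spatial blocks and holds out whole blocks, a simplified stand-in for the continent-level separation the referee asks for. The function names, the block size, and the test fraction are hypothetical choices for illustration, not the authors' protocol.

```python
import random


def block_id(x, y, block_size):
    """Map a pixel coordinate to the id of the square spatial block containing it."""
    return (x // block_size, y // block_size)


def spatial_block_split(coords, block_size, test_fraction=0.25, seed=0):
    """Split pixel indices so that no spatial block contributes to both sets.

    coords: list of (x, y) integer pixel coordinates.
    Returns (train_indices, test_indices).
    """
    blocks = sorted({block_id(x, y, block_size) for x, y in coords})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(test_fraction * len(blocks)))
    test_blocks = set(blocks[:n_test])

    train_idx, test_idx = [], []
    for i, (x, y) in enumerate(coords):
        if block_id(x, y, block_size) in test_blocks:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx
```

With a random pixel-level split, a test pixel can sit right next to a training pixel, so spatial autocorrelation can inflate scores. Holding out entire blocks removes that adjacency inside each block, although pixels on either side of a block boundary remain neighbors unless a buffer is added.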
Circularity Check
No significant circularity detected in derivation or claims
The paper trains TESSERA using the established Barlow Twins loss combined with sparse random temporal sampling, global shuffling, and mix-based regularization to promote invariance to valid observation selection in EO time series. Downstream evaluations on classification, segmentation, and regression tasks use independent task heads and report empirical accuracies without any equations or results reducing by construction to author-fitted parameters or self-referential definitions. No self-citation chains are invoked as load-bearing uniqueness theorems, and the central invariance claim is grounded in the training objective rather than renaming or smuggling prior author ansatzes. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Barlow Twins loss plus the stated regularizers produces embeddings invariant to the selection of valid temporal observations
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
TESSERA uses Barlow Twins and sparse random temporal sampling to enforce invariance to the selection of valid observations... global shuffling to decorrelate spatial neighborhoods and mix-based regulation
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
dual-branch Transformer encoders... fused embedding... 128-dimensional pixel embedding
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 6 Pith papers
-
Better Together: Evaluating the Complementarity of Earth Embedding Models
Fusing embeddings from four Earth models (AlphaEarth, Tessera, GeoCLIP, SatCLIP) outperforms the best single model on four of six tasks, with gains depending on task and location.
-
FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes
FLUXtrapolation is a benchmark for domain generalization in ecosystem flux upscaling using temporal, spatial, and temperature-based extrapolation scenarios, with pilot results showing model separation on tail and mult...
-
Agentic AI for Remote Sensing: Technical Challenges and Research Directions
Agentic AI faces structural challenges in remote sensing due to geospatial data properties and workflow constraints, requiring EO-native agents built around structured state, tool-aware reasoning, and validity-aware e...
-
Agentic AI for Remote Sensing: Technical Challenges and Research Directions
Agentic AI for remote sensing requires new designs centered on structured geospatial state, tool-aware reasoning, verifier-guided execution, and physical validity rather than generic extensions.
-
Structure-Semantic Decoupled Modulation of Global Geospatial Embeddings for High-Resolution Remote Sensing Mapping
SSDM decouples global geospatial embeddings into structural modulation and semantic injection pathways to improve accuracy and consistency in high-resolution remote sensing land cover mapping.
-
Location Is All You Need: Continuous Spatiotemporal Neural Representations of Earth Observation Data
LIANet encodes multi-temporal Earth observation data into a coordinate-based neural field that supports label-only fine-tuning for downstream tasks without access to raw imagery.
Acknowledgments
We gratefully acknowledge help from AMD Inc., Tarides, Jane Street, the Dawn supercomputing team at Cambridge, the Aalto University Science-IT pro...
-
[65]
at a given point over time. Note that d-pixels can be sparse and are accompanied by a mask vector $m_{i,j}$ of size $T$ that indicates the timesteps for which there are valid data, with a value of 1 indicating that the corresponding row in $P_{i,j}$ is valid.
S1 The d-pixel Representation
We represent each 10-m pixel in the time series of images from annual multispect...
-
[66]
For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-2 time series (10 spectral bands)
-
[67]
For each view, independent sampling of a fixed number of valid observation dates from the annual Sentinel-1 time series (2 polarizations).
These views represent different, valid, but inherently incomplete glimpses of the pixel’s true temporal-spectral evolution, akin to observing the same location through intermittent cloud cover or from different satelli...
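The view construction described above, where each view independently samples a fixed number of valid observation dates, might look roughly like the sketch below for a single pixel and a single sensor. The function names are hypothetical, and the fallback to sampling with replacement when a pixel has fewer valid dates than requested is an assumption of this sketch rather than something the extracted text states.

```python
import random


def sample_view(mask, doys, n_samples, rng):
    """Pick n_samples timesteps uniformly from those marked valid in mask.

    mask: list of 0/1 flags, one per timestep (1 means a valid observation).
    doys: day-of-year for each timestep, same length as mask.
    Returns (timestep index, day-of-year) pairs sorted by timestep.
    Assumption: if fewer valid timesteps exist than requested, timesteps
    are drawn with replacement.
    """
    valid = [t for t, m in enumerate(mask) if m == 1]
    if not valid:
        raise ValueError("pixel has no valid observations")
    if len(valid) >= n_samples:
        chosen = rng.sample(valid, n_samples)
    else:
        chosen = [rng.choice(valid) for _ in range(n_samples)]
    chosen.sort()
    return [(t, doys[t]) for t in chosen]


def two_views(mask, doys, n_samples, seed=0):
    """Draw two independent views of the same pixel for a Barlow Twins style objective."""
    rng = random.Random(seed)
    view_a = sample_view(mask, doys, n_samples, rng)
    view_b = sample_view(mask, doys, n_samples, rng)
    return view_a, view_b
```

Because both views are drawn from the same set of valid dates, they differ only in which observations happened to be selected, which is the nuisance factor the training objective is meant to be invariant to.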
-
[68]
The full Sentinel-1 and Sentinel-2 time series data at 10-meter resolution are acquired and pre-processed to form d-pixels. Unlike pre-training, no spatial downsampling is performed at this stage
-
[69]
A fixed number of 40 timesteps is sampled from the valid observations within the year for both Sentinel-1 and Sentinel-2 data, along with their DOY positional encodings
-
[70]
These sampled time series are fed into their respective frozen TESSERA encoders
-
[71]
The outputs from the S1 and S2 encoders are fused by the MLP, producing a 128-dimensional embedding vector for that pixel for that year. This process is repeated for all land pixels globally to create an annual embeddings map of shape (H, W, 128), where H and W are the dimensions of the global 10-meter grid.
Scaling with data and network size
To identify...
-
[72]
Download Embeddings for the Region of Interest: The GeoTessera Python library (154) allows users to download embeddings for a desired region and year in the form of a NumPy array
-
[73]
Prepare Labelled Downstream Data: The labelled dataset for the target task (e.g., pixel-level crop-type labels, canopy height measurements, or land use change polygons) is prepared
-
[74]
Design Task-Specific Head: A lightweight, task-specific neural network module (the "head") is designed. This head takes the extracted TESSERA embeddings as input.
• For pixel-wise classification (e.g., crop classification), the head is typically a shallow MLP (1-3 layers) ending in a softmax output layer.
• For pixel-wise regression (e.g., canopy height ...
-
[75]
Train Downstream Head: Only the parameters of this newly defined task head are trained using the extracted TESSERA embeddings as input features and the corresponding labels. Standard supervised learning techniques, optimizers (e.g., Adam), and task-appropriate loss functions (e.g., Cross-Entropy for classification, Mean Squared Error for regression) are u...
-
[76]
Evaluation: Once the head is trained, inference is performed on a test set by extracting TESSERA embeddings for the test samples and passing them through the trained head. Performance is evaluated using standard metrics relevant to the task. This workflow allows the use of TESSERA embeddings in a range of diverse applications, demonstrating its role as ...
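The downstream workflow above (frozen embeddings in, small trainable head out) can be illustrated with a one-layer softmax head trained by stochastic gradient descent on cross-entropy. This is a minimal sketch under stated assumptions: the embeddings are synthetic 8-dimensional stand-ins rather than real 128-dimensional TESSERA vectors, no GeoTessera calls are shown, and all function names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _logits(weights, bias, x):
    return [sum(w * v for w, v in zip(weights[c], x)) + bias[c] for c in range(len(bias))]


def train_linear_head(embeddings, labels, n_classes, lr=0.1, epochs=50, seed=0):
    """Fit a one-layer softmax head with per-sample SGD; the embeddings are never updated."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    order = list(range(len(embeddings)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = embeddings[i], labels[i]
            probs = softmax(_logits(weights, bias, x))
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                bias[c] -= lr * grad
                for k in range(dim):
                    weights[c][k] -= lr * grad * x[k]
    return weights, bias


def predict(weights, bias, x):
    """Return the index of the class with the largest logit."""
    logits = _logits(weights, bias, x)
    return max(range(len(logits)), key=logits.__getitem__)


def make_synthetic(n_per_class=50, dim=8, seed=0):
    """Two well-separated synthetic classes standing in for labelled pixel embeddings."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in enumerate((-2.0, 2.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            x[0] += center
            xs.append(x)
            ys.append(label)
    return xs, ys
```

In practice the synthetic arrays would be replaced by embeddings downloaded for the region of interest, and the head could be a deeper MLP or a regression head with a mean-squared-error loss, as the workflow describes.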
-
[77]
TESSERA: difference in relative heights of GEDI estimates of RH90 and RH10, converted to AGB using $15.502x + 160.5$
-
[78]
ETH canopy height, converted using $3.4x$
-
[79]
CTrees canopy height, converted using $1.806x + 44.9$
-
[80]
ESA AGB, converted using $0.407x + 34.0$
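The four conversions listed above all appear to be linear maps of the form slope times input plus intercept. The sketch below collects the coefficients in one table. Reading the "×" in the extracted text as the input variable x is an assumption, as are the dictionary keys and the function name; the units of the inputs and outputs are not stated in the extracted passage and are left unspecified here.

```python
# (slope, intercept) pairs copied from the list above. The keys are
# hypothetical labels, and treating the input as a single variable x is an
# assumption, because the extracted text dropped the variable name.
AGB_CONVERSIONS = {
    "tessera_gedi_rh90_minus_rh10": (15.502, 160.5),
    "eth_canopy_height": (3.4, 0.0),
    "ctrees_canopy_height": (1.806, 44.9),
    "esa_agb": (0.407, 34.0),
}


def to_agb(source, x):
    """Apply the linear conversion slope * x + intercept for the given source."""
    slope, intercept = AGB_CONVERSIONS[source]
    return slope * x + intercept
```

Keeping the coefficients in a single table makes it easy to apply the same comparison code to every product and to audit the values against the paper.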