WATCH: Wide-Area Archaeological Site Tracking for Change Detection

Allen Kim; Andrew Hassanali; Andrew Zolli; Caleb Robinson; Girmaw Abebe Tadesse; Inbal Becker-Reshef; Jonathan Chemla; Juan Lavista Ferres; Titien Bartette; Yves Ubelmann

arxiv: 2605.08160 · v1 · submitted 2026-05-04 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-12 01:20 UTC · model grok-4.3

keywords: satellite imagery, change detection, archaeological monitoring, cultural heritage, foundation models, temporal localization, unsupervised methods

The pith

Unsupervised scoring of satellite image embeddings localizes archaeological site disturbances to the month without needing event labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WATCH to detect when cultural heritage sites are altered by comparing PlanetScope satellite mosaics month by month. It evaluates three scoring systems on 1,943 sites in Afghanistan: a training-free Temporal Embedding Distance (TED) that measures deviations from a local temporal reference, a self-supervised ensemble (SSCD) of reconstruction, forecasting, and novelty signals, and a weakly supervised model trained on sparse event-month labels. The unsupervised methods outperform the label-dependent one: TED with SatMAE embeddings reaches 55 percent exact-month recall, and TED with GeoRSCLIP, CLIP, or Satlas-Pretrain reaches 92.5 percent recall within a three-month window. Cross-regional checks on sites in Syria, Turkey, Pakistan, and Egypt suggest the approach generalizes, and a directional bias analysis shows that some methods flag changes early while others confirm them after the fact.
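The headline numbers are recall at a temporal tolerance m (m = 0 for the exact month, m = 3 for a three-month window). The paper's exact matching rule is not reproduced in this summary, so the sketch below assumes that each site's predicted event month is the month with the highest score and that a labeled event counts as recalled when that prediction lies within m months of the label. The function names are hypothetical.

```python
from typing import Sequence


def predicted_month(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring month (ties go to the earliest month)."""
    return max(range(len(scores)), key=lambda t: scores[t])


def recall_at_tolerance(
    score_series: Sequence[Sequence[float]],
    event_months: Sequence[int],
    m: int,
) -> float:
    """Fraction of labeled events whose predicted month is within m months of the label.

    Assumption: one labeled event month and one predicted month per site.
    """
    if len(score_series) != len(event_months):
        raise ValueError("need one score series per labeled event")
    hits = sum(
        1
        for scores, event in zip(score_series, event_months)
        if abs(predicted_month(scores) - event) <= m
    )
    return hits / len(event_months)


# Toy data: three sites, monthly scores indexed from 0.
series = [
    [0.1, 0.2, 0.9, 0.3],            # peak at month 2, label 2: exact hit
    [0.5, 0.1, 0.2, 0.1, 0.3],       # peak at month 0, label 3: off by 3
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.8],  # peak at month 5, label 1: off by 4
]
labels = [2, 3, 1]
print(recall_at_tolerance(series, labels, m=0))  # exact-month recall
print(recall_at_tolerance(series, labels, m=3))  # three-month tolerance
```

Widening the tolerance can only add hits, which is why m = 3 recall is always at least as high as exact-month recall for the same scorer.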

Core claim

WATCH performs month-level change-event localization on PlanetScope mosaics (4.7 m/px, 2017-2024) through three complementary scorers: Temporal Embedding Distance (TED), which quantifies month-to-month deviations from a local temporal reference using foundation-model embeddings; an ensemble of Self-Supervised Change Detection (SSCD) signals from reconstruction, forecasting, and latent novelty; and a Weakly Supervised (WS) temporal localization model. On 1,943 Afghan sites, TED with SatMAE embeddings achieves 55 percent exact-month recall, while TED with GeoRSCLIP, CLIP, or Satlas-Pretrain reaches 92.5 percent recall at a three-month tolerance; unsupervised TED and SSCD consistently beat the weakly supervised model.
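SSCD merges three self-supervised signals (reconstruction, forecasting, and latent novelty) into one monthly score. How the paper weights and normalizes them is not stated here, so the sketch below assumes each signal is z-scored within a site and the ensemble is their unweighted mean. The signal names and the `sscd_ensemble` helper are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def zscore(series: Sequence[float]) -> List[float]:
    """Standardize one site's monthly signal; a constant signal maps to all zeros."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def sscd_ensemble(signals: Dict[str, Sequence[float]]) -> List[float]:
    """Average the per-site z-scores of several anomaly signals, month by month."""
    lengths = {len(s) for s in signals.values()}
    if len(lengths) != 1:
        raise ValueError("all signals must cover the same months")
    normalized = [zscore(s) for s in signals.values()]
    return [statistics.fmean(month) for month in zip(*normalized)]


# Toy site: reconstruction and forecasting errors both jump in month 3.
site_signals = {
    "reconstruction_error": [1.0, 1.0, 1.0, 5.0],
    "forecast_error": [2.0, 2.0, 2.0, 6.0],
    "latent_novelty": [0.3, 0.3, 0.3, 0.3],
}
ensemble = sscd_ensemble(site_signals)
peak = max(range(len(ensemble)), key=lambda t: ensemble[t])
print(peak, ensemble)
```

Z-scoring puts signals with different units on a common scale before averaging; a signal that never moves (here, `latent_novelty`) contributes zeros rather than dominating.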

What carries the argument

Temporal Embedding Distance (TED), a training-free scorer that measures deviations between monthly satellite patch embeddings and a local temporal reference built from the same site.
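A minimal sketch of the TED idea, assuming the local temporal reference is the element-wise median of the previous `window` monthly embeddings and the distance is cosine distance. Both choices, the window length, and the helper names are illustrative assumptions rather than the paper's implementation; the median is used here because it keeps a single anomalous month from contaminating the reference.

```python
import math
import statistics
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; raises on zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def median_reference(history: Sequence[Vector]) -> List[float]:
    """Element-wise median over the months in `history` (a robust reference)."""
    return [statistics.median(dim) for dim in zip(*history)]


def ted_scores(embeddings: Sequence[Vector], window: int = 6) -> List[Optional[float]]:
    """Score month t by its distance from the median of months t-window .. t-1.

    The first `window` months have no full reference and are left as None.
    """
    scores: List[Optional[float]] = []
    for t, z in enumerate(embeddings):
        if t < window:
            scores.append(None)
            continue
        reference = median_reference(embeddings[t - window:t])
        scores.append(cosine_distance(z, reference))
    return scores


# Toy 2-D embeddings: stable for six months, one changed month, then stable again.
months = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] + [[1.0, 0.0]]
print(ted_scores(months, window=3))
# [None, None, None, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Month 7 scores 0.0 even though the changed month 6 sits inside its reference window, which is the point of using a median instead of a mean.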

If this is right

Unsupervised TED and SSCD can be applied to new regions without collecting event-month labels.
Different foundation-model embeddings produce distinct temporal biases, allowing selection for early-warning or post-event confirmation needs.
Handcrafted spectral and texture features remain competitive for exact-month detection when weak labels are available.
The same pipeline supports scalable monitoring of cultural heritage across wide areas once the initial reference period is established.
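The summary does not define the handcrafted spectral and texture features (the referee report below flags this as a reproducibility gap), so the following is only a hypothetical example of what such a baseline could compute from one patch: per-band means, mean NDVI from the red and near-infrared bands, and a simple gradient-energy texture measure. It assumes a 4-band (blue, green, red, near-infrared) patch stored as nested lists; none of these choices come from the paper.

```python
import statistics
from typing import Dict, List, Sequence

Band = Sequence[Sequence[float]]  # rows x cols of reflectance values


def band_mean(band: Band) -> float:
    return statistics.fmean(v for row in band for v in row)


def ndvi_mean(red: Band, nir: Band) -> float:
    """Mean NDVI = (NIR - red) / (NIR + red), treating zero denominators as 0."""
    values = []
    for red_row, nir_row in zip(red, nir):
        for r, n in zip(red_row, nir_row):
            denom = n + r
            values.append((n - r) / denom if denom != 0 else 0.0)
    return statistics.fmean(values)


def gradient_energy(band: Band) -> float:
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    rows, cols = len(band), len(band[0])
    diffs = []
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                diffs.append(abs(band[i][j + 1] - band[i][j]))
            if i + 1 < rows:
                diffs.append(abs(band[i + 1][j] - band[i][j]))
    return statistics.fmean(diffs) if diffs else 0.0


def handcrafted_features(bands: Dict[str, Band]) -> List[float]:
    """Hypothetical feature vector: 4 band means, mean NDVI, red-band gradient energy."""
    features = [band_mean(bands[name]) for name in ("blue", "green", "red", "nir")]
    features.append(ndvi_mean(bands["red"], bands["nir"]))
    features.append(gradient_energy(bands["red"]))
    return features


# A uniform 2x2 patch versus the same patch with one bright (disturbed) red pixel.
flat = {
    "blue": [[0.1, 0.1], [0.1, 0.1]],
    "green": [[0.2, 0.2], [0.2, 0.2]],
    "red": [[0.2, 0.2], [0.2, 0.2]],
    "nir": [[0.6, 0.6], [0.6, 0.6]],
}
disturbed = dict(flat, red=[[0.2, 0.2], [0.2, 0.6]])
print(handcrafted_features(flat))
print(handcrafted_features(disturbed))
```

Features like these can feed the same TED-style distance as foundation-model embeddings, which is one plausible way a handcrafted baseline could be compared on equal terms.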

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Combining TED and SSCD could reduce both missed early signals and late confirmations in operational systems.
The cross-regional results suggest the method may transfer to monitoring other sparse, wide-area phenomena such as illegal construction or environmental damage.
If integrated with near-real-time satellite feeds, the framework could generate alerts for site managers months before conventional inspection.
The observed embedding-specific biases indicate that pre-training data domain (e.g., general vs. remote-sensing) influences whether detection leans early or late.
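One way to read the editorial suggestion of combining TED and SSCD is a late-fusion rule. The sketch below is hypothetical and not something the paper reports: each method's monthly scores are rank-normalized within a site to [0, 1], and the fused score is the element-wise maximum, so an early SSCD peak and a later TED peak can each raise an alert.

```python
from typing import List, Sequence


def rank_normalize(series: Sequence[float]) -> List[float]:
    """Map scores to [0, 1] by rank within the site; ties are broken by month order."""
    n = len(series)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda t: series[t])
    ranks = [0.0] * n
    for rank, t in enumerate(order):
        ranks[t] = rank / (n - 1)
    return ranks


def fuse_max(ted: Sequence[float], sscd: Sequence[float]) -> List[float]:
    """Element-wise maximum of rank-normalized TED and SSCD scores for one site."""
    if len(ted) != len(sscd):
        raise ValueError("score series must cover the same months")
    return [max(a, b) for a, b in zip(rank_normalize(ted), rank_normalize(sscd))]


# TED peaks late (month 3); SSCD peaks early (month 0).
ted = [0.1, 0.2, 0.3, 0.9]
sscd = [0.8, 0.1, 0.2, 0.3]
fused = fuse_max(ted, sscd)
print(fused)
```

Rank normalization sidesteps the different scales of a cosine distance and a z-scored ensemble; a fixed alert threshold on the fused score (for example 0.9) would fire in both month 0 and month 3 here.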

Load-bearing premise

The sparse event-month labels for the 1,943 Afghan sites are accurate and representative of actual disturbance timing, and visual changes visible at 4.7 m/px resolution reliably match the recorded events.

What would settle it

Independent field verification or higher-resolution imagery for a random subset of the sites that shows the exact-month recall falling substantially below 55 percent or the three-month recall below 90 percent.

Figures

Figures reproduced from arXiv: 2605.08160 by Allen Kim, Andrew Hassanali, Andrew Zolli, Caleb Robinson, Girmaw Abebe Tadesse, Inbal Becker-Reshef, Jonathan Chemla, Juan Lavista Ferres, Titien Bartette, Yves Ubelmann.

**Figure 2.** (a) Overview of the archaeological sites in Afghanistan [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]

**Figure 4.** Recall (test split) as a function of temporal tolerance [PITH_FULL_IMAGE:figures/full_fig_p005_4.png]

**Figure 5.** Block diagram of the WATCH global site inference pipeline. Monthly PlanetScope mosaics are gridded into [PITH_FULL_IMAGE:figures/full_fig_p008_5.png]

**Figure 6.** Monthly GeoRSCLIP scores under three scoring regimes (TED, SSCD, WS) for representative grid cells at four global [PITH_FULL_IMAGE:figures/full_fig_p008_6.png]

**Figure 8.** Examples of PlanetScope monthly mosaics for archae [PITH_FULL_IMAGE:figures/full_fig_p010_8.png]

**Figure 9.** Recall at m = 3 (test split) grouped by embedding, colored by scoring approach. Satlas-Pretrain and GeoRSCLIP consistently lead across approaches (see Table IV in the main text), while DINOv3 and Handcrafted show more method-dependent variation. [PITH_FULL_IMAGE:figures/full_fig_p010_9.png]

**Figure 10.** Directional recall at m = 3 (all split), showing positive (late-detection) versus negative (early-detection) margin recall for each embedding and approach.

**Figure 11.** Monthly GeoRSCLIP scores under three scoring regimes (TED, SSCD, WS) for four looted sites with known disturbance dates. TED and SSCD produce comparable, spiky score trajectories that respond to both genuine change events and nuisance variability throughout the timeline. In several cases unsupervised methods generate elevated scores near the recorded … [PITH_FULL_IMAGE:figures/full_fig_p011_11.png]

**Figure 12.** Feature variation over time across all Afghanistan [PITH_FULL_IMAGE:figures/full_fig_p011_12.png]

Original abstract

Monitoring archaeological sites at scale is vital for protecting cultural heritage, yet pinpointing when disturbances occur remains difficult because visual cues are subtle and ground-truth data are sparse. We introduce WATCH, a framework for month-level change-event localization over PlanetScope satellite mosaics (2017-2024, 4.7 m/px) that supports three complementary scoring approaches: (i) Temporal Embedding Distance (TED), a training-free method that scores month-to-month deviations from a local temporal reference; (ii) Self-Supervised Change Detection (SSCD), an ensemble of reconstruction, forecasting, and latent-novelty signals; and (iii) a Weakly Supervised (WS) temporal localization model trained with sparse event-month labels. We benchmark WATCH on 1,943 archaeological sites in Afghanistan using embeddings from six foundation models (CLIP, GeoRSCLIP, SatMAE, Prithvi-EO-2.0, DINOv3, and Satlas-Pretrain) alongside a handcrafted spectral and texture baseline, and assess cross-regional generalization on sites in Syria, Turkey, Pakistan, and Egypt. The unsupervised approaches (TED, SSCD) consistently outperform the weakly supervised alternative. TED with SatMAE achieves the highest exact-month recall (55% at m=0), while TED with GeoRSCLIP, CLIP, or Satlas-Pretrain reaches 92.5% within a three-month tolerance (m=3). Handcrafted features remain competitive for exact-month detection under weak supervision. Our directional margin analysis reveals systematic temporal biases: SSCD paired with GeoRSCLIP or Prithvi-EO-2.0 exhibits the strongest early-warning profile, detecting anomalies before the recorded event, while TED favors confirmation-oriented detection after a change has materialized. These results show that satellite imagery combined with foundation-model embeddings enables scalable, decision-relevant heritage monitoring. Code: https://github.com/microsoft/WATCH
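The directional margin analysis splits hits by the sign of the margin between the predicted month and the labeled month: negative margins are early detections, positive margins are late ones. The binning below (exact matches counted separately, early hits within m months before the label, late hits within m months after) is an assumption about how such a split could be computed, not the paper's code.

```python
from typing import Dict, Sequence


def directional_recall(
    predicted: Sequence[int], events: Sequence[int], m: int
) -> Dict[str, float]:
    """Split recall at tolerance m into early (margin < 0), exact, and late (margin > 0)."""
    if len(predicted) != len(events) or not events:
        raise ValueError("need one prediction per labeled event")
    counts = {"early": 0, "exact": 0, "late": 0}
    for p, e in zip(predicted, events):
        margin = p - e  # in months; negative means the score peaked before the label
        if margin == 0:
            counts["exact"] += 1
        elif -m <= margin < 0:
            counts["early"] += 1
        elif 0 < margin <= m:
            counts["late"] += 1
        # |margin| > m is a miss and is not tallied
    return {k: v / len(events) for k, v in counts.items()}


# Margins: 0, -2, +1, +10, giving one exact, one early, one late, and one miss.
r = directional_recall([10, 8, 12, 20], [10, 10, 11, 10], m=3)
print(r)
```

Summing the three fractions gives ordinary recall at tolerance m; comparing `early` with `late` is what separates an early-warning profile from a confirmation-oriented one.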

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

WATCH shows unsupervised embedding-based methods can localize monthly site disturbances in PlanetScope data better than weakly supervised training on the same labels, with useful cross-region tests and bias analysis, but the numbers stand or fall on label accuracy.

The main thing to know is that this paper gives a workable pipeline for month-level change detection at archaeological sites using off-the-shelf foundation model embeddings on PlanetScope mosaics. On 1,943 Afghan sites it reports TED with SatMAE at 55% exact-month recall and 92.5% within a three-month window for several other embeddings, with unsupervised TED and SSCD beating the weakly supervised baseline across the board. The directional margin analysis also shows clear method-specific timing biases, which is a practical detail for anyone who needs early warning versus confirmation.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the WATCH framework for month-level change-event localization at archaeological sites using PlanetScope satellite mosaics (4.7 m/px, 2017-2024). It defines three scoring approaches: Temporal Embedding Distance (TED, training-free); Self-Supervised Change Detection (SSCD, an ensemble of reconstruction, forecasting, and novelty signals); and Weakly Supervised (WS) temporal localization. It benchmarks them on 1,943 Afghan sites with embeddings from six foundation models (CLIP, GeoRSCLIP, SatMAE, Prithvi-EO-2.0, DINOv3, Satlas-Pretrain) plus a handcrafted baseline. The central claims are that the unsupervised methods (TED, SSCD) outperform WS, that TED+SatMAE reaches 55% exact-month recall (m=0), and that TED reaches 92.5% within m=3 with GeoRSCLIP, CLIP, or Satlas-Pretrain embeddings, supplemented by cross-regional tests and a directional margin analysis of temporal biases.

Significance. If the results hold, the work demonstrates a practical, scalable approach to heritage monitoring that leverages pre-trained embeddings without requiring dense labels. The public code repository and evaluation across multiple foundation models plus cross-regional generalization on sites in Syria, Turkey, Pakistan, and Egypt are clear strengths that support reproducibility and broader applicability. The directional bias analysis further aids method selection for early-warning versus confirmation tasks.

major comments (2)

[§4 (Experiments), Table 2] All reported recall figures (e.g., TED+SatMAE at 55% for m=0, and TED with GeoRSCLIP, CLIP, or Satlas-Pretrain at 92.5% for m=3) are computed directly against the provided sparse event-month labels for the 1,943 sites. No validation of label accuracy, inter-annotator agreement, or sensitivity to timing offsets is presented; systematic reporting lags or attribution errors would invalidate the unsupervised-vs-WS comparison and the headline performance numbers.
[§3.2 (Data and Preprocessing)] The assumption that visual changes at 4.7 m/px resolution reliably correspond to the recorded disturbance events is load-bearing for interpreting all metrics, yet the manuscript provides no quantitative examples of confirmed visual correspondences, analysis of sub-pixel disturbances, or discussion of spectral ambiguity at this resolution.

minor comments (2)

[Abstract] The handcrafted spectral and texture baseline is described as competitive, but its exact feature definitions are not listed; these should be specified in the methods for full reproducibility.
[§5 (Cross-regional generalization)] The performance variations across Syria, Turkey, Pakistan, and Egypt are summarized but lack per-region tables or error bars, which would clarify the strength of generalization.
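Per-region error bars are cheap to add because recall is a proportion of labeled events. A minimal sketch, assuming one hit/miss outcome per labeled site, computes per-region recall with a 95% Wilson score interval; the outcomes in the example are toy values, not results from the paper, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("need at least one labeled event")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def per_region_recall(
    outcomes: Iterable[Tuple[str, bool]],
) -> Dict[str, Tuple[float, float, float]]:
    """Map region -> (recall, lower bound, upper bound) from (region, hit) pairs."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for region, hit in outcomes:
        totals[region] += 1
        hits[region] += int(hit)
    return {
        region: (hits[region] / totals[region], *wilson_interval(hits[region], totals[region]))
        for region in totals
    }


# Toy outcomes (hit within tolerance or not), not results from the paper.
toy = [("Syria", True), ("Syria", False), ("Syria", True), ("Egypt", True), ("Egypt", False)]
for region, (recall, lo, hi) in per_region_recall(toy).items():
    print(f"{region}: recall={recall:.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

The Wilson interval is preferred over the normal approximation here because small regional samples and recalls near 0 or 1 would otherwise produce bounds outside [0, 1] or intervals of zero width.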

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The two major comments identify important gaps in validating label quality and demonstrating visual change correspondence. We address each point below, agree where revisions are warranted, and outline specific changes to strengthen the paper without overstating current results.

Referee: [§4 (Experiments), Table 2] All reported recall figures (e.g., TED+SatMAE at 55% for m=0, and TED with GeoRSCLIP, CLIP, or Satlas-Pretrain at 92.5% for m=3) are computed directly against the provided sparse event-month labels for the 1,943 sites. No validation of label accuracy, inter-annotator agreement, or sensitivity to timing offsets is presented; systematic reporting lags or attribution errors would invalidate the unsupervised-vs-WS comparison and the headline performance numbers.

Authors: We agree that the lack of explicit label validation is a limitation for interpreting absolute performance and the unsupervised-vs-WS comparison. The sparse event-month labels are used as provided for all 1,943 sites; no inter-annotator agreement or independent accuracy check was performed in the original work. Because TED is training-free and SSCD is self-supervised, neither optimizes against these labels, so their alignment with the recorded event months still provides a meaningful unsupervised signal. The weakly supervised model, which is trained on the labels, is more directly affected by label noise. To address sensitivity to timing offsets, we will add a sensitivity analysis to the revised §4 that recomputes recall after shifting all labels by ±1 and ±2 months. We will also add a short discussion of potential reporting lags as a source of uncertainty. These additions will qualify the headline numbers and show whether the relative method ordering holds under label shifts. (Revision: partial.)
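The promised sensitivity analysis amounts to recomputing recall against shifted labels. A sketch, assuming one predicted month and one labeled month per site (function names hypothetical), shows the idea: if detections and labels differ by a systematic lag, recall peaks at a nonzero shift instead of at 0.

```python
from typing import Dict, Sequence


def recall_at_m(predicted: Sequence[int], events: Sequence[int], m: int) -> float:
    """Fraction of labeled events with a predicted month within m months."""
    hits = sum(1 for p, e in zip(predicted, events) if abs(p - e) <= m)
    return hits / len(events)


def shift_sensitivity(
    predicted: Sequence[int],
    events: Sequence[int],
    m: int,
    shifts: Sequence[int] = (-2, -1, 0, 1, 2),
) -> Dict[int, float]:
    """Recall at tolerance m after moving every label by each shift (in months)."""
    return {s: recall_at_m(predicted, [e + s for e in events], m) for s in shifts}


# Every detection lands exactly one month after its label.
res = shift_sensitivity([4, 7, 11], [3, 6, 10], m=0)
print(res)
```

A peak at +1 is ambiguous on its own: it is consistent with a detector that lags the labels or with labels recorded before the change became visible, and the shift analysis alone cannot tell these apart.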
Referee: [§3.2 (Data and Preprocessing)] The assumption that visual changes at 4.7 m/px resolution reliably correspond to the recorded disturbance events is load-bearing for interpreting all metrics, yet the manuscript provides no quantitative examples of confirmed visual correspondences, analysis of sub-pixel disturbances, or discussion of spectral ambiguity at this resolution.

Authors: We concur that explicit evidence of visual correspondence at 4.7 m/px is needed to support metric interpretation. The current manuscript does not include before/after image pairs or quantitative analysis of sub-pixel or spectrally ambiguous cases. In the revision we will add a new subsection (or an expanded §3.2) containing representative PlanetScope image pairs for several sites, illustrating visible spatial changes (e.g., new looting pits or structures) that align with the recorded event months. We will also discuss the resolution limits, noting that while many heritage disturbances produce detectable patterns at 4.7 m/px, sub-pixel or purely spectral changes may go undetected, which constitutes a boundary condition for the approach. These additions will better contextualize both the strengths and the applicability constraints of the results. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking of pre-trained embeddings and change-detection heuristics on held-out sites

full rationale

The paper introduces three scoring approaches (TED, SSCD, WS) and reports recall figures obtained by applying them to PlanetScope imagery of 1,943 labeled Afghan sites plus cross-regional test sets. No equations, fitted parameters, or self-citation chains are used to derive the headline performance numbers; the results are direct empirical measurements against the provided event-month labels. The methods themselves are either training-free (TED), self-supervised (SSCD), or explicitly trained on the sparse labels (WS), with no step that reduces a claimed prediction back to its own inputs by construction. Self-citations, if present, are not load-bearing for the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that foundation-model embeddings capture disturbance-relevant visual changes and that the provided event-month labels are reliable ground truth. No new free parameters are introduced beyond standard model selection and tolerance windows; no invented entities.

axioms (2)

Domain assumption: Foundation-model embeddings from CLIP, SatMAE, etc., preserve temporal change signals relevant to archaeological disturbances.
Invoked when using these embeddings for TED and SSCD without additional fine-tuning.
Domain assumption: Sparse event-month labels accurately reflect the timing of actual site disturbances.
Required for all evaluation metrics, including exact-month and m=3 recall.

pith-pipeline@v0.9.0 · 5694 in / 1362 out tokens · 28377 ms · 2026-05-12T01:20:12.718363+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel echoes

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

TED scores the deviation of month t from a robust reference computed over recent history... $s^{\mathrm{temp}}_{i,t} = d(z'_{i,t}, B_{i,t})$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] S. Parcak, D. Gathings, C. Childs, G. Mumford, and E. Cline, “Satellite evidence of archaeological site looting in Egypt: 2002–2013,” Antiquity, vol. 90, no. 349, pp. 188–205, 2016.

[2] B. Cuca, F. Zaina, and D. Tapete, “Monitoring of damages to cultural heritage across Europe using remote sensing and earth observation: Assessment of scientific and grey literature,” Remote Sensing, vol. 15, no. 15, p. 3748, 2023.

[3] I. D. Negula, R. Sofronie, A. Virsta, and A. Badea, “Earth observation for the world cultural and natural heritage,” Agriculture and Agricultural Science Procedia, vol. 6, pp. 438–445, 2015.

[4] N. Levin, S. Ali, D. Crandall, and S. Kark, “World Heritage in danger: Big data and remote sensing can help protect sites in conflict zones,” Global Environmental Change, vol. 55, pp. 97–104, 2019.

[5] D. Tapete and F. Cigna, “Detection of archaeological looting from space: Methods, achievements and challenges,” Remote Sensing, vol. 11, no. 20, p. 2389, 2019.

[6] A. Agapiou, “UNESCO world heritage properties in changing and dynamic environments: change detection methods using optical and radar satellite data,” Heritage Science, vol. 9, no. 1, pp. 1–14, 2021.

[7] B. H. Menze and J. A. Ur, “Mapping patterns of long-term settlement in Northern Mesopotamia at a large scale,” Proceedings of the National Academy of Sciences, vol. 109, no. 14, pp. E778–E787, 2012.

[8] E. Vincent, M. Saroufim, J. Chemla, Y. Ubelmann, P. Marquis, J. Ponce, and M. Aubry, “Detecting looted archaeological sites from satellite image time series,” in Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 2296–2307.

[9] G. A. Tadesse, T. Bartette, A. Hassanali, A. Kim, J. Chemla, A. Zolli, Y. Ubelmann, C. Robinson, I. Becker-Reshef, and J. L. Ferres, “Satellite-based detection of looted archaeological sites using machine learning,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, March 2026, pp. 840–848.

[10] Y. Cong, S. Khanna, C. Meng, P. Liu, E. Rozi, Y. He, M. Burke, D. Lobell, and S. Ermon, “SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, 2022, pp. 197–211.

[11] K. Klemmer, E. Rolf, C. Robinson, L. Mackey, and M. Russwurm, “SatCLIP: Global, general-purpose location embeddings with satellite imagery,” in AAAI Conference on Artificial Intelligence, vol. 38, no. 12, 2024, pp. 13156–13164.

[12] O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa, F. Massa, D. Haziza, L. Wehrstedt, J. Wang, T. Darcet, T. Moutakanni, L. Sentana, C. Roberts, A. Vedaldi, J. Tolan, J. Brandt, C. Couprie, J. Mairal, H. Jégou, P. Labatut, and P. Bojanowski, “DINOv3,” 2025.

[13] J. Jakubik, S. Roy, C. E. Phillips, P. Fraccaro, G. Godwin, M. Zadrozny, C. Szwarcman, S. Gomes, S. Nyirjesy, D. Edwards et al., “Foundation models for generalist geospatial artificial intelligence,” arXiv:2310.18660, 2023.

[14] R. C. Daudt, B. Le Saux, and A. Boulch, “Fully convolutional siamese networks for change detection,” International Conference on Image Processing (ICIP), 2018.

[15] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” ACM Computing Surveys, vol. 54, no. 3, pp. 1–38, 2021.

[16] F. Bastani, P. Wolters, R. Gupta, J. Ferdinando, and A. Kembhavi, “SatlasPretrain: A large-scale dataset for remote sensing image understanding,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 16772–16782.

[17] Z. Zhang, T. Zhao, Y. Guo, and J. Yin, “RS5M and GeoRSCLIP: A large scale vision-language dataset and a large vision-language model for remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, 2024.

[18] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning (ICML), 2021, pp. 8748–8763.

[19] D. Szwarcman, S. Roy, P. Fraccaro, O. E. Gíslason, B. Blumenstiel, R. Ghosal, P. H. De Oliveira, J. L. de Sousa Almeida, R. Sedona, Y. Kang et al., “Prithvi-EO-2.0: A versatile multi-temporal foundation model for earth observation applications,” IEEE Transactions on Geoscience and Remote Sensing, 2025.

[20] W. Ball, Archaeological gazetteer of Afghanistan. Oxford University Press, 2019.

[21] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), 2015.

Appendix A (Dataset: global site map and per-country site examples). Fig. 7: Countries with archaeological sites selected for evaluation in this study to assess the applicability and generalizability of the proposed frame...
