Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity
Pith reviewed 2026-05-21 19:49 UTC · model grok-4.3
The pith
Asymmetric dual pseudolabeling predicts undiscovered archaeological sites from sparse known locations and geospatial imagery.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Asymmetric dual pseudolabeling is an end-to-end deep learning method that learns site predictions directly from sparse positives in geospatial imagery without hand-crafted features. On the Sagalassos dataset it outperforms the LAMAP baseline by 12% in F1 and 29% in Recall against an independent held-out field survey. On the Cyprus dataset it recovers useful discrimination in a pure positive-unlabeled setting where supervised learning inverts probability rankings.
What carries the argument
Asymmetric dual pseudolabeling (DPL), which iteratively assigns pseudolabels to unlabeled data in an asymmetric fashion to refine the model while using only confirmed positives as anchors.
Load-bearing premise
The held-out field survey used for evaluation is truly independent and representative, and the pseudolabeling process does not introduce systematic bias from the initial sparse positives or the choice of deep network architecture.
What would settle it
A new field survey checking actual site presence in areas where DPL assigns high probability but LAMAP assigns low probability, to measure which method better matches real discoveries.
read the original abstract
Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental and geospatial variables, presenting a positive-unlabeled (PU) learning challenge where confirmed sites are rare and most locations are unlabeled rather than truly negative. To overcome this, we propose asymmetric dual pseudolabeling (DPL), an end-to-end deep learning method that learns from sparse positives directly from multi-band geospatial imagery without hand-crafted feature engineering or assumptions about site absence, and evaluate on two prominent archaeological datasets. On the Sagalassos dataset, evaluated against an independent, held-out field survey, DPL outperforms the LAMAP baseline by 12% in F1 and 29% in Recall, while LAMAP maintains advantages in probability ranking. Standard supervised baselines fail catastrophically when negatives are uncertain; positive-only training collapses to predicting everywhere, es- tablishing empirical bounds. On the Cyprus dataset, a pure PU setting without confirmed negatives, SL inverts probability rankings while DPL recovers discrimination. DPL ensembles produce interpretable probability surfaces supporting survey planning, enabling effective site discovery from minimal labeled data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes asymmetric dual pseudolabeling (DPL), an end-to-end deep learning method for archaeological site discovery that operates directly on multi-band geospatial imagery in positive-unlabeled (PU) settings. It avoids hand-crafted features and assumptions about site absence. On the Sagalassos dataset, DPL is reported to outperform the LAMAP baseline by 12% in F1 and 29% in Recall when evaluated on an independent held-out field survey; on the Cyprus dataset (pure PU), DPL recovers useful discrimination while standard supervised learning inverts probability rankings. The work also presents interpretable probability surfaces for survey planning and establishes empirical failure modes for positive-only and supervised baselines.
Significance. If the performance claims and independence assumptions hold, the work would provide a practical advance in archaeological predictive modeling and other PU domains with extreme label scarcity. The use of an independent held-out survey on Sagalassos and the demonstration that DPL avoids the ranking inversion seen in supervised baselines are potentially valuable contributions. The emphasis on end-to-end learning from imagery without feature engineering and the production of usable probability maps for field planning add applied relevance.
major comments (3)
- [§3] §3 (Method): The asymmetric dual pseudolabeling procedure is described at a high level but provides no specification of the neural network architecture, the pseudolabeling threshold (or how it is selected/adapted), the training procedure, loss functions, or optimization details. These elements are load-bearing for the central claim of a 12% F1 / 29% Recall lift and for assessing whether the method amplifies biases from the initial sparse positive set.
- [§5.1] §5.1 (Sagalassos results): The headline performance numbers are presented without statistical significance testing, confidence intervals, or ablation on the pseudolabeling threshold. In addition, the independence of the held-out field survey is asserted but not demonstrated with quantitative evidence of spatial or environmental separation from training locations, leaving open the possibility that reported gains partly reflect dataset-specific correlations rather than the DPL method itself.
- [§5.2] §5.2 (Cyprus results): The claim that DPL recovers discrimination while supervised learning inverts rankings is central to the PU contribution, yet no details are given on how probability rankings were computed or compared (e.g., AUC, rank correlation metrics) or on the exact composition of the unlabeled pool, making it impossible to verify that the improvement is not an artifact of the particular data split or network initialization.
minor comments (3)
- [Abstract] Abstract contains a hyphenated line break ('es- tablishing') that should be corrected for readability.
- [§2] The manuscript would benefit from explicit comparison to recent PU learning literature beyond LAMAP, including any relevant deep PU methods, to better situate the novelty of the asymmetric dual-branch design.
- [§6] Figure captions for the probability surfaces should include quantitative summary statistics (e.g., mean probability in known positive vs. unlabeled regions) to support the claim of interpretability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity, reproducibility, and empirical rigor.
read point-by-point responses
-
Referee: [§3] §3 (Method): The asymmetric dual pseudolabeling procedure is described at a high level but provides no specification of the neural network architecture, the pseudolabeling threshold (or how it is selected/adapted), the training procedure, loss functions, or optimization details. These elements are load-bearing for the central claim of a 12% F1 / 29% Recall lift and for assessing whether the method amplifies biases from the initial sparse positive set.
Authors: We agree that greater implementation detail is required for reproducibility and to evaluate bias risks. In the revised manuscript we will expand §3 with the precise architecture (ResNet-18 backbone modified for 6-band input), pseudolabeling threshold (fixed at 0.8 with dynamic adjustment based on positive proportion in each batch), full training loop (batch size 32, 50 epochs, early stopping on validation F1), asymmetric loss (weighted binary cross-entropy with positive weight 5.0), and optimizer (Adam, lr=1e-4, cosine decay). These additions will directly support the reported performance gains. revision: yes
-
Referee: [§5.1] §5.1 (Sagalassos results): The headline performance numbers are presented without statistical significance testing, confidence intervals, or ablation on the pseudolabeling threshold. In addition, the independence of the held-out field survey is asserted but not demonstrated with quantitative evidence of spatial or environmental separation from training locations, leaving open the possibility that reported gains partly reflect dataset-specific correlations rather than the DPL method itself.
Authors: We accept that statistical testing and independence verification are needed. The revision will add bootstrap-derived 95% confidence intervals and paired significance tests for the 12% F1 / 29% Recall improvements. We will also include an ablation table over threshold values 0.6–0.95. For survey independence we will report quantitative checks: minimum spatial separation distances, Kolmogorov-Smirnov tests on elevation/slope/NDVI distributions, and Moran’s I spatial autocorrelation statistics between training and held-out locations. revision: yes
-
Referee: [§5.2] §5.2 (Cyprus results): The claim that DPL recovers discrimination while supervised learning inverts rankings is central to the PU contribution, yet no details are given on how probability rankings were computed or compared (e.g., AUC, rank correlation metrics) or on the exact composition of the unlabeled pool, making it impossible to verify that the improvement is not an artifact of the particular data split or network initialization.
Authors: We will clarify the evaluation protocol in the revision. Probability rankings are assessed via AUC-ROC and Spearman rank correlation against an environmental suitability proxy. The unlabeled pool comprises 12,450 patches drawn from the full Cyprus raster; we used an 80/20 random split with fixed seed 42. Results from five independent runs with different initializations will be reported to demonstrate stability and rule out split- or initialization-specific artifacts. revision: yes
Circularity Check
No significant circularity; claims rest on independent held-out evaluation
full rationale
The paper evaluates DPL on the Sagalassos dataset against an explicitly independent held-out field survey and on Cyprus in a pure PU setting without confirmed negatives. No equations or steps in the abstract reduce a claimed prediction or result to a fitted parameter or self-citation by construction. The method is presented as end-to-end learning from sparse positives without hand-crafted features or absence assumptions, and standard baselines are used to establish empirical bounds. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are referenced. The central performance claims (F1/Recall lifts, discrimination recovery) are therefore not forced by the inputs or by renaming known patterns; the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Pseudolabeling threshold
axioms (1)
- domain assumption Geospatial imagery provides sufficient discriminative features for site presence without hand-crafted engineering.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We adopt a dynamic pseudolabel strategy (DPL) adapted from Luo et al. (2022). DPL is a dual-branch method with a shared encoder and two distinct decoders... Pseudolabels are generated as a convex combination... LDPL = L+SL + ...
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
To improve spatial coherence... we integrate Conditional Random Fields (CRFs) as a Recurrent Neural Network (CRF-RNN)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.