Clay-CNN Hybrids: Leveraging Geospatial Foundation Models as Auxiliary Context for Landslide Detection
Pith reviewed 2026-06-27 05:17 UTC · model grok-4.3
The pith
Hybrid U-Net with Clay GFM context as auxiliary input reaches 64.5% F1 on landslide segmentation, beating both Clay-only and plain U-Net baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The hybrid U-Net + Clay model with two-stage Low-Rank Adaptation (LoRA) achieved the best test F1 of 64.5 +/- 1.8% over three seeds, surpassing the Clay-only backbone (55.2 +/- 3.6%) and the U-Net baseline (59.9%). Clay as a standalone encoder underperformed the U-Net due to the absence of multi-scale skip connections, but its pretrained representations consistently improved performance when injected as auxiliary context. These findings suggest that GFMs are most effective for landslide detection when they complement spatially detailed convolutional architectures rather than replace them.
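The headline numbers are positive-class F1 scores summarized as mean and spread over three seeds. A minimal sketch of that bookkeeping follows, assuming F1 is computed from pixel-level true-positive, false-positive, and false-negative counts and that the spread is a sample standard deviation (the page does not say which estimator was used). The per-seed counts are hypothetical placeholders, not values from the paper.

```python
from statistics import mean, stdev


def f1_score(tp, fp, fn):
    """F1 for the positive (landslide) class from pixel counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(scores):
    """Mean and sample standard deviation across seeds (assumed estimator)."""
    return mean(scores), stdev(scores)


# Hypothetical per-seed (tp, fp, fn) pixel counts, for illustration only.
seed_counts = [(620, 330, 350), (650, 340, 360), (600, 350, 330)]
per_seed_f1 = [f1_score(*counts) for counts in seed_counts]
f1_mean, f1_std = summarize(per_seed_f1)
print(f"F1 = {100 * f1_mean:.1f} +/- {100 * f1_std:.1f}%")

# Gap between the reported hybrid and U-Net baseline means, in F1 points.
gap = round(64.5 - 59.9, 1)
```

The 4.6-point gap computed on the last line is the figure the referee report questions; the baseline is reported without a spread, so the sketch makes no claim about whether the gap exceeds seed noise.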
What carries the argument
Two-stage Low-Rank Adaptation (LoRA) that injects Clay semantic context into the U-Net bottleneck as auxiliary input while preserving the convolutional skip connections.
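The two ingredients named here can be sketched in a few lines, under stated assumptions. The first part shows the standard LoRA parameterization (a frozen weight plus a scaled low-rank update). The second shows one plausible way to inject a context vector at a bottleneck, by tiling it over the spatial grid and concatenating it on the channel axis. The page does not specify the fusion operator, the LoRA rank, or the two-stage schedule, so `LoRALinear`, `fuse_at_bottleneck`, and all default values are hypothetical, and plain lists are used only to keep the sketch dependency-free.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A starts small and random, B starts at zero, so the update is
        # exactly zero before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


def fuse_at_bottleneck(bottleneck, context):
    """Tile a context vector over the H x W grid and append it as channels.

    bottleneck: nested list shaped [C][H][W]; context: list of length D.
    Returns a nested list shaped [C + D][H][W]. Concatenation is an assumed
    fusion operator, not one confirmed by the source.
    """
    height, width = len(bottleneck[0]), len(bottleneck[0][0])
    tiled = [[[value] * width for _ in range(height)] for value in context]
    return bottleneck + tiled


layer = LoRALinear(weight=[[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
x = [0.5, -1.0, 2.0]
y_before = layer.forward(x)  # equals the frozen layer's output at init

bottleneck = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]  # C=3, H=W=2
fused = fuse_at_bottleneck(bottleneck, context=[0.1, 0.2])  # C + D = 5 channels
```

Because `B` is zero at initialization, the adapted layer reproduces the pretrained output until training moves `B`, which is the property that makes LoRA a low-risk way to adapt a pretrained encoder.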
If this is right
- GFMs improve landslide segmentation most when supplied as auxiliary context rather than used as the sole encoder.
- Two-stage LoRA provides an effective way to adapt a pretrained GFM inside a hybrid segmentation pipeline.
- Convolutional skip connections remain necessary for spatially precise outputs even when strong semantic features are available.
- Pretrained geospatial representations help mitigate extreme class imbalance in post-event mapping tasks.
Where Pith is reading between the lines
- The same hybrid pattern may transfer to other geospatial segmentation problems that share high class imbalance and multi-band imagery.
- Alternative fusion locations or other GFMs could be tested to see if bottleneck injection is optimal.
- The performance gap suggests that real-time disaster pipelines could adopt lightweight LoRA adapters on existing CNN backbones without full model replacement.
Load-bearing premise
The measured F1 gains are caused by Clay's semantic context rather than by the specific two-stage LoRA schedule, random seed variation, or unstated differences in training protocol or data augmentation.
What would settle it
Retraining the hybrid architecture with the identical two-stage LoRA schedule but replacing Clay embeddings with random vectors or embeddings from an unrelated model and checking whether the F1 advantage disappears.
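One way to build the random-vector control is sketched below, assuming the Clay embeddings are available as one fixed-length vector per chip. Each control vector is drawn from a Gaussian matched to the per-dimension mean and standard deviation of the real embeddings, so the control keeps the input scale while discarding the semantic content. The function name `random_control_embeddings` and the toy input are hypothetical.

```python
import random
from statistics import mean, pstdev


def random_control_embeddings(embeddings, seed=0):
    """Replace each embedding with noise matched per dimension in mean and std."""
    rng = random.Random(seed)
    dims = list(zip(*embeddings))  # transpose to per-dimension columns
    stats = [(mean(column), pstdev(column)) for column in dims]
    return [[rng.gauss(mu, sigma) for mu, sigma in stats] for _ in embeddings]


# Toy stand-in for real embeddings: 3 chips, 2 dimensions (illustrative only).
real = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
control = random_control_embeddings(real, seed=1)
```

Fixing the seed keeps the control reproducible across the same three training seeds, so any remaining F1 advantage can be attributed to the adaptation schedule or added capacity rather than to Clay's representations.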
Original abstract
Rapid post-event landslide mapping is essential for disaster response but remains difficult to automate due to extreme class imbalance. This study evaluates whether Clay v1.5, a Geospatial Foundation Model (GFM), can improve pixel-level landslide segmentation on the Landslide4Sense (L4S) benchmark, which contains 3,799 training chips with 14 Sentinel-2 and terrain bands and approximately 2% positive pixels. We compare three strategies: Clay as the primary encoder with multi-scale residual terrain fusion, a U-Net backbone augmented with Clay semantic context at the bottleneck, and a standard U-Net baseline. The hybrid U-Net + Clay model with two-stage Low-Rank Adaptation (LoRA) achieved the best test F1 of 64.5 +/- 1.8% over three seeds, surpassing the Clay-only backbone (55.2 +/- 3.6%) and the U-Net baseline (59.9%). Clay as a standalone encoder underperformed the U-Net due to the absence of multi-scale skip connections, but its pretrained representations consistently improved performance when injected as auxiliary context. These findings suggest that GFMs are most effective for landslide detection when they complement spatially detailed convolutional architectures rather than replace them.
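With roughly 2% positive pixels, an unweighted per-pixel loss is dominated by the background class. A common remedy is to up-weight positive pixels by the negative-to-positive ratio. The sketch below assumes a weighted binary cross-entropy; the abstract does not state which loss or weighting the authors used, so `pos_weight` and `weighted_bce` are illustrative names only.

```python
import math


def pos_weight(positive_fraction):
    """Inverse-frequency weight: number of negatives per positive pixel."""
    return (1.0 - positive_fraction) / positive_fraction


def weighted_bce(prob, label, weight):
    """Binary cross-entropy for one pixel, with positives scaled by `weight`."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(weight * label * math.log(p) + (1 - label) * math.log(1.0 - p))


w = pos_weight(0.02)  # about 49 negatives per positive at ~2% positives
missed_landslide = weighted_bce(0.1, 1, w)  # low probability on a true positive
false_alarm = weighted_bce(0.9, 0, w)       # high probability on a true negative
```

Under this weighting a missed landslide pixel costs about 49 times more than an equally confident false alarm, which is the kind of setting the referee asks to be held identical across all three models.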
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates Clay v1.5, a geospatial foundation model, for pixel-level landslide segmentation on the Landslide4Sense benchmark (3,799 training chips, ~2% positive pixels). It compares three strategies: Clay as the primary encoder with multi-scale residual terrain fusion, a standard U-Net baseline, and a hybrid U-Net that injects Clay features at the bottleneck. Two-stage LoRA is described explicitly only for the hybrid configuration. The hybrid achieves the highest test F1 of 64.5 ± 1.8% over three seeds, outperforming Clay-only (55.2 ± 3.6%) and U-Net (59.9%). The authors attribute Clay's weaker standalone performance to the absence of multi-scale skip connections and conclude that GFMs are most effective when used as auxiliary context for CNNs rather than as standalone encoders.
Significance. If the reported F1 gains can be attributed to Clay semantic context under controlled conditions, the work would offer concrete empirical guidance on hybrid GFM-CNN designs for class-imbalanced remote-sensing segmentation. The explicit multi-seed reporting with standard deviations and direct baseline comparisons is a positive methodological feature that strengthens reproducibility of the headline numbers.
major comments (2)
- [Abstract] The central claim that the 4.6-point F1 gain of the hybrid over the U-Net baseline is caused by Clay semantic context requires that the U-Net baseline and Clay-only models received identical training protocols (two-stage LoRA schedule, optimizer, augmentation pipeline, and positive-pixel weighting). No such protocol equivalence is stated or demonstrated, rendering the attribution to Clay features unverifiable from the reported results.
- [Results] No ablation is presented that isolates the contribution of the two-stage LoRA schedule from the Clay feature injection itself. Because the hybrid is the only configuration explicitly described as using two-stage LoRA, the observed delta could arise from the adaptation procedure rather than from the GFM context, directly undermining the conclusion that GFMs are “most effective when they complement” CNNs.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing the need for explicit protocol details and ablation clarity. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core findings.
Point-by-point responses
- Referee: [Abstract] The central claim that the 4.6-point F1 gain of the hybrid over the U-Net baseline is caused by Clay semantic context requires that the U-Net baseline and Clay-only models received identical training protocols (two-stage LoRA schedule, optimizer, augmentation pipeline, and positive-pixel weighting). No such protocol equivalence is stated or demonstrated, rendering the attribution to Clay features unverifiable from the reported results.
  Authors: We agree that protocol equivalence must be stated explicitly for the attribution to be verifiable. All three configurations were trained under the same optimizer, augmentation pipeline, positive-pixel weighting, and epoch schedule; two-stage LoRA was applied to both the Clay-only and hybrid models (as required for GFM adaptation), while the U-Net baseline used standard fine-tuning. We will revise the manuscript to add an explicit methods paragraph and comparison table confirming these shared settings across models.
  Revision: yes
- Referee: [Results] No ablation is presented that isolates the contribution of the two-stage LoRA schedule from the Clay feature injection itself. Because the hybrid is the only configuration explicitly described as using two-stage LoRA, the observed delta could arise from the adaptation procedure rather than from the GFM context, directly undermining the conclusion that GFMs are “most effective when they complement” CNNs.
  Authors: The referee correctly notes that the current text only highlights two-stage LoRA for the hybrid, leaving open the possibility that LoRA itself drives part of the gain. Clay-only also uses LoRA for adaptation, and the hybrid still outperforms it, but we acknowledge the lack of a pure U-Net + LoRA control. We will revise the results section to clarify LoRA usage per model and add a short ablation (U-Net with two-stage LoRA, no Clay) if space allows; otherwise we will note this as a limitation and qualify the conclusion accordingly.
  Revision: partial
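The ablation under discussion can be laid out as a two-factor grid on the U-Net backbone: whether Clay context is injected, and whether the two-stage LoRA schedule is used. The sketch below enumerates the grid and lists the cells with no reported result. It assumes, per the authors' reply, that the U-Net baseline used standard fine-tuning without LoRA; the dictionary layout and names are hypothetical, and the Clay-only model is omitted because it does not use the U-Net backbone.

```python
from itertools import product

# Keys are (clay_context, two_stage_lora); values are reported test F1 in %.
reported = {
    (False, False): 59.9,  # U-Net baseline, standard fine-tuning
    (True, True): 64.5,    # hybrid U-Net + Clay with two-stage LoRA
}

# Enumerate the full 2 x 2 design and collect the cells without a result.
grid = list(product([False, True], repeat=2))
missing = [cell for cell in grid if cell not in reported]

for clay_context, two_stage_lora in missing:
    print(f"not reported: clay_context={clay_context}, two_stage_lora={two_stage_lora}")
```

The two missing cells are the U-Net with two-stage LoRA but no Clay, which is the control the referee requests, and the hybrid without two-stage LoRA. Without at least the former, the two factors remain confounded in the 4.6-point gap.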
Circularity Check
No circularity; purely empirical benchmark comparison
Full rationale
The manuscript reports F1 scores from training three segmentation models (hybrid U-Net+Clay, Clay-only, U-Net baseline) on the fixed Landslide4Sense dataset and evaluating on a held-out test set. No equations, derivations, or first-principles results appear. The central claim rests on direct experimental deltas against explicitly described baselines rather than any reduction of outputs to fitted inputs or self-citations. No self-definitional, fitted-prediction, or uniqueness-theorem patterns are present. This is standard empirical ML evaluation and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- LoRA rank and learning rate schedule
axioms (1)
- Domain assumption: The Landslide4Sense dataset split and annotation protocol constitute a fair test of generalization for post-event landslide mapping.