pith. machine review for the scientific record.

arxiv: 2605.04989 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 3 theorem links · Lean theorem

Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords geospatial foundation models · LoRA · wildfire mapping · burned area · Sentinel-2 · parameter-efficient fine-tuning · domain generalization · satellite imagery

The pith

Low-Rank Adaptation lets geospatial foundation models map wildfires across regions more accurately than full fine-tuning while changing less than 1% of parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how to adapt three geospatial foundation models for mapping wildfire burned areas in Sentinel-2 satellite imagery of the US and Canada. It tests full fine-tuning against low-rank adaptation and finds that LoRA generalizes better to new locations and time periods while updating far fewer parameters. Prithvi-v2 combined with LoRA performs best overall. This matters because efficient adaptation could enable scalable, large-scale wildfire monitoring without retraining entire models each time. The experiments use 3,820 events from 2017 to 2023 to simulate domain shifts between different biomes and years.

Core claim

Across experiments on burned-area mapping, Low-Rank Adaptation of the Prithvi-v2 model achieves the highest accuracy and the largest gains over full fine-tuning. It does so while updating less than 1% of the model's parameters, and it shows superior cross-domain generalization in spatial and temporal tests across US and Canada biomes. The study compares Terramind, DINOv3, and Prithvi-v2 using 3,820 wildfire events from 2017-2023.

What carries the argument

Low-Rank Adaptation (LoRA), which adapts the models by training only low-rank update matrices for selected layers instead of all parameters.
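As a rough sketch of that mechanism (an editorial illustration, not the paper's implementation; dimensions and hyperparameters are hypothetical), the frozen weight matrix W of an adapted layer is augmented to W + (alpha / r) * B A, where only the rank-r factors A and B are trained:

```python
def matmul(A, B):
    """Plain-Python matrix product, kept dependency-free for the sketch."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def lora_effective_weight(W, B, A, alpha=16.0, r=8):
    """Frozen weight W plus the scaled low-rank update (alpha / r) * B @ A.

    W is d_out x d_in and stays frozen; B (d_out x r, initialized to zero)
    and A (r x d_in) are the only trained matrices, so the trainable count
    is r * (d_out + d_in) instead of d_out * d_in.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

Because B starts at zero, the adapted model initially reproduces the pretrained one exactly, which is part of why LoRA trains stably.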

If this is right

  • LoRA consistently outperforms full fine-tuning in cross-domain settings for all three models tested.
  • Prithvi-v2 with LoRA gives the best accuracy-efficiency trade-off for burned-area mapping.
  • Decoder-only fine-tuning proves less effective than LoRA for generalization across regions and times.
  • Geospatial foundation models become practical for operational wildfire mapping when paired with parameter-efficient methods like LoRA.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar lightweight adaptation might extend to other satellite tasks such as flood detection or land-cover change.
  • Testing on imagery from additional continents would reveal whether the cross-domain gains hold beyond North America.
  • The reduced compute cost could support more frequent map updates as new Sentinel-2 acquisitions arrive.

Load-bearing premise

That the selected 3,820 wildfire events from 2017-2023 and the spatial-temporal splits across US and Canada biomes sufficiently capture real-world domain shifts without data leakage or selection bias.
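This premise is checkable mechanically. The sketch below is an editorial illustration, not the authors' pipeline; the field names and the year/biome hold-out policy are assumed. It routes events to the test set when their year or biome is held out and asserts event-level disjointness:

```python
def split_events(events, held_out_years, held_out_biomes):
    """Split wildfire events into train/test by held-out years and biomes.

    events: dicts with 'id', 'year', 'biome' keys (field names assumed).
    An event goes to the test set when its year or biome is held out,
    so the splits are disjoint at the event level by construction.
    """
    train, test = [], []
    for event in events:
        held_out = (event["year"] in held_out_years
                    or event["biome"] in held_out_biomes)
        (test if held_out else train).append(event)
    # Leakage check: no event id may appear in both splits.
    assert not ({e["id"] for e in train} & {e["id"] for e in test})
    return train, test
```

A real audit would additionally check pixel-level overlap between spatially adjacent events, which an event-id check alone cannot catch.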

What would settle it

Observing that on a held-out test set from a new biome or later year, the LoRA-adapted Prithvi-v2 no longer shows higher accuracy than the fully fine-tuned version.

Figures

Figures reproduced from arXiv: 2605.04989 by Ali Shibli, Andrea Nascetti, Yifang Ban.

Figure 1. Overview of the proposed method. Bi-temporal images (pre- and …
Figure 2. Distribution of wildfire events per biome in the US and Canada (2017- …
Figure 4. Qualitative full-fire burned-area predictions illustrating the effect of …
original abstract

Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates three geospatial foundation models (Terramind, DINOv3, Prithvi-v2) for burned-area mapping on Sentinel-2 data from 3,820 wildfire events (2017-2023) across US and Canada biomes. It systematically compares full fine-tuning, decoder-only fine-tuning, and LoRA adaptation, claiming that LoRA delivers the strongest cross-domain generalization while updating <1% of parameters and that Prithvi-v2+LoRA attains the highest accuracy with the largest gains relative to full fine-tuning.

Significance. If the empirical results hold under verified splits, the work would establish a clear efficiency-accuracy trade-off for adapting large GFMs to domain-shifted Earth-observation tasks, with direct relevance to scalable wildfire monitoring and emissions modeling. The provision of code further supports reproducibility.

major comments (2)
  1. [Experimental setup / data partitioning] The headline generalization claim (LoRA strongest cross-domain performance) is load-bearing on the spatial/temporal splits of the 3,820 events. The manuscript must explicitly describe the partitioning procedure (e.g., how events are assigned by geographic coordinates, year, or biome to ensure zero event-level, pixel-level, or seasonal overlap between train and test sets) and report any checks for leakage; without this, the reported accuracy deltas cannot be confidently attributed to adaptation robustness rather than memorization or selection effects.
  2. [Results and discussion] Results lack specification of the exact evaluation metrics (e.g., IoU, F1-score, overall accuracy), statistical tests for significance of differences, and complete baseline tables including all three adaptation methods for each model. These omissions prevent verification of the claim that Prithvi-v2+LoRA shows the largest improvement over full fine-tuning.

minor comments (2)
  1. [Abstract / code availability] The code repository link is a positive contribution for reproducibility; ensure the released scripts include the exact train/test split generation code and hyperparameter settings used for each GFM.
  2. [Methods] Clarify the precise fraction of parameters updated by LoRA for each model (Terramind, DINOv3, Prithvi-v2) and whether rank and alpha were held constant across experiments.
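The requested fraction is simple arithmetic: each adapted d_out × d_in projection gains r * (d_in + d_out) trainable parameters. A sketch with hypothetical dimensions (not taken from the paper) shows how a sub-1% figure can arise:

```python
def lora_trainable_fraction(d_in, d_out, rank, n_adapted_layers, total_params):
    """Fraction of model parameters that LoRA updates.

    Each adapted d_out x d_in projection gains rank * (d_in + d_out)
    trainable parameters (the factors A and B); all other weights frozen.
    """
    trainable = n_adapted_layers * rank * (d_in + d_out)
    return trainable / total_params

# Hypothetical configuration: 24 projections of size 1024 x 1024
# adapted at rank 8 inside a 300M-parameter backbone.
frac = lora_trainable_fraction(1024, 1024, 8, 24, 300_000_000)  # ~0.13%
```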

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the manuscript requires clarification or expansion, we will revise accordingly to strengthen the presentation of our experimental setup and results.

point-by-point responses
  1. Referee: [Experimental setup / data partitioning] The headline generalization claim (LoRA strongest cross-domain performance) is load-bearing on the spatial/temporal splits of the 3,820 events. The manuscript must explicitly describe the partitioning procedure (e.g., how events are assigned by geographic coordinates, year, or biome to ensure zero event-level, pixel-level, or seasonal overlap between train and test sets) and report any checks for leakage; without this, the reported accuracy deltas cannot be confidently attributed to adaptation robustness rather than memorization or selection effects.

    Authors: We agree that explicit details on the data partitioning are necessary to support the cross-domain generalization claims. The current manuscript mentions spatial and temporal generalization tests across biomes but does not provide a full procedural description. In the revised version, we will add a dedicated subsection (likely in Methods) that specifies: (1) assignment of the 3,820 events by geographic coordinates (e.g., disjoint US/Canada regions or biomes), (2) temporal splits by year ranges to avoid seasonal overlap, and (3) verification steps confirming zero event-level, pixel-level, or seasonal leakage between train and test sets. We will also report the resulting split sizes, biome coverage, and any leakage checks performed. This revision will allow readers to attribute performance differences more confidently to the adaptation methods rather than data artifacts. revision: yes

  2. Referee: [Results and discussion] Results lack specification of the exact evaluation metrics (e.g., IoU, F1-score, overall accuracy), statistical tests for significance of differences, and complete baseline tables including all three adaptation methods for each model. These omissions prevent verification of the claim that Prithvi-v2+LoRA shows the largest improvement over full fine-tuning.

    Authors: We acknowledge these omissions limit verifiability. The experiments use Intersection-over-Union (IoU) and F1-score as primary metrics for burned-area segmentation, supplemented by overall accuracy; these will be explicitly stated in a new Evaluation Metrics paragraph in the revised manuscript. We will also add statistical significance testing (e.g., paired t-tests across the 3,820 events or bootstrap confidence intervals) for all reported differences. Finally, we will expand the results tables to present all three adaptation strategies (full fine-tuning, decoder-only fine-tuning, and LoRA) side-by-side for each of the three GFMs, enabling direct comparison and verification of the largest gains for Prithvi-v2+LoRA. These changes will be incorporated without altering the underlying experimental outcomes. revision: yes
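For reference, the two primary metrics named in the response are direct functions of the confusion counts on binary masks; a dependency-free sketch (not the authors' evaluation code):

```python
def iou_and_f1(pred, truth):
    """IoU and F1 for binary burned-area masks given as flat 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn  # pixels burned in prediction or ground truth
    iou = tp / union if union else 1.0  # two empty masks agree perfectly
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    return iou, f1
```

Since F1 = 2·IoU / (1 + IoU), F1 is always at least as large as IoU, so the two should be reported side by side rather than interchangeably.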

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical measurements from held-out splits

full rationale

The paper reports experimental results from applying full fine-tuning, decoder-only fine-tuning, and LoRA to three pre-trained geospatial foundation models on a dataset of 3,820 wildfire events (2017-2023) with explicit spatial and temporal splits across US/Canada biomes. Key claims (LoRA strongest cross-domain generalization, Prithvi-v2+LoRA highest accuracy and largest gain vs full fine-tuning) are computed directly from accuracy metrics on those test sets. No mathematical derivation chain exists; no parameters are fitted to a subset and then presented as predictions of related quantities; no self-citations are invoked to justify uniqueness or ansatzes that bear the central result; and no known empirical patterns are renamed as novel derivations. The evaluation is self-contained against the reported data splits and standard adaptation techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim is an empirical performance comparison rather than a derivation, so it rests primarily on the assumption that the experimental design validly tests generalization.

axioms (1)
  • domain assumption The 3,820 wildfire events and Sentinel-2 imagery splits adequately represent geographic and temporal domain shifts for burned-area mapping.
    Invoked to support claims of cross-domain generalization in the abstract.

pith-pipeline@v0.9.0 · 5556 in / 1201 out tokens · 85103 ms · 2026-05-08T18:21:18.522204+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Burned area determination using Sentinel-2 satellite images and the impact of fire on the availability of soil nutrients in Syria

    R. Al-Hasn and R. Almuhammad, "Burned area determination using Sentinel-2 satellite images and the impact of fire on the availability of soil nutrients in Syria," 2022.

  2. [2]

    Mapping burned areas in Thailand using Sentinel-2 imagery and OBIA techniques

    C. Suwanprasit and Shahnawaz, "Mapping burned areas in Thailand using Sentinel-2 imagery and OBIA techniques," Scientific Reports, vol. 14, no. 1, p. 9609, 2024.

  3. [3]

    A deep learning approach for burned area segmentation with Sentinel-2 data

    L. Knopp, M. Wieland, M. Rättich, and S. Martinis, "A deep learning approach for burned area segmentation with Sentinel-2 data," Remote Sensing, vol. 12, no. 15, p. 2422, 2020.

  4. [4]

    Semantic segmentation of burned areas in satellite images using a U-Net-based convolutional neural network

    A. Brand and A. Manandhar, "Semantic segmentation of burned areas in satellite images using a U-Net-based convolutional neural network," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 43, pp. 47–53, 2021.

  5. [5]

    BiAU-Net: Wildfire burnt area mapping using bi-temporal Sentinel-2 imagery and U-Net with attention mechanism

    T. Sui, Q. Huang, M. Wu, M. Wu, and Z. Zhang, "BiAU-Net: Wildfire burnt area mapping using bi-temporal Sentinel-2 imagery and U-Net with attention mechanism," International Journal of Applied Earth Observation and Geoinformation, vol. 132, p. 104034, 2024.

  6. [6]

    Domain adaptation and fine-tuning of a deep learning segmentation model of small agricultural burn area detection using high-resolution Sentinel-2 observations: A case study of Punjab, India

    A. Anand, R. Imasu, S. K. Dhaka, and P. K. Patra, "Domain adaptation and fine-tuning of a deep learning segmentation model of small agricultural burn area detection using high-resolution Sentinel-2 observations: A case study of Punjab, India," Remote Sensing, vol. 17, no. 6, p. 974, 2025.

  7. [7]

    Prithvi: Large-scale multimodal FMs for Earth observation

    M. Reichstein et al., "Prithvi: Large-scale multimodal FMs for Earth observation," NeurIPS, 2023.

  8. [8]

    Terramind: Modality-agnostic geospatial foundation model

    A. A. Lab, "Terramind: Modality-agnostic geospatial foundation model," CVPR, 2024.

  9. [9]

    DINOv3

    O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa et al., "DINOv3," arXiv preprint arXiv:2508.10104, 2025.

  10. [10]

    Pangaea: A global and inclusive benchmark for geospatial foundation models

    V. Marsocci, Y. Jia, G. L. Bellier, D. Kerekes, L. Zeng, S. Hafner, S. Gerard, E. Brune, R. Yadav, A. Shibli et al., "Pangaea: A global and inclusive benchmark for geospatial foundation models," arXiv preprint arXiv:2412.04204, 2024.

  11. [11]

    GEO-Bench-2: From performance to capability, rethinking evaluation in geospatial AI

    N. Simumba, N. Lehmann, P. Fraccaro, H. Alemohammad, G. De Mel, S. Khan, M. Maskey, N. Longepe, X. X. Zhu, H. Kerner et al., "GEO-Bench-2: From performance to capability, rethinking evaluation in geospatial AI," arXiv preprint arXiv:2511.15658, 2025.

  12. [12]

    Parameter-efficient fine-tuning for large models: A comprehensive survey

    Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, "Parameter-efficient fine-tuning for large models: A comprehensive survey," arXiv preprint arXiv:2403.14608, 2024.

  13. [13]

    Parameter-efficient fine-tuning for pre-trained vision models: A survey

    Y. Xin, S. Luo, H. Zhou, J. Du, X. Liu, Y. Fan, Q. Li, and Y. Du, "Parameter-efficient fine-tuning for pre-trained vision models: A survey," arXiv e-prints, 2024.

  14. [14]

    Fine-tune smarter, not harder: Parameter-efficient fine-tuning for geospatial foundation models

    F. Marti Escofet, B. Blumenstiel, L. Scheibenreif, P. Fraccaro, and K. Schindler, "Fine-tune smarter, not harder: Parameter-efficient fine-tuning for geospatial foundation models," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2025, pp. 516–532.

  15. [15]

    LoRA: Low-rank adaptation of large language models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., "LoRA: Low-rank adaptation of large language models," ICLR, 2022.

  16. [16]

    Unified perceptual parsing for scene understanding

    T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified perceptual parsing for scene understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418–434.

  17. [17]

    A project for monitoring trends in burn severity

    J. Eidenshink, B. Schwind, K. Brewer, Z.-L. Zhu, B. Quayle, and S. Howard, "A project for monitoring trends in burn severity," Fire Ecology, vol. 3, no. 1, pp. 3–21, 2007.

  18. [18]

    National burned area composite (NBAC) — annual burned area polygons

    Canadian Forest Service, "National burned area composite (NBAC) — annual burned area polygons," 2023, Government of Canada metadata catalogue.