pith. machine review for the scientific record.

arxiv: 2605.04989 · v1 · submitted 2026-05-06 · 💻 cs.CV

Recognition: 3 theorem links · Lean theorem

Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords geospatial foundation models · LoRA · wildfire mapping · burned area · Sentinel-2 · parameter-efficient fine-tuning · domain generalization · satellite imagery

The pith

Low-Rank Adaptation lets geospatial foundation models map wildfires across regions more accurately than full fine-tuning while changing less than 1% of parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how to adapt three geospatial foundation models for mapping wildfire burned areas in Sentinel-2 satellite imagery of the US and Canada. It tests full fine-tuning against low-rank adaptation and finds that LoRA generalizes better to new locations and time periods while updating far fewer parameters. Prithvi-v2 combined with LoRA performs best overall. This matters because efficient adaptation could enable scalable, large-scale wildfire monitoring without retraining entire models each time. The experiments use 3,820 events from 2017 to 2023 to simulate domain shifts between different biomes and years.

Core claim

Across experiments on burned-area mapping, Low-Rank Adaptation of the Prithvi-v2 model achieves the highest accuracy and the largest gains over full fine-tuning. It does so while updating less than 1% of the model's parameters, and it shows superior cross-domain generalization in spatial and temporal tests across US and Canada biomes. The study compares Terramind, DINOv3, and Prithvi-v2 using 3,820 wildfire events from 2017-2023.

What carries the argument

Low-Rank Adaptation (LoRA), which adapts the models by training only low-rank update matrices for selected layers instead of all parameters.
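As a rough sketch of that mechanism (an editorial illustration, not the paper's implementation; dimensions and hyperparameters are hypothetical), the frozen weight matrix W of an adapted layer is augmented to W + (alpha / r) * B A, where only the rank-r factors A and B are trained:

```python
def matmul(A, B):
    """Plain-Python matrix product, kept dependency-free for the sketch."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def lora_effective_weight(W, B, A, alpha=16.0, r=8):
    """Frozen weight W plus the scaled low-rank update (alpha / r) * B @ A.

    W is d_out x d_in and stays frozen; B (d_out x r, initialized to zero)
    and A (r x d_in) are the only trained matrices, so the trainable count
    is r * (d_out + d_in) instead of d_out * d_in.
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

Because B starts at zero, the adapted model initially reproduces the pretrained one exactly, which is part of why LoRA trains stably.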

If this is right

  • LoRA consistently outperforms full fine-tuning in cross-domain settings for all three models tested.
  • Prithvi-v2 with LoRA gives the best accuracy-efficiency trade-off for burned-area mapping.
  • Decoder-only fine-tuning proves less effective than LoRA for generalization across regions and times.
  • Geospatial foundation models become practical for operational wildfire mapping when paired with parameter-efficient methods like LoRA.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar lightweight adaptation might extend to other satellite tasks such as flood detection or land-cover change.
  • Testing on imagery from additional continents would reveal whether the cross-domain gains hold beyond North America.
  • The reduced compute cost could support more frequent map updates as new Sentinel-2 acquisitions arrive.

Load-bearing premise

That the selected 3,820 wildfire events from 2017-2023 and the spatial-temporal splits across US and Canada biomes sufficiently capture real-world domain shifts without data leakage or selection bias.
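This premise is checkable mechanically. The sketch below is an editorial illustration, not the authors' pipeline; the field names and the year/biome hold-out policy are assumed. It routes events to the test set when their year or biome is held out and asserts event-level disjointness:

```python
def split_events(events, held_out_years, held_out_biomes):
    """Split wildfire events into train/test by held-out years and biomes.

    events: dicts with 'id', 'year', 'biome' keys (field names assumed).
    An event goes to the test set when its year or biome is held out,
    so the splits are disjoint at the event level by construction.
    """
    train, test = [], []
    for event in events:
        held_out = (event["year"] in held_out_years
                    or event["biome"] in held_out_biomes)
        (test if held_out else train).append(event)
    # Leakage check: no event id may appear in both splits.
    assert not ({e["id"] for e in train} & {e["id"] for e in test})
    return train, test
```

A real audit would additionally check pixel-level overlap between spatially adjacent events, which an event-id check alone cannot catch.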

What would settle it

Observing that on a held-out test set from a new biome or later year, the LoRA-adapted Prithvi-v2 no longer shows higher accuracy than the fully fine-tuned version.

Figures

Figures reproduced from arXiv: 2605.04989 by Ali Shibli, Andrea Nascetti, Yifang Ban.

Figure 1. Overview of the proposed method. Bi-temporal images (pre- and …
Figure 2. Distribution of wildfire events per biome in the US and Canada (2017- …
Figure 4. Qualitative full-fire burned-area predictions illustrating the effect of …
original abstract

Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and axiom ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates three geospatial foundation models (Terramind, DINOv3, Prithvi-v2) for burned-area mapping on Sentinel-2 data from 3,820 wildfire events (2017-2023) across US and Canada biomes. It systematically compares full fine-tuning, decoder-only fine-tuning, and LoRA adaptation, claiming that LoRA delivers the strongest cross-domain generalization while updating <1% of parameters and that Prithvi-v2+LoRA attains the highest accuracy with the largest gains relative to full fine-tuning.

Significance. If the empirical results hold under verified splits, the work would establish a clear efficiency-accuracy trade-off for adapting large GFMs to domain-shifted Earth-observation tasks, with direct relevance to scalable wildfire monitoring and emissions modeling. The provision of code further supports reproducibility.

major comments (2)
  1. [Experimental setup / data partitioning] The headline generalization claim (LoRA strongest cross-domain performance) is load-bearing on the spatial/temporal splits of the 3,820 events. The manuscript must explicitly describe the partitioning procedure (e.g., how events are assigned by geographic coordinates, year, or biome to ensure zero event-level, pixel-level, or seasonal overlap between train and test sets) and report any checks for leakage; without this, the reported accuracy deltas cannot be confidently attributed to adaptation robustness rather than memorization or selection effects.
  2. [Results and discussion] Results lack specification of the exact evaluation metrics (e.g., IoU, F1-score, overall accuracy), statistical tests for significance of differences, and complete baseline tables including all three adaptation methods for each model. These omissions prevent verification of the claim that Prithvi-v2+LoRA shows the largest improvement over full fine-tuning.

minor comments (2)
  1. [Abstract / code availability] The code repository link is a positive contribution for reproducibility; ensure the released scripts include the exact train/test split generation code and hyperparameter settings used for each GFM.
  2. [Methods] Clarify the precise fraction of parameters updated by LoRA for each model (Terramind, DINOv3, Prithvi-v2) and whether rank and alpha were held constant across experiments.
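The requested fraction is simple arithmetic: each adapted d_out × d_in projection gains r * (d_in + d_out) trainable parameters. A sketch with hypothetical dimensions (not taken from the paper) shows how a sub-1% figure can arise:

```python
def lora_trainable_fraction(d_in, d_out, rank, n_adapted_layers, total_params):
    """Fraction of model parameters that LoRA updates.

    Each adapted d_out x d_in projection gains rank * (d_in + d_out)
    trainable parameters (the factors A and B); all other weights frozen.
    """
    trainable = n_adapted_layers * rank * (d_in + d_out)
    return trainable / total_params

# Hypothetical configuration: 24 projections of size 1024 x 1024
# adapted at rank 8 inside a 300M-parameter backbone.
frac = lora_trainable_fraction(1024, 1024, 8, 24, 300_000_000)  # ~0.13%
```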

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the manuscript requires clarification or expansion, we will revise accordingly to strengthen the presentation of our experimental setup and results.

point-by-point responses
  1. Referee: [Experimental setup / data partitioning] The headline generalization claim (LoRA strongest cross-domain performance) is load-bearing on the spatial/temporal splits of the 3,820 events. The manuscript must explicitly describe the partitioning procedure (e.g., how events are assigned by geographic coordinates, year, or biome to ensure zero event-level, pixel-level, or seasonal overlap between train and test sets) and report any checks for leakage; without this, the reported accuracy deltas cannot be confidently attributed to adaptation robustness rather than memorization or selection effects.

    Authors: We agree that explicit details on the data partitioning are necessary to support the cross-domain generalization claims. The current manuscript mentions spatial and temporal generalization tests across biomes but does not provide a full procedural description. In the revised version, we will add a dedicated subsection (likely in Methods) that specifies: (1) assignment of the 3,820 events by geographic coordinates (e.g., disjoint US/Canada regions or biomes), (2) temporal splits by year ranges to avoid seasonal overlap, and (3) verification steps confirming zero event-level, pixel-level, or seasonal leakage between train and test sets. We will also report the resulting split sizes, biome coverage, and any leakage checks performed. This revision will allow readers to attribute performance differences more confidently to the adaptation methods rather than data artifacts. revision: yes

  2. Referee: [Results and discussion] Results lack specification of the exact evaluation metrics (e.g., IoU, F1-score, overall accuracy), statistical tests for significance of differences, and complete baseline tables including all three adaptation methods for each model. These omissions prevent verification of the claim that Prithvi-v2+LoRA shows the largest improvement over full fine-tuning.

    Authors: We acknowledge these omissions limit verifiability. The experiments use Intersection-over-Union (IoU) and F1-score as primary metrics for burned-area segmentation, supplemented by overall accuracy; these will be explicitly stated in a new Evaluation Metrics paragraph in the revised manuscript. We will also add statistical significance testing (e.g., paired t-tests across the 3,820 events or bootstrap confidence intervals) for all reported differences. Finally, we will expand the results tables to present all three adaptation strategies (full fine-tuning, decoder-only fine-tuning, and LoRA) side-by-side for each of the three GFMs, enabling direct comparison and verification of the largest gains for Prithvi-v2+LoRA. These changes will be incorporated without altering the underlying experimental outcomes. revision: yes
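For reference, the two primary metrics named in the response are direct functions of the confusion counts on binary masks; a dependency-free sketch (not the authors' evaluation code):

```python
def iou_and_f1(pred, truth):
    """IoU and F1 for binary burned-area masks given as flat 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn  # pixels burned in prediction or ground truth
    iou = tp / union if union else 1.0  # two empty masks agree perfectly
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    return iou, f1
```

Since F1 = 2·IoU / (1 + IoU), F1 is always at least as large as IoU, so the two should be reported side by side rather than interchangeably.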

Circularity Check

0 steps flagged

No circularity: claims rest on direct empirical measurements from held-out splits

full rationale

The paper reports experimental results from applying full fine-tuning, decoder-only fine-tuning, and LoRA to three pre-trained geospatial foundation models on a dataset of 3,820 wildfire events (2017-2023) with explicit spatial and temporal splits across US/Canada biomes. Key claims (LoRA strongest cross-domain generalization, Prithvi-v2+LoRA highest accuracy and largest gain vs full fine-tuning) are computed directly from accuracy metrics on those test sets. No mathematical derivation chain exists; no parameters are fitted to a subset and then presented as predictions of related quantities; no self-citations are invoked to justify uniqueness or ansatzes that bear the central result; and no known empirical patterns are renamed as novel derivations. The evaluation is self-contained against the reported data splits and standard adaptation techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim is an empirical performance comparison rather than a derivation, so it rests primarily on the assumption that the experimental design validly tests generalization.

axioms (1)
  • domain assumption The 3,820 wildfire events and Sentinel-2 imagery splits adequately represent geographic and temporal domain shifts for burned-area mapping.
    Invoked to support claims of cross-domain generalization in the abstract.

pith-pipeline@v0.9.0 · 5556 in / 1201 out tokens · 85103 ms · 2026-05-08T18:21:18.522204+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Burned area determination using Sentinel-2 satellite images and the impact of fire on the availability of soil nutrients in Syria

    R. Al-Hasn and R. Almuhammad, "Burned area determination using Sentinel-2 satellite images and the impact of fire on the availability of soil nutrients in Syria," 2022.

  2. [2]

    Mapping burned areas in Thailand using Sentinel-2 imagery and OBIA techniques

    C. Suwanprasit and Shahnawaz, "Mapping burned areas in Thailand using Sentinel-2 imagery and OBIA techniques," Scientific Reports, vol. 14, no. 1, p. 9609, 2024.

  3. [3]

    A deep learning approach for burned area segmentation with Sentinel-2 data

    L. Knopp, M. Wieland, M. Rättich, and S. Martinis, "A deep learning approach for burned area segmentation with Sentinel-2 data," Remote Sensing, vol. 12, no. 15, p. 2422, 2020.

  4. [4]

    Semantic segmentation of burned areas in satellite images using a U-Net-based convolutional neural network

    A. Brand and A. Manandhar, "Semantic segmentation of burned areas in satellite images using a U-Net-based convolutional neural network," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 43, pp. 47–53, 2021.

  5. [5]

    BiAU-Net: Wildfire burnt area mapping using bi-temporal Sentinel-2 imagery and U-Net with attention mechanism

    T. Sui, Q. Huang, M. Wu, M. Wu, and Z. Zhang, "BiAU-Net: Wildfire burnt area mapping using bi-temporal Sentinel-2 imagery and U-Net with attention mechanism," International Journal of Applied Earth Observation and Geoinformation, vol. 132, p. 104034, 2024.

  6. [6]

    Domain adaptation and fine-tuning of a deep learning segmentation model of small agricultural burn area detection using high-resolution Sentinel-2 observations: A case study of Punjab, India

    A. Anand, R. Imasu, S. K. Dhaka, and P. K. Patra, "Domain adaptation and fine-tuning of a deep learning segmentation model of small agricultural burn area detection using high-resolution Sentinel-2 observations: A case study of Punjab, India," Remote Sensing, vol. 17, no. 6, p. 974, 2025.

  7. [7]

    Prithvi: Large-scale multimodal FMs for Earth observation

    M. Reichstein et al., "Prithvi: Large-scale multimodal FMs for Earth observation," NeurIPS, 2023.

  8. [8]

    Terramind: Modality-agnostic geospatial foundation model

    A. A. Lab, "Terramind: Modality-agnostic geospatial foundation model," CVPR, 2024.

  9. [9]

    DINOv3

    O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa et al., "DINOv3," arXiv preprint arXiv:2508.10104, 2025.

  10. [10]

    Pangaea: A global and inclusive benchmark for geospatial foundation models

    V. Marsocci, Y. Jia, G. L. Bellier, D. Kerekes, L. Zeng, S. Hafner, S. Gerard, E. Brune, R. Yadav, A. Shibli et al., "Pangaea: A global and inclusive benchmark for geospatial foundation models," arXiv preprint arXiv:2412.04204, 2024.

  11. [11]

    GEO-Bench-2: From performance to capability, rethinking evaluation in geospatial AI

    N. Simumba, N. Lehmann, P. Fraccaro, H. Alemohammad, G. De Mel, S. Khan, M. Maskey, N. Longepe, X. X. Zhu, H. Kerner et al., "GEO-Bench-2: From performance to capability, rethinking evaluation in geospatial AI," arXiv preprint arXiv:2511.15658, 2025.

  12. [12]

    Parameter-efficient fine-tuning for large models: A comprehensive survey

    Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, "Parameter-efficient fine-tuning for large models: A comprehensive survey," arXiv preprint arXiv:2403.14608, 2024.

  13. [13]

    Parameter-efficient fine-tuning for pre-trained vision models: A survey

    Y. Xin, S. Luo, H. Zhou, J. Du, X. Liu, Y. Fan, Q. Li, and Y. Du, "Parameter-efficient fine-tuning for pre-trained vision models: A survey," arXiv e-prints, 2024.

  14. [14]

    Fine-tune smarter, not harder: Parameter-efficient fine-tuning for geospatial foundation models

    F. Marti Escofet, B. Blumenstiel, L. Scheibenreif, P. Fraccaro, and K. Schindler, "Fine-tune smarter, not harder: Parameter-efficient fine-tuning for geospatial foundation models," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2025, pp. 516–532.

  15. [15]

    LoRA: Low-rank adaptation of large language models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., "LoRA: Low-rank adaptation of large language models," ICLR, 2022.

  16. [16]

    Unified perceptual parsing for scene understanding

    T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified perceptual parsing for scene understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418–434.

  17. [17]

    A project for monitoring trends in burn severity

    J. Eidenshink, B. Schwind, K. Brewer, Z.-L. Zhu, B. Quayle, and S. Howard, "A project for monitoring trends in burn severity," Fire Ecology, vol. 3, no. 1, pp. 3–21, 2007.

  18. [18]

    National burned area composite (NBAC) — annual burned area polygons

    Canadian Forest Service, "National burned area composite (NBAC) — annual burned area polygons," 2023, Government of Canada metadata catalogue.