arxiv: 2604.26051 · v1 · submitted 2026-04-28 · 💻 cs.CV · cs.AI


Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping

Hyunho Lee, Wenwen Li


Pith reviewed 2026-05-07 16:41 UTC · model grok-4.3


keywords: GeoAI · explainable AI · flood mapping · satellite imagery · SHAP explanations · domain knowledge alignment · remote sensing


The pith

The ADAGE framework quantitatively measures how well deep learning explanations for satellite flood mapping align with remote sensing domain knowledge on spectral properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the ADAGE framework to check if explanations from deep learning models used in satellite-based flood mapping match what domain experts know about how different parts of the light spectrum reveal water and land. Deep learning models can map floods accurately but their decisions are hard to understand, which limits their use in science and operations. By grouping input channels and using SHAP values to see contributions, ADAGE creates scores that show alignment or misalignment with reference explanations built from domain knowledge. Experiments on two flood mapping tasks show this works to quantify the match and flag cases where the model relies on unexpected features.

Core claim

The ADAGE framework employs Channel-Group SHAP to estimate the contributions of grouped input channels to pixel-level predictions. Experiments on two satellite-based flood mapping tasks demonstrate that the ADAGE framework can quantitatively assess the alignment between model explanations and reference explanations derived from domain knowledge and help domain experts identify misaligned explanations through alignment scores.

What carries the argument

The ADAGE framework applies Channel-Group SHAP to grouped satellite image channels, computes each group's contribution to the prediction, and compares those contributions with reference explanations based on distinctive spectral properties of the Earth's surface.
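To make the mechanism concrete, here is a minimal sketch of a group-level Shapley computation for a single pixel. It is not the authors' implementation: the function name `channel_group_shapley`, the toy linear scorer, the three channel groups, and the constant baseline used for "absent" groups are all assumptions chosen for illustration. With only a handful of groups, the exact Shapley sum over coalitions is cheap enough to enumerate.

```python
from itertools import combinations
from math import factorial

import numpy as np


def channel_group_shapley(predict, x, baseline, groups):
    """Exact Shapley values over channel groups for one pixel.

    predict  : callable mapping a 1-D channel vector to a scalar score
    x        : observed channel values for the pixel
    baseline : values substituted when a group is "absent" (an assumption)
    groups   : dict of group name -> list of channel indices
    """
    names = list(groups)
    n = len(names)

    def value(coalition):
        # Groups in the coalition keep observed values; the rest use the baseline.
        z = baseline.copy()
        for name in coalition:
            z[groups[name]] = x[groups[name]]
        return predict(z)

    phi = {}
    for g in names:
        others = [m for m in names if m != g]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for a coalition of size k out of n players.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                total += weight * (value(subset + (g,)) - value(subset))
        phi[g] = total
    return phi


# Hypothetical 4-channel pixel and toy linear "water score".
groups = {"visible": [0, 1], "nir": [2], "swir": [3]}
x = np.array([0.08, 0.10, 0.03, 0.02])
baseline = np.full(4, 0.2)


def predict(z):
    return float(0.5 * z[0] + 2.0 * z[1] - 3.0 * z[2] - 1.0 * z[3])


phi = channel_group_shapley(predict, x, baseline, groups)
```

For a linear scorer each group's value reduces to the weighted difference between observed and baseline channels, and the values sum to the change in prediction between the baseline and the observed pixel.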

If this is right

Models with high alignment scores are using physically meaningful spectral features for flood predictions.
Low alignment scores can direct experts to specific channels or pixels where the model relies on non-standard patterns.
Such scores could support selection or refinement of models for operational flood monitoring workflows.
The framework provides a repeatable way to document how closely a GeoAI model follows established remote sensing principles.
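One way such a score could be computed, sketched under assumptions: take the most contributing channel group per pixel (the figure captions call this an MCCG map) and measure how often it matches the group that domain knowledge would expect. The paper's actual alignment metric is not given in the material here, so `alignment_score`, the array shapes, and the `-1` ignore label are hypothetical.

```python
import numpy as np


def alignment_score(group_shap, expected_group):
    """Fraction of valid pixels whose top-contributing group matches the reference.

    group_shap     : float array of shape (n_groups, H, W) with per-group contributions
    expected_group : int array of shape (H, W); -1 marks pixels to ignore (assumption)
    """
    mccg = np.argmax(group_shap, axis=0)  # most contributing group per pixel
    valid = expected_group >= 0
    if not valid.any():
        return float("nan")
    return float(np.mean(mccg[valid] == expected_group[valid]))


# Made-up 2x2 example with three channel groups.
group_shap = np.array([
    [[0.5, 0.1], [0.2, 0.0]],
    [[0.1, 0.6], [0.1, 0.3]],
    [[0.0, 0.2], [0.7, 0.1]],
])
expected_group = np.array([[0, 1], [1, -1]])
score = alignment_score(group_shap, expected_group)
```

Pixels where `mccg` and `expected_group` disagree are the ones an expert would inspect first.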

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same alignment evaluation approach could be tested on other GeoAI tasks such as land cover classification or change detection.
Training models with an explicit term that rewards higher ADAGE scores might produce explanations that better match domain knowledge over time.
If reference explanations vary with different expert groups, the framework could incorporate ranges of scores rather than single values.
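The last extension can be sketched as follows, assuming each expert group supplies its own reference vector over channel groups and that cosine similarity serves as a stand-in alignment measure. Both the measure and the names `cosine` and `score_range` are illustrative and not taken from the paper.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two contribution vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def score_range(model_vec, references):
    """Score one model explanation against several references; return (min, max, all)."""
    scores = {name: cosine(model_vec, ref) for name, ref in references.items()}
    return min(scores.values()), max(scores.values()), scores


# Hypothetical group order: (visible, NIR, SWIR); all numbers are made up.
model_vec = [0.2, 0.7, 0.1]
references = {
    "expert_group_A": [0.0, 1.0, 0.0],
    "expert_group_B": [0.5, 0.5, 0.0],
}
lo, hi, scores = score_range(model_vec, references)
```

Reporting the interval `(lo, hi)` rather than a single number makes disagreement between reference sets visible.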

Load-bearing premise

Reference explanations derived from domain knowledge about distinctive spectral properties of the Earth's surface can be accurately constructed and directly compared to Channel-Group SHAP contributions without introducing bias or loss of meaning.

What would settle it

An experiment where independent remote sensing experts review the model's behavior on cases flagged as misaligned by ADAGE scores and find that the model's actual decision process does not deviate from expected spectral patterns in the way the scores indicate.

Figures

Figures reproduced from arXiv: 2604.26051 by Hyunho Lee, Wenwen Li.

**Figure 1.** Overview of the ADAGE (Alignment between Domain Knowledge And GeoAI Explanation Evaluation) framework.

**Figure 2.** Ternary plot visualizing the proportion of channel groups that contributed the most to true positive water pixel predictions under cloud-covered conditions, with stacked histograms for the three cases showing the number of true positive water pixels according to NIR reflectance.

**Figure 3.** Visualization of prediction maps and corresponding Most Contributed Channel Group (MCCG) maps for cases (a), (b), and (c) for a single individual sample.

**Figure 4.** Visualization of prediction maps and corresponding Most Contributed Channel Group (MCCG) maps for cases (a), (b), and (c) for another individual sample.

**Figure 5.** Scatterplot showing the relationship between IoU and alignment score in flooded urban area mapping.

**Figure 6.** Workflow for applying the ADAGE framework to DLSS-RS models.
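The Figure 5 caption pairs IoU with the alignment score. As a hedged illustration of using both as selection criteria, the sketch below computes IoU for binary masks and filters candidate models by two thresholds. The helper names, the thresholds, and the candidate numbers are invented for the example.

```python
import numpy as np


def iou(pred, truth):
    """Intersection over union for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty; treating this as perfect overlap is a convention
    return float(np.logical_and(pred, truth).sum() / union)


def select_models(candidates, min_iou, min_alignment):
    """Keep models that meet both the accuracy and the alignment threshold."""
    return [
        name
        for name, (model_iou, model_alignment) in candidates.items()
        if model_iou >= min_iou and model_alignment >= min_alignment
    ]


example_iou = iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])

# Made-up (IoU, alignment score) pairs for three hypothetical trained models.
candidates = {"run_a": (0.80, 0.90), "run_b": (0.85, 0.40), "run_c": (0.60, 0.95)}
selected = select_models(candidates, min_iou=0.75, min_alignment=0.70)
```

In this toy setup `run_b` has the best IoU but is rejected on alignment, which is the situation where the two metrics carry different information.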

Original abstract

The increasing number of satellites has improved the temporal resolution of Earth observation, making satellite-based flood mapping a promising approach for operational flood monitoring. Deep learning-based approaches for flood mapping using satellite imagery, an important application within Geospatial Artificial Intelligence (GeoAI), have shown improved predictive performance by learning complex spatial and spectral patterns from large volumes of remote sensing data. However, the opaque decision-making processes of deep learning models remain a major barrier to their integration into critical scientific and operational workflows. This highlights the need for a systematic assessment of whether model explanations align with established domain knowledge in remote sensing. To address this research gap, this study introduces the ADAGE (Alignment between Domain Knowledge And GeoAI Explanation Evaluation) framework. The proposed framework is designed to systematically evaluate how well explanations of deep learning models align with established remote sensing knowledge, particularly regarding the distinctive spectral properties of the Earth's surface. The ADAGE framework employs Channel-Group SHAP (SHapley Additive exPlanations) method to estimate the contributions of grouped input channels to pixel-level predictions. Experiments on two satellite-based flood mapping tasks demonstrate that the ADAGE framework can (1) quantitatively assess the alignment between model explanations and reference explanations derived from domain knowledge and (2) help domain experts identify misaligned explanations through alignment scores. This study contributes to bridging the gap between explainability and domain knowledge in GeoAI for Earth observation, enhancing the applicability of GeoAI models in scientific and operational workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ADAGE gives a structured way to score how well DL explanations for satellite flood mapping match remote sensing knowledge, but the abstract leaves the actual results and reference construction too vague to judge if it works.


The main point is that this paper introduces the ADAGE framework to check whether Channel-Group SHAP explanations from deep learning flood-mapping models line up with what remote sensing experts already know about spectral signatures of water and land. It applies this to two satellite tasks and claims the scores can both quantify alignment and flag cases where the model is using the wrong cues. That is the concrete new piece: a repeatable scoring method tied to domain references rather than generic XAI metrics.

The paper does a solid job stating the real-world barrier (opaque models slow down adoption in operational Earth observation), and it correctly identifies that domain knowledge often comes from indices like NDWI rather than raw per-band contributions. Framing the problem this way is useful for the subfield.

The soft spots sit in the missing substance. The abstract asserts that experiments demonstrate quantitative assessment and misalignment detection, yet it supplies no alignment numbers, no description of how the reference vectors were actually built from spectral rules, and no discussion of normalization or grouping choices. Without those, it is impossible to tell whether the metric preserves meaning or just adds another layer of fitting. The stress-test worry about translation bias between index-based knowledge and additive SHAP values is real on the current evidence; if the full paper does not spell out the exact mapping steps and test their sensitivity, the scores remain hard to trust.

This work is for GeoAI researchers and remote-sensing practitioners who already use SHAP and want a domain-specific sanity check. A reader who needs a ready-to-use validation tool will find the idea but will still have to fill in the implementation gaps. It deserves a serious referee because the gap it targets is genuine and the proposed structure is new enough to be worth refining.

I would send it to review with a clear request for the reference-construction details, the actual scores, and a sensitivity check on the grouping step.

Referee Report

2 major / 2 minor

Summary. The paper introduces the ADAGE framework to systematically evaluate the alignment between deep learning model explanations (generated via Channel-Group SHAP on grouped input channels) and established remote sensing domain knowledge regarding spectral properties of Earth's surface. Experiments on two satellite-based flood mapping tasks are presented as demonstrating that the framework can quantitatively assess this alignment and assist domain experts in identifying misaligned explanations through computed alignment scores.

Significance. If the claims hold after addressing the methodological gaps, the work would be significant for GeoAI by providing a structured approach to validate model explanations against domain knowledge in Earth observation. This could increase trust in deep learning for operational flood monitoring, where opaque decisions currently hinder adoption. The adaptation of grouped SHAP to multi-spectral satellite data is a sensible technical choice that aligns with the multi-channel nature of the inputs.

major comments (2)

[§3] (ADAGE framework description): The construction of reference explanations from domain knowledge is under-specified. Domain knowledge in remote sensing typically employs derived indices (e.g., NDWI, MNDWI) or threshold rules on band combinations rather than additive per-channel attributions; the manuscript must explicitly detail the translation process into a reference vector for Channel-Group SHAP comparison, including any grouping, normalization, or sign conventions, to avoid introducing bias or loss of meaning in the alignment metric.
[§4] (Experiments): The abstract asserts that the two tasks demonstrate quantitative assessment of alignment and identification of misalignments, yet no specific alignment scores, error analysis, task details, or example misaligned cases are reported. This leaves the central empirical claim without verifiable support and prevents assessment of whether the metric is robust.
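The first major comment turns on how an index becomes a per-channel expectation. NDWI and MNDWI are normalized differences of green reflectance against NIR and SWIR reflectance, respectively. The sketch below computes both and reads an expected sign per band from the partial derivatives of NDWI. Using derivative signs as the reference sign convention is an assumption made here for illustration, not the manuscript's procedure, and `ndwi_reference_signs` is a hypothetical helper.

```python
import numpy as np


def ndwi(green, nir):
    """Normalized Difference Water Index: (green - NIR) / (green + NIR)."""
    return (green - nir) / (green + nir)


def mndwi(green, swir):
    """Modified NDWI: (green - SWIR) / (green + SWIR)."""
    return (green - swir) / (green + swir)


def ndwi_reference_signs(green, nir):
    """Expected direction of each band's influence on NDWI at a given pixel.

    Uses the analytic partial derivatives:
      d NDWI / d green =  2 * nir   / (green + nir)^2
      d NDWI / d nir   = -2 * green / (green + nir)^2
    """
    d_green = 2.0 * nir / (green + nir) ** 2
    d_nir = -2.0 * green / (green + nir) ** 2
    return {"green": int(np.sign(d_green)), "nir": int(np.sign(d_nir))}


# Made-up reflectances for a water-like pixel.
water_ndwi = ndwi(0.10, 0.03)
signs = ndwi_reference_signs(0.10, 0.03)
```

For positive reflectances the signs are always `+1` for green and `-1` for NIR, so under this convention a model whose NIR contribution pushes toward "water" as NIR rises would count as misaligned.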

minor comments (2)

[Abstract] The expansion of the ADAGE acronym is given but should be repeated on first use in the main text for clarity.
[Figures] Ensure that any figures showing alignment scores include error bars or statistical significance tests to support the identification of misalignments.
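As one possible way to produce the error bars requested in the second minor comment, the sketch below computes a percentile bootstrap interval for the mean alignment score across test samples. The per-sample scores are invented, and `bootstrap_ci` is a hypothetical helper rather than anything described in the manuscript.

```python
import numpy as np


def bootstrap_ci(scores, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-sample scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Each row of idx is one resample (with replacement) of the sample indices.
    idx = rng.integers(0, len(scores), size=(n_resamples, len(scores)))
    means = scores[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(means, [alpha, 1.0 - alpha])
    return float(scores.mean()), float(lower), float(upper)


# Made-up alignment scores for eight test scenes.
per_sample_scores = [0.90, 0.85, 0.40, 0.95, 0.88, 0.92, 0.35, 0.90]
mean_score, ci_low, ci_high = bootstrap_ci(per_sample_scores)
```

A wide interval here would signal that a handful of scenes dominate the average, which is worth knowing before labeling a model as misaligned.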

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We address each major comment point by point below and commit to revisions that will strengthen the clarity and empirical support of the work.


Referee: [§3] (ADAGE framework description): The construction of reference explanations from domain knowledge is under-specified. Domain knowledge in remote sensing typically employs derived indices (e.g., NDWI, MNDWI) or threshold rules on band combinations rather than additive per-channel attributions; the manuscript must explicitly detail the translation process into a reference vector for Channel-Group SHAP comparison, including any grouping, normalization, or sign conventions, to avoid introducing bias or loss of meaning in the alignment metric.

Authors: We agree that the current description in Section 3 leaves the translation from domain knowledge to reference vectors under-specified, which could introduce ambiguity. In the revised manuscript we will add a dedicated subsection 'Constructing Reference Explanations from Domain Knowledge' that explicitly details: (i) the mapping from established indices (NDWI, MNDWI, and threshold rules on band combinations) to expected per-channel contributions; (ii) the channel-grouping strategy chosen to align with the Channel-Group SHAP implementation; (iii) the normalization procedure applied to both reference and SHAP vectors; and (iv) sign conventions derived from remote-sensing literature on flood spectral signatures. We will also discuss how these choices preserve the semantic meaning of the domain knowledge and mitigate potential bias in the alignment metric.

revision: yes

Referee: [§4] (Experiments): The abstract asserts that the two tasks demonstrate quantitative assessment of alignment and identification of misalignments, yet no specific alignment scores, error analysis, task details, or example misaligned cases are reported. This leaves the central empirical claim without verifiable support and prevents assessment of whether the metric is robust.

Authors: We acknowledge that the experimental section would benefit from more explicit quantitative reporting. While the manuscript describes the two satellite-based flood-mapping tasks and states that alignment scores are computed to assess and flag misalignments, specific numerical scores, robustness analysis, and concrete examples are not presented at a level that allows independent verification. In the revision we will expand Section 4 to report the actual alignment scores (including the metric used), provide an error/robustness analysis of the alignment metric under variations in reference construction, include full task details (datasets, preprocessing, model architectures), and add illustrative examples or tables of misaligned cases together with their scores. These additions will make the central empirical claims fully verifiable.

revision: yes
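The normalization and sign-convention choices promised in the rebuttal matter because they change what a score means. A minimal sketch, assuming signed group-level vectors are scaled to unit L1 norm and compared by one minus half their L1 distance: this particular score and the names `normalize_l1` and `signed_overlap` are hypothetical stand-ins, since the material here does not state the manuscript's metric.

```python
import numpy as np


def normalize_l1(v):
    """Scale a signed vector so its absolute values sum to 1 (zeros stay zeros)."""
    v = np.asarray(v, dtype=float)
    total = np.abs(v).sum()
    return v / total if total > 0 else np.zeros_like(v)


def signed_overlap(shap_vec, ref_vec):
    """Similarity in [0, 1]: 1 for identical normalized vectors, 0 for opposite signs."""
    p = normalize_l1(shap_vec)
    q = normalize_l1(ref_vec)
    return float(1.0 - 0.5 * np.abs(p - q).sum())


# Made-up contributions over (visible, NIR, SWIR) and a hypothetical reference.
shap_vec = np.array([0.6, -0.3, 0.1])
ref_vec = np.array([1.0, -1.0, 0.0])
overlap = signed_overlap(shap_vec, ref_vec)
rescaled = signed_overlap(10.0 * shap_vec, ref_vec)
```

Because both vectors are normalized first, rescaling the SHAP values leaves the score unchanged, while flipping a sign lowers it. A sensitivity check could repeat this with alternative references or normalizations and report how much the score moves.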

Circularity Check

0 steps flagged

No circularity: ADAGE compares Channel-Group SHAP outputs to independently constructed domain-knowledge references


The paper introduces the ADAGE framework to compute alignment scores between model explanations (via Channel-Group SHAP on satellite imagery channels) and reference explanations built from external remote-sensing domain knowledge (distinctive spectral properties of water/land surfaces). This comparison is not derived from the model's own fitted parameters or prior outputs; the references are constructed separately from established spectral indices and properties. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The central claim—that alignment can be quantitatively assessed and misalignments flagged—rests on the external validity of the reference construction and the SHAP method, both treated as independent inputs rather than outputs of the same loop. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the validity of domain-derived reference explanations and the appropriateness of Channel-Group SHAP for capturing spectral contributions; no explicit free parameters or invented entities are described.

axioms (1)

domain assumption: Domain knowledge provides reliable reference explanations for the distinctive spectral properties relevant to flood mapping.
The framework uses these references to compute alignment scores and identify misalignments.


