Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping
Pith reviewed 2026-05-07 16:41 UTC · model grok-4.3
The pith
The ADAGE framework quantitatively measures how well deep learning explanations for satellite flood mapping align with remote sensing domain knowledge on spectral properties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The ADAGE framework employs Channel-Group SHAP to estimate the contributions of grouped input channels to pixel-level predictions. Experiments on two satellite-based flood mapping tasks demonstrate that the ADAGE framework can quantitatively assess the alignment between model explanations and reference explanations derived from domain knowledge and help domain experts identify misaligned explanations through alignment scores.
What carries the argument
The ADAGE framework, which groups satellite image channels, uses Channel-Group SHAP to compute each group's contribution to predictions, and compares those contributions to reference explanations based on distinctive spectral properties of the Earth's surface.
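To make the grouped-attribution idea concrete, here is a minimal sketch of exact Shapley values computed over channel groups for a single pixel. It is not the paper's Channel-Group SHAP implementation: the function name `group_shapley`, the toy linear `toy_predict` model, the all-zero baseline, and the group names `visible` and `nir` are all assumptions made for illustration. Exact enumeration is only practical for a handful of groups.

```python
from itertools import combinations
from math import factorial


def group_shapley(predict, pixel, baseline, groups):
    """Exact Shapley values over channel groups for one pixel.

    predict  : callable mapping a list of channel values to a score (hypothetical model)
    pixel    : channel values of the pixel being explained
    baseline : channel values used when a group is treated as absent (an assumption)
    groups   : dict mapping group name -> list of channel indices
    """
    names = list(groups)
    n = len(names)

    def value(coalition):
        # Groups in the coalition keep their real values; all others use the baseline.
        x = list(baseline)
        for name in coalition:
            for idx in groups[name]:
                x[idx] = pixel[idx]
        return predict(x)

    phi = {}
    for name in names:
        others = [g for g in names if g != name]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                total += weight * (value(subset + (name,)) - value(subset))
        phi[name] = total
    return phi


# Toy stand-in for a flood model: visible bands raise the score, NIR lowers it.
toy_predict = lambda x: 0.5 * x[0] + 0.5 * x[1] - 2.0 * x[2]
pixel = [0.2, 0.4, 0.1]
baseline = [0.0, 0.0, 0.0]
groups = {"visible": [0, 1], "nir": [2]}

phi = group_shapley(toy_predict, pixel, baseline, groups)
```

For this toy model the group contributions sum to the difference between the prediction at the pixel and at the baseline, which is the efficiency property that makes Shapley-style attributions comparable across groups.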
If this is right
- Models with high alignment scores are using physically meaningful spectral features for flood predictions.
- Low alignment scores can direct experts to specific channels or pixels where the model relies on non-standard patterns.
- Such scores could support selection or refinement of models for operational flood monitoring workflows.
- The framework provides a repeatable way to document how closely a GeoAI model follows established remote sensing principles.
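The points above treat the alignment score as a number that can be compared against a cut-off. The review does not state which metric the paper uses, so the sketch below assumes cosine similarity between a per-pixel attribution vector and a reference vector, and a hypothetical threshold of 0.5. The function names, the two-group layout (green, NIR), and the attribution values are illustrative placeholders, not results from the paper.

```python
from math import sqrt


def alignment_score(attribution, reference):
    """Cosine similarity between two vectors (assumed metric, not confirmed by the paper)."""
    dot = sum(a * r for a, r in zip(attribution, reference))
    norm = sqrt(sum(a * a for a in attribution)) * sqrt(sum(r * r for r in reference))
    return dot / norm if norm > 0 else 0.0


def flag_misaligned(pixel_attributions, reference, threshold=0.5):
    """Return indices of pixels whose score falls below the (hypothetical) threshold."""
    return [
        i
        for i, attribution in enumerate(pixel_attributions)
        if alignment_score(attribution, reference) < threshold
    ]


# Reference over (green, nir): green expected to support water, NIR to oppose it.
reference = [1.0, -1.0]
# Made-up attributions for three pixels.
pixel_attributions = [[0.3, -0.2], [-0.1, 0.4], [0.2, 0.0]]

flagged = flag_misaligned(pixel_attributions, reference)
```

Here only the second pixel is flagged, because its attributions have the opposite sign pattern to the reference. In practice the threshold would need to be chosen and justified by domain experts.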
Where Pith is reading between the lines
- The same alignment evaluation approach could be tested on other GeoAI tasks such as land cover classification or change detection.
- Training models with an explicit term that rewards higher ADAGE scores might produce explanations that better match domain knowledge over time.
- If reference explanations vary with different expert groups, the framework could incorporate ranges of scores rather than single values.
Load-bearing premise
Reference explanations derived from domain knowledge about distinctive spectral properties of the Earth's surface can be accurately constructed and directly compared to Channel-Group SHAP contributions without introducing bias or loss of meaning.
What would settle it
An experiment where independent remote sensing experts review the model's behavior on cases flagged as misaligned by ADAGE scores and find that the model's actual decision process does not deviate from expected spectral patterns in the way the scores indicate.
Original abstract
The increasing number of satellites has improved the temporal resolution of Earth observation, making satellite-based flood mapping a promising approach for operational flood monitoring. Deep learning-based approaches for flood mapping using satellite imagery, an important application within Geospatial Artificial Intelligence (GeoAI), have shown improved predictive performance by learning complex spatial and spectral patterns from large volumes of remote sensing data. However, the opaque decision-making processes of deep learning models remain a major barrier to their integration into critical scientific and operational workflows. This highlights the need for a systematic assessment of whether model explanations align with established domain knowledge in remote sensing. To address this research gap, this study introduces the ADAGE (Alignment between Domain Knowledge And GeoAI Explanation Evaluation) framework. The proposed framework is designed to systematically evaluate how well explanations of deep learning models align with established remote sensing knowledge, particularly regarding the distinctive spectral properties of the Earth's surface. The ADAGE framework employs Channel-Group SHAP (SHapley Additive exPlanations) method to estimate the contributions of grouped input channels to pixel-level predictions. Experiments on two satellite-based flood mapping tasks demonstrate that the ADAGE framework can (1) quantitatively assess the alignment between model explanations and reference explanations derived from domain knowledge and (2) help domain experts identify misaligned explanations through alignment scores. This study contributes to bridging the gap between explainability and domain knowledge in GeoAI for Earth observation, enhancing the applicability of GeoAI models in scientific and operational workflows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the ADAGE framework to systematically evaluate the alignment between deep learning model explanations (generated via Channel-Group SHAP on grouped input channels) and established remote sensing domain knowledge regarding spectral properties of Earth's surface. Experiments on two satellite-based flood mapping tasks are presented as demonstrating that the framework can quantitatively assess this alignment and assist domain experts in identifying misaligned explanations through computed alignment scores.
Significance. If the claims hold after addressing the methodological gaps, the work would be significant for GeoAI by providing a structured approach to validate model explanations against domain knowledge in Earth observation. This could increase trust in deep learning for operational flood monitoring, where opaque decisions currently hinder adoption. The adaptation of grouped SHAP to multi-spectral satellite data is a sensible technical choice that aligns with the multi-channel nature of the inputs.
major comments (2)
- [§3] §3 (ADAGE framework description): The construction of reference explanations from domain knowledge is under-specified. Domain knowledge in remote sensing typically employs derived indices (e.g., NDWI, MNDWI) or threshold rules on band combinations rather than additive per-channel attributions; the manuscript must explicitly detail the translation process into a reference vector for Channel-Group SHAP comparison, including any grouping, normalization, or sign conventions, to avoid introducing bias or loss of meaning in the alignment metric.
- [§4] §4 (Experiments): The abstract asserts that the two tasks demonstrate quantitative assessment of alignment and identification of misalignments, yet no specific alignment scores, error analysis, task details, or example misaligned cases are reported. This leaves the central empirical claim without verifiable support and prevents assessment of whether the metric is robust.
minor comments (2)
- [Abstract] Abstract: The expansion of the ADAGE acronym is given but should be repeated on first use in the main text for clarity.
- [Figures] Ensure that any figures showing alignment scores include error bars or statistical significance tests to support the identification of misalignments.
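One common way to obtain the error bars requested in the last comment is a percentile bootstrap over per-pixel (or per-image) alignment scores. The sketch below is an assumption about how that could be done, not something the paper reports; `bootstrap_ci`, its defaults, and the example scores are hypothetical.

```python
import random


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean alignment score."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Mean of each resample drawn with replacement, sorted for percentile lookup.
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder alignment scores, invented for illustration only.
scores = [0.91, 0.84, 0.12, 0.77, 0.95, 0.40, 0.88, 0.69]
ci_low, ci_high = bootstrap_ci(scores)
```

If the resampled units are pixels from the same scene, spatial autocorrelation makes them far from independent, so resampling whole images or flood events is likely the safer choice.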
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address each major comment point by point below and commit to revisions that will strengthen the clarity and empirical support of the work.
Point-by-point responses
- Referee: [§3] §3 (ADAGE framework description): The construction of reference explanations from domain knowledge is under-specified. Domain knowledge in remote sensing typically employs derived indices (e.g., NDWI, MNDWI) or threshold rules on band combinations rather than additive per-channel attributions; the manuscript must explicitly detail the translation process into a reference vector for Channel-Group SHAP comparison, including any grouping, normalization, or sign conventions, to avoid introducing bias or loss of meaning in the alignment metric.
  Authors: We agree that the current description in Section 3 leaves the translation from domain knowledge to reference vectors under-specified, which could introduce ambiguity. In the revised manuscript we will add a dedicated subsection 'Constructing Reference Explanations from Domain Knowledge' that explicitly details: (i) the mapping from established indices (NDWI, MNDWI, and threshold rules on band combinations) to expected per-channel contributions; (ii) the channel-grouping strategy chosen to align with the Channel-Group SHAP implementation; (iii) the normalization procedure applied to both reference and SHAP vectors; and (iv) sign conventions derived from remote-sensing literature on flood spectral signatures. We will also discuss how these choices preserve the semantic meaning of the domain knowledge and mitigate potential bias in the alignment metric.
  Revision: yes
- Referee: [§4] §4 (Experiments): The abstract asserts that the two tasks demonstrate quantitative assessment of alignment and identification of misalignments, yet no specific alignment scores, error analysis, task details, or example misaligned cases are reported. This leaves the central empirical claim without verifiable support and prevents assessment of whether the metric is robust.
  Authors: We acknowledge that the experimental section would benefit from more explicit quantitative reporting. While the manuscript describes the two satellite-based flood-mapping tasks and states that alignment scores are computed to assess and flag misalignments, specific numerical scores, robustness analysis, and concrete examples are not presented at a level that allows independent verification. In the revision we will expand Section 4 to report the actual alignment scores (including the metric used), provide an error/robustness analysis of the alignment metric under variations in reference construction, include full task details (datasets, preprocessing, model architectures), and add illustrative examples or tables of misaligned cases together with their scores. These additions will make the central empirical claims fully verifiable.
  Revision: yes
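The first response above promises a mapping from spectral indices to expected per-channel contributions, with explicit grouping, normalization, and sign conventions. As a rough illustration of what such a translation might look like, the sketch below turns the standard index definitions NDWI = (Green − NIR) / (Green + NIR) and MNDWI = (Green − SWIR) / (Green + SWIR) into signed, L1-normalized reference vectors. Carrying the sign of each band in the index numerator over to the expected attribution sign is an assumption made here, as are the function name `reference_from_index` and the group names; the paper's actual procedure is not described in this review.

```python
def reference_from_index(positive_groups, negative_groups, all_groups):
    """Build a signed, L1-normalized reference vector over channel groups.

    positive_groups : groups expected to push the prediction toward water
    negative_groups : groups expected to push the prediction away from water
    all_groups      : every group in the order used for attribution
    """
    raw = {group: 0.0 for group in all_groups}
    for group in positive_groups:
        raw[group] += 1.0
    for group in negative_groups:
        raw[group] -= 1.0
    total = sum(abs(v) for v in raw.values())
    # Divide by the L1 norm so references from different indices are comparable.
    return {g: (v / total if total else 0.0) for g, v in raw.items()}


# Hypothetical channel groups; a real setup would follow the sensor's band layout.
all_groups = ["green", "nir", "swir"]

# NDWI: green in the numerator with a plus sign, NIR with a minus sign.
ndwi_reference = reference_from_index(["green"], ["nir"], all_groups)
# MNDWI: same structure with SWIR in place of NIR.
mndwi_reference = reference_from_index(["green"], ["swir"], all_groups)
```

Under these assumptions `ndwi_reference` assigns +0.5 to green, −0.5 to NIR, and 0 to SWIR. Applying the same L1 normalization to the attribution vector before comparison would keep the two on a common scale, which is one way to address the referee's concern about normalization.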
Circularity Check
No circularity: ADAGE compares Channel-Group SHAP outputs to independently constructed domain-knowledge references
The paper introduces the ADAGE framework to compute alignment scores between model explanations (via Channel-Group SHAP on satellite imagery channels) and reference explanations built from external remote-sensing domain knowledge (distinctive spectral properties of water/land surfaces). This comparison is not derived from the model's own fitted parameters or prior outputs; the references are constructed separately from established spectral indices and properties. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The central claim—that alignment can be quantitatively assessed and misalignments flagged—rests on the external validity of the reference construction and the SHAP method, both treated as independent inputs rather than outputs of the same loop. The framework is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Domain knowledge provides reliable reference explanations for the distinctive spectral properties relevant to flood mapping.