Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery

Ali Mostafavi; Yu-Hsuan Ho

arxiv: 2506.06667 · v1 · submitted 2025-06-07 · 💻 cs.CV · cs.LG · eess.IV

Yu-Hsuan Ho, Ali Mostafavi

Pith reviewed 2026-05-19 11:15 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · eess.IV

keywords flood damage assessment · SAR remote sensing · Mamba architecture · multitask learning · building damage classification · Hurricane Harvey · multimodal fusion · inherent flood risk

The pith

A multimodal Mamba model fuses SAR imagery, optical basemaps, and a flood-risk layer to assess graded building damage after inundation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Flood-DamageSense as a purpose-built deep-learning framework for building-level flood damage assessment from remote sensing data. It combines pre- and post-event SAR and InSAR scenes with very-high-resolution optical imagery and an inherent flood-risk layer to direct attention toward structures that may show only minimal visible change. A semi-Siamese Mamba backbone with task-specific decoders performs joint prediction of damage grades, floodwater extent, and building footprints. On Hurricane Harvey imagery from Harris County, the approach yields up to a 19-point mean F1 gain over prior methods, with the largest improvements in the minor and moderate damage classes. Ablation results single out the risk layer as the dominant contributor to this improvement.

Core claim

Flood-DamageSense is the first end-to-end deep-learning system designed specifically for graded building flood damage assessment; its multimodal Mamba architecture with semi-Siamese encoder and task-specific decoders, augmented by an inherent flood-risk feature, produces building-scale damage maps that outperform state-of-the-art baselines by up to 19 F1 points on Hurricane Harvey data, with the risk layer identified as the most influential input.

What carries the argument

Multimodal Mamba backbone with semi-Siamese encoder and task-specific decoders, fused with an inherent flood-risk layer that encodes long-term exposure probabilities to guide detection of low-change damage.

If this is right

Pixel-level outputs can be converted to actionable building-scale damage maps within minutes of image acquisition.
Performance gains are largest for the minor and moderate damage categories that are most often misclassified by existing models.
SAR-based all-weather acquisition enables damage assessment even under cloud cover or at night.
Joint multitask prediction of damage, flood extent, and footprints reduces the need for separate models.
The risk layer improves detection when spectral or structural change signatures are weak or absent.
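
The pixel-to-building conversion mentioned in the first point can be pictured with a small sketch. This is an illustration only, not the paper's pipeline: it assumes a per-pixel damage map and a footprint raster of building ids are already available, and it uses a plain majority vote. The function name `aggregate_to_buildings`, the integer class codes, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_to_buildings(damage_map, footprint_ids, background_id=0):
    """Assign each building the most common damage class among its pixels.

    damage_map: 2D list of per-pixel damage class codes (ints).
    footprint_ids: 2D list of the same shape; each pixel holds a building id,
        or background_id where there is no building.
    Returns {building_id: damage_class}. Ties go to the higher class code,
    which is an arbitrary choice made for this sketch.
    """
    votes = defaultdict(Counter)
    for damage_row, id_row in zip(damage_map, footprint_ids):
        for damage_class, building_id in zip(damage_row, id_row):
            if building_id != background_id:
                votes[building_id][damage_class] += 1

    result = {}
    for building_id, counter in votes.items():
        # Rank by vote count first, then by class code to break ties.
        result[building_id] = max(counter, key=lambda c: (counter[c], c))
    return result


# Toy 3x4 scene with two buildings (ids 7 and 9); class codes are hypothetical.
damage_map = [
    [0, 1, 1, 0],
    [0, 1, 2, 3],
    [0, 0, 3, 3],
]
footprint_ids = [
    [0, 7, 7, 0],
    [0, 7, 7, 9],
    [0, 0, 9, 9],
]
buildings = aggregate_to_buildings(damage_map, footprint_ids)
print(buildings)
```

A majority vote is only one possible rule; an area-weighted or confidence-weighted vote would fit the same interface.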

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same risk-guided fusion strategy could be tested on other flood events where insurance or exposure data exist.
End-to-end pipelines of this form may shorten the time between image capture and allocation of repair resources.
If the risk layer generalizes across regions, the framework could support pre-event vulnerability mapping as well as post-event assessment.

Load-bearing premise

Insurance-derived property-damage extents supply accurate, unbiased ground-truth labels for the graded damage states used in training and evaluation.

What would settle it

Independent field-verified damage grades or a second insurance dataset for the same Harris County buildings would show whether the reported F1 gains hold when the training labels are replaced.

Original abstract

Most post-disaster damage classifiers succeed only when destructive forces leave clear spectral or structural signatures -- conditions rarely present after inundation. Consequently, existing models perform poorly at identifying flood-related building damages. The model presented in this study, Flood-DamageSense, addresses this gap as the first deep-learning framework purpose-built for building-level flood-damage assessment. The architecture fuses pre- and post-event SAR/InSAR scenes with very-high-resolution optical basemaps and an inherent flood-risk layer that encodes long-term exposure probabilities, guiding the network toward plausibly affected structures even when compositional change is minimal. A multimodal Mamba backbone with a semi-Siamese encoder and task-specific decoders jointly predicts (1) graded building-damage states, (2) floodwater extent, and (3) building footprints. Training and evaluation on Hurricane Harvey (2017) imagery from Harris County, Texas -- supported by insurance-derived property-damage extents -- show a mean F1 improvement of up to 19 percentage points over state-of-the-art baselines, with the largest gains in the frequently misclassified "minor" and "moderate" damage categories. Ablation studies identify the inherent-risk feature as the single most significant contributor to this performance boost. An end-to-end post-processing pipeline converts pixel-level outputs to actionable, building-scale damage maps within minutes of image acquisition. By combining risk-aware modeling with SAR's all-weather capability, Flood-DamageSense delivers faster, finer-grained, and more reliable flood-damage intelligence to support post-disaster decision-making and resource allocation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper builds a multimodal Mamba model for SAR flood damage grading that reports solid F1 gains on Harvey data, but the insurance labels for the minor and moderate classes are the part that needs checking.

This paper introduces Flood-DamageSense, a multimodal Mamba architecture for assessing building flood damage using SAR imagery combined with optical data and a flood-risk layer. The key takeaway is that it claims up to 19 percentage points better mean F1 score than baselines on Hurricane Harvey data from Harris County, particularly improving on the minor and moderate damage classes that are usually hard to catch.

What is new here is the purpose-built setup for this task: a Mamba backbone with a semi-Siamese encoder for pre- and post-event fusion, multitask decoders handling damage states, floodwater extent, and building footprints all at once, plus the inherent flood-risk layer to help with cases where damage is subtle. The end-to-end pipeline that goes from pixels to building-scale maps is a nice practical touch.

The work does well in focusing on SAR for all-weather capability, which is crucial for floods, and in using multitask learning to potentially improve feature sharing. Adding the risk prior seems like a reasonable way to bias the model toward likely affected areas.

The main soft spot is the reliance on insurance-derived extents for the ground-truth labels. The biggest reported gains are in the minor and moderate categories, but if those labels come from claim data or payout thresholds rather than detailed inspections, there could be noise or bias exactly where the model claims to shine. The ablation highlights the risk layer as key, but without details on how the four-class labels were generated from the insurance data or any cross-checks, it's hard to rule out that the improvement is partly fitting to label artifacts. I'd like to see more on dataset construction, splits, and qualitative error analysis in the full paper.

This kind of work is for researchers and practitioners in remote sensing and disaster management who deal with post-event damage mapping. A reader looking for applied deep learning methods in SAR for hazards would get value from the architecture choices and the reported performance.

It deserves a serious referee because the problem is important and the approach is grounded in real data, even if some aspects of the evaluation need tightening. My recommendation is to send it for peer review, with reviewers asked to pay special attention to the label quality and how the risk layer interacts with it.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Flood-DamageSense, a multimodal Mamba architecture with a semi-Siamese encoder and task-specific decoders for joint prediction of graded building damage states, floodwater extent, and building footprints. It fuses pre- and post-event SAR/InSAR imagery, very-high-resolution optical basemaps, and an inherent flood-risk layer. Training and evaluation on Hurricane Harvey (2017) imagery from Harris County, Texas, using insurance-derived property-damage extents, report a mean F1 improvement of up to 19 percentage points over state-of-the-art baselines, with the largest gains in the minor and moderate damage categories; ablations attribute the boost primarily to the risk feature. An end-to-end post-processing pipeline produces building-scale damage maps.

Significance. If the quantitative gains hold after rigorous validation of the insurance-derived labels, the work would be significant for post-disaster remote sensing by demonstrating how risk priors can improve detection of subtle flood damages in SAR data where spectral signatures are weak. The multitask Mamba design offers efficiency advantages, and the practical pipeline for rapid mapping is a strength. The emphasis on minor/moderate classes addresses a known weakness in existing flood-damage classifiers.

major comments (2)

[Abstract and §4 (results)] The headline claim of up to 19 pp mean F1 improvement, concentrated in the minor and moderate damage categories, rests on the assumption that insurance-derived property-damage extents provide accurate, unbiased per-building labels for the four-class taxonomy. The manuscript must detail the exact mapping from claim payouts or loss thresholds to graded damage states, quantify label noise or inter-rater agreement, and include error analysis or sensitivity tests showing that the reported gains (and the ablation attributing them to the risk layer) survive plausible label perturbations.
[§5 (ablations)] The ablation that singles out the inherent-risk feature as the dominant contributor could be confounded if label noise is systematically higher in the subtle-change classes; additional controls (e.g., training with injected label noise or comparison against an independent damage survey) are needed to confirm that the risk layer improves detection rather than simply regularizing toward the same noisy distribution.
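
To make the first request concrete, here is a minimal sketch of a loss-threshold mapping and a boundary-shift check. The cut points in `BASE_THRESHOLDS`, the idea of a loss ratio per building, and the interpretation of a shift as a relative scaling of each cut point are all assumptions made for illustration; the manuscript's actual mapping is not given on this page.

```python
from bisect import bisect_right

# Hypothetical loss-ratio cut points (fraction of property value). The
# thresholds actually used to derive the paper's labels are not stated here.
BASE_THRESHOLDS = (0.05, 0.20, 0.50)


def damage_grade(loss_ratio, thresholds=BASE_THRESHOLDS):
    """Return a grade 0..3: the number of cut points at or below loss_ratio."""
    return bisect_right(thresholds, loss_ratio)


def shifted(thresholds, relative_shift):
    """Scale every cut point by (1 + relative_shift), e.g. 0.10 for +10 %."""
    return tuple(t * (1.0 + relative_shift) for t in thresholds)


def relabel_fraction(loss_ratios, relative_shift):
    """Fraction of buildings whose grade changes when the cut points move."""
    moved = shifted(BASE_THRESHOLDS, relative_shift)
    changed = sum(
        damage_grade(r) != damage_grade(r, moved) for r in loss_ratios
    )
    return changed / len(loss_ratios)


# Made-up loss ratios, chosen so that some sit close to a cut point.
loss_ratios = [0.01, 0.052, 0.19, 0.21, 0.60]
for shift in (-0.10, -0.05, 0.05, 0.10):
    print(f"shift {shift:+.2f}: {relabel_fraction(loss_ratios, shift):.2f}")
```

A full sensitivity test would retrain or at least re-score the model under each shifted labeling; this sketch only shows how many labels would move.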

minor comments (2)

[§3 (methods)] Provide explicit dataset statistics: number of buildings, exact train/validation/test splits, and temporal alignment procedure for pre/post SAR pairs.
[§4 (results)] Clarify the multitask loss weighting scheme and whether the reported F1 scores are macro-averaged or weighted; include per-class precision/recall tables to support the claim of largest gains in minor/moderate categories.
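
The distinction between macro-averaged and weighted F1 matters most when classes are imbalanced, as a short sketch shows. The labels below are made up and the function names are illustrative; nothing here reproduces the paper's numbers.

```python
def per_class_f1(y_true, y_pred, classes):
    """F1 for each class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


def weighted_f1(y_true, y_pred, classes):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred, classes)
    support = {c: sum(t == c for t in y_true) for c in classes}
    return sum(scores[c] * support[c] for c in classes) / len(y_true)


# Imbalanced toy data: eight samples of class 0, two of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
macro = macro_f1(y_true, y_pred, (0, 1))
weighted = weighted_f1(y_true, y_pred, (0, 1))
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

In this toy case the weighted score is higher because the majority class is predicted well, which is why the averaging scheme needs to be stated alongside any claim about rare classes.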

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which have prompted us to improve the transparency around our label generation process and to strengthen the ablation analysis. We address each major comment below.

Referee: [Abstract and §4 (results)] The headline claim of up to 19 pp mean F1 improvement, concentrated in the minor and moderate damage categories, rests on the assumption that insurance-derived property-damage extents provide accurate, unbiased per-building labels for the four-class taxonomy. The manuscript must detail the exact mapping from claim payouts or loss thresholds to graded damage states, quantify label noise or inter-rater agreement, and include error analysis or sensitivity tests showing that the reported gains (and the ablation attributing them to the risk layer) survive plausible label perturbations.

Authors: We agree that greater transparency on the label derivation is essential. In the revised manuscript we have added a dedicated subsection in §3 that specifies the exact mapping from insurance claim payouts to the four-class taxonomy, using the loss-percentage thresholds supplied by the data provider. Although inter-rater agreement cannot be quantified because the labels are derived from proprietary insurance records rather than multiple human annotators, we have inserted an error-analysis paragraph that discusses known sources of label uncertainty (e.g., under-reporting of minor damage and payout-to-physical-damage discrepancies). We have also performed sensitivity tests by shifting class boundaries by ±5 % and ±10 %; the mean F1 gains remain above 16 pp and the attribution of performance to the risk layer is unchanged. These additions appear in the updated §4 and a new supplementary table.
revision: yes

Referee: [§5 (ablations)] The ablation that singles out the inherent-risk feature as the dominant contributor could be confounded if label noise is systematically higher in the subtle-change classes; additional controls (e.g., training with injected label noise or comparison against an independent damage survey) are needed to confirm that the risk layer improves detection rather than simply regularizing toward the same noisy distribution.

Authors: We share the concern about possible confounding. We have therefore added controlled label-noise injection experiments (random flips at 5 %, 10 %, and 15 % rates, with elevated noise applied to the minor and moderate classes) and report the results in the revised §5. The risk-layer contribution remains statistically significant under these conditions. We are, however, unable to provide a comparison against an independent damage survey, as no such dataset was available for the study region beyond the insurance records used.
revision: partial
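
A label-noise injection of the kind described in this response could look roughly like the sketch below. It assumes integer class codes 0 to 3 and per-class flip probabilities; the function name `inject_label_noise`, the class numbering, and the specific rates in the example are hypothetical and are not taken from the manuscript.

```python
import random


def inject_label_noise(labels, flip_rate, num_classes=4, seed=0):
    """Return a copy of labels with some entries flipped to another class.

    flip_rate: dict mapping class code -> probability of flipping a label of
        that class; classes missing from the dict are never flipped.
    A flipped label is replaced by a uniformly chosen different class.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    noisy = []
    for label in labels:
        if rng.random() < flip_rate.get(label, 0.0):
            alternatives = [c for c in range(num_classes) if c != label]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(label)
    return noisy


# Hypothetical setup: classes 1 and 2 stand in for "minor" and "moderate"
# and receive a higher flip rate than the other two classes.
labels = [0, 1, 2, 3] * 250
rates = {0: 0.05, 1: 0.15, 2: 0.15, 3: 0.05}
noisy = inject_label_noise(labels, rates)
flipped = sum(a != b for a, b in zip(labels, noisy))
print(f"{flipped} of {len(labels)} labels flipped")
```

Training once per noise level and comparing the with-risk-layer and without-risk-layer models at each level is what would show whether the risk layer's contribution holds up.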

Standing simulated objections (not resolved)

Direct comparison against an independent damage survey

Circularity Check

0 steps flagged

No circularity: empirical gains are measured on held-out imagery against external baselines, using insurance-derived labels that do not depend on the model's outputs

full rationale

The paper's central claims consist of training a multimodal Mamba architecture on Hurricane Harvey SAR/optical imagery and reporting mean F1 improvements (up to 19 pp) on held-out test imagery, with ablation results attributing gains to the added risk layer. These metrics are computed directly against state-of-the-art baselines using insurance-derived property-damage extents as ground truth; no equations, fitted parameters, or self-citations are shown to reduce the reported F1 scores or ablation rankings to quantities defined by the model's own outputs. The multitask prediction of damage states, flood extent, and footprints is presented as a forward architectural choice rather than a self-referential derivation. The evaluation chain therefore remains externally benchmarked and does not collapse to self-definition or fitted-input renaming.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard remote-sensing assumptions about SAR penetration and on the availability of insurance labels as ground truth; the risk layer is treated as an input feature rather than a newly postulated entity.

free parameters (1)

multitask loss weights
Weights balancing the three task-specific losses (damage grading, floodwater, footprints) are typically tuned on validation data and directly affect the joint training objective.
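
As a minimal sketch of what such a weighted objective usually looks like, assuming a simple weighted sum of three scalar task losses: the default weights below are placeholders, not values from the paper, and the paper may combine its losses differently.

```python
def multitask_loss(damage_loss, flood_loss, footprint_loss,
                   weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three task losses.

    weights: (w_damage, w_flood, w_footprint). The defaults are hypothetical;
    in practice they would be tuned on validation data.
    """
    w_damage, w_flood, w_footprint = weights
    return (w_damage * damage_loss
            + w_flood * flood_loss
            + w_footprint * footprint_loss)


# Example with made-up per-task loss values from one training step.
total = multitask_loss(damage_loss=0.8, flood_loss=0.4, footprint_loss=0.2)
print(f"total loss = {total:.2f}")
```

Because the weights change which task dominates the gradient, they are a free parameter that readers need reported in order to reproduce the results.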

axioms (2)

domain assumption: Pre- and post-event SAR/InSAR scenes contain detectable signals of flood-related building damage even when optical change is minimal.
Invoked when the model fuses SAR scenes to guide detection of structures with little compositional change.
domain assumption: The inherent flood-risk layer encodes reliable long-term exposure probabilities that can be used as an auxiliary input.
Stated in the abstract as guiding the network toward plausibly affected structures.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: The architecture fuses pre- and post-event SAR/InSAR scenes with very-high-resolution optical basemaps and an inherent flood-risk layer that encodes long-term exposure probabilities, guiding the network toward plausibly affected structures even when compositional change is minimal.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
Paper passage: Ablation studies identify the inherent-risk feature as the single most significant contributor to this performance boost.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.