arxiv: 2604.11709 · v1 · submitted 2026-04-13 · 💻 cs.AI


A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment

Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby


Pith reviewed 2026-05-10 15:11 UTC · model grok-4.3


keywords structural damage assessment · Mamba network · multimodal fusion · blast loading · remote sensing · post-disaster management · explosion damage · Beirut explosion


The pith

A Mamba-based multimodal network integrates multi-scale blast-loading data with optical images to improve rapid structural damage assessment after explosions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Mamba-based multimodal network that combines optical remote sensing images with multi-scale blast-loading information for rapid structural damage assessment following large explosions. Current machine learning methods on remote sensing data often ignore physical blast characteristics and demand extensive training datasets, restricting their use in real disasters. By fusing these inputs, the network seeks to deliver more accurate damage maps while avoiding the limitations of field inspections such as inaccessibility and safety risks. Testing on the 2020 Beirut explosion shows clear performance gains over state-of-the-art techniques. If the gains hold, responders could allocate resources and plan recoveries more quickly after similar events.

Core claim

The authors develop a Mamba-based multimodal network that integrates multi-scale blast-loading information with optical remote sensing images, achieving significantly improved performance in structural damage assessment on the 2020 Beirut explosion dataset compared to existing approaches.

What carries the argument

A Mamba-based multimodal network that fuses multi-scale blast-loading information with optical remote sensing images to capture both visual features and physical blast characteristics.

Load-bearing premise

That multi-scale blast-loading information can be reliably generated and fused with optical images to produce generalizable improvements beyond the single Beirut test case.

What would settle it

Testing the network on damage data from a different major explosion event and checking whether the performance advantage over baselines persists.

Figures

Figures reproduced from arXiv: 2604.11709 by Adam A. Dennis, Dain G. Farrimond, Samuel E. Rigby, Sivasakthy Selvakumaran, Wanli Ma.

**Figure 1.** The workflow of the proposed rapid structural damage assessment method. BS encoder denotes the building segmentation decoder, and SDA encoder [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]

**Figure 2.** Structure of the fine-tuning framework. BS: building segmentation; [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]

**Figure 3.** Residual attention-based spatiotemporal state space (RA-STSS) module [PITH_FULL_IMAGE:figures/full_fig_p003_3.png]

**Figure 4.** Comparison of training time versus performance ( [PITH_FULL_IMAGE:figures/full_fig_p004_4.png]

Original abstract

Accurate and rapid structural damage assessment (SDA) is crucial for post-disaster management, helping responders prioritise resources, plan rescues, and support recovery. Traditional field inspections, though precise, are limited by accessibility, safety risks, and time constraints, especially after large explosions. Machine learning with remote sensing has emerged as a scalable solution for rapid SDA, with Mamba-based networks achieving state-of-the-art performance. However, these methods often require extensive training and large datasets, limiting real-world applicability. Moreover, they fail to incorporate key physical characteristics of blast loading for SDA. To overcome these challenges, we propose a Mamba-based multimodal network for rapid SDA that integrates multi-scale blast-loading information with optical remote sensing images. Evaluated on the 2020 Beirut explosion, our method significantly improves performance over state-of-the-art approaches. Code is available at: https://github.com/IMPACTSquad/Blast-Mamba

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Mamba multimodal fusion with blast physics for damage assessment is a reasonable idea, but the Beirut-only test without metrics or ablations makes the improvement claim hard to evaluate.


The paper proposes a Mamba-based multimodal network that fuses multi-scale blast-loading information with optical remote sensing images for structural damage assessment after explosions. The authors evaluate it on the 2020 Beirut blast and say it beats state-of-the-art methods.

What stands out is the attempt to bring in blast physics explicitly, which most image-only ML approaches skip. Mamba's linear scaling could help with large remote sensing data. Releasing the code on GitHub is a plus for reproducibility.

The main issue is the evaluation. It's all on one event, the Beirut explosion. The abstract gives no numbers, no ablation studies to show what the blast channel contributes, and no mention of other test cases or cross-validation. Without those, it's tough to know if the gains are meaningful or if they would hold for a different blast or different imaging conditions. The stress-test note raises a fair point about whether the blast-loading fields are independent or derived in a way that could make the fusion circular. If the full paper has detailed tables and more datasets, that would change things, but based on what's here, the generalizability is unproven.

This work would interest researchers in applying state-of-the-art sequence models to disaster response and remote sensing. Someone looking for practical tools in post-explosion assessment might get ideas from the architecture. I think it deserves peer review because the problem is important and the multimodal angle is worth exploring, even if the current evidence is thin. A referee could push for more experiments.

Referee Report

3 major / 1 minor

Summary. The paper proposes a Mamba-based multimodal network for multiscale blast-induced rapid structural damage assessment. It integrates multi-scale blast-loading information with optical remote sensing images and evaluates the approach on the 2020 Beirut explosion, claiming significant improvements over state-of-the-art methods.

Significance. If the empirical claims hold, the work could advance the field by demonstrating how physical blast characteristics can be fused with efficient sequence models like Mamba to enhance post-disaster structural damage assessment. The release of code supports reproducibility and allows for further validation. However, the reliance on a single event limits broader significance until generalizability is demonstrated.

major comments (3)

[Abstract] The statement that the method 'significantly improves performance over state-of-the-art approaches' is presented without any quantitative metrics, ablation studies, dataset details, or error analysis. This makes it difficult to assess the validity and magnitude of the claimed improvement.
[Evaluation] The performance evaluation is limited to a single disaster event (2020 Beirut explosion). To substantiate the multimodal claim, the paper needs to show that the blast-loading information is generated independently and that the gains persist across other blast events or held-out regions from the same event.
[Methodology] There is insufficient detail on how the multi-scale blast-loading fields are computed and fused with the optical images. Without this, it is unclear whether the fusion avoids circularity with the labeling process.
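The held-out-region check requested in the [Evaluation] comment could be set up roughly as follows. This is a minimal sketch, not the authors' protocol: the function names, the square-block scheme, and the round-robin fold assignment are all assumptions made for illustration. The point it shows is that whole spatial blocks, rather than individual buildings, are assigned to folds, so neighbouring samples do not leak between training and test sets.

```python
from collections import defaultdict


def spatial_block_folds(points, block_size, n_folds):
    """Group samples into folds by square spatial block (hypothetical helper).

    points: list of (x, y) coordinates, e.g. building centroids in metres.
    block_size: side length of each block, in the same units as points.
    Returns a list of n_folds lists of sample indices.
    """
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Deal whole blocks round-robin in sorted order, so the split is deterministic
    # and no block is ever shared between two folds.
    for i, key in enumerate(sorted(blocks)):
        folds[i % n_folds].extend(blocks[key])
    return folds


def train_test_indices(folds, held_out):
    """Return (train, test) index lists with fold `held_out` used as the test set."""
    test = sorted(folds[held_out])
    train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
    return train, test
```

Round-robin assignment is the simplest choice; it does not balance damage classes across folds, which a real evaluation would probably need to check.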

minor comments (1)

[Abstract] The abstract could be strengthened by including at least one key performance metric to support the improvement claim.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below and indicate where revisions will be made to strengthen the manuscript.


Referee: [Abstract] The statement that the method 'significantly improves performance over state-of-the-art approaches' is presented without any quantitative metrics, ablation studies, dataset details, or error analysis. This makes it difficult to assess the validity and magnitude of the claimed improvement.

Authors: We agree that the abstract would benefit from more concrete quantitative support. The full manuscript already contains these details in Sections 4 and 5, including specific metrics (e.g., accuracy, F1-score, IoU), ablation studies on the multimodal components, dataset description (Beirut 2020 satellite imagery with damage annotations), and error analysis. In the revision we will update the abstract to include key quantitative gains, such as the reported improvement margins over baselines, while keeping the abstract concise.

revision: yes

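For readers comparing any numbers the revised abstract reports, the three metrics named in this response have standard definitions for a single (binary) damage class. The sketch below is a generic illustration of those definitions; the function name and the 0/1 label encoding are assumptions, and it says nothing about how the paper averages over damage classes.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, F1 and IoU for one class, from paired 0/1 labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # IoU (Jaccard index): TP / (TP + FP + FN).
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "f1": f1, "iou": iou}
```

F1 and IoU are monotonically related for a single class (F1 = 2·IoU / (1 + IoU)), so reporting both adds little unless they are averaged differently across classes.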
Referee: [Evaluation] The performance evaluation is limited to a single disaster event (2020 Beirut explosion). To substantiate the multimodal claim, the paper needs to show that the blast-loading information is generated independently and that the gains persist across other blast events or held-out regions from the same event.

Authors: We acknowledge the single-event limitation, which is inherent to the domain given the scarcity of well-documented large-scale blast incidents with paired satellite and physical data. The blast-loading fields are generated independently via established physical models (Kingery-Bulmash equations and multi-scale propagation simulations) using only the known explosion parameters (location, yield, height of burst), without reference to damage labels. To demonstrate persistence of gains, we will add spatial cross-validation experiments on held-out geographic regions within the Beirut dataset. However, we cannot introduce results from additional distinct blast events because no comparable public multimodal datasets exist for other incidents.

revision: partial

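To make the independence claim concrete: a blast-loading input of this kind can be built from explosion parameters alone. The sketch below computes the Hopkinson-Cranz scaled distance, Z = R / W^(1/3), on grids of several cell sizes. It is an assumption-laden illustration, not the paper's pipeline: the function names, the grid layout, and the use of slant range are all hypothetical, and the Kingery-Bulmash curves that map Z to pressure or impulse are deliberately left out.

```python
import math


def scaled_distance(range_m, charge_kg_tnt):
    """Hopkinson-Cranz scaled distance Z = R / W**(1/3), in m/kg^(1/3)."""
    if charge_kg_tnt <= 0:
        raise ValueError("charge mass must be positive")
    return range_m / charge_kg_tnt ** (1.0 / 3.0)


def scaled_distance_grid(origin_xy, cell_size_m, n_rows, n_cols,
                         source_xy, charge_kg_tnt, burst_height_m=0.0):
    """Scaled distance at each cell centre of a regular grid (hypothetical layout)."""
    x0, y0 = origin_xy
    sx, sy = source_xy
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = x0 + (c + 0.5) * cell_size_m
            cy = y0 + (r + 0.5) * cell_size_m
            # Slant range from the charge to the cell centre at ground level.
            slant = math.sqrt((cx - sx) ** 2 + (cy - sy) ** 2 + burst_height_m ** 2)
            row.append(scaled_distance(slant, charge_kg_tnt))
        grid.append(row)
    return grid


def multi_scale_grids(extent_m, cell_sizes_m, source_xy, charge_kg_tnt,
                      burst_height_m=0.0):
    """Same square extent sampled at several resolutions, keyed by cell size."""
    grids = {}
    for cell in cell_sizes_m:
        n = int(extent_m // cell)
        grids[cell] = scaled_distance_grid((0.0, 0.0), cell, n, n,
                                           source_xy, charge_kg_tnt, burst_height_m)
    return grids
```

Nothing in these functions touches imagery or damage labels, which is the property the referee asks the authors to demonstrate for their own fields.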
Referee: [Methodology] There is insufficient detail on how the multi-scale blast-loading fields are computed and fused with the optical images. Without this, it is unclear whether the fusion avoids circularity with the labeling process.

Authors: We will expand the Methodology section with explicit computation details: blast-loading fields are derived at multiple scales using distance-based attenuation formulas and numerical propagation from the explosion epicenter, independent of any image-derived labels. Fusion occurs through a dedicated multimodal Mamba block that concatenates blast feature embeddings with image patch tokens before state-space modeling. Because blast fields rely exclusively on pre-event physical parameters and the damage labels are produced separately from post-event optical imagery, the process contains no circularity; we will add a clarifying paragraph and diagram to make this explicit.

revision: yes
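The fusion step described in this response can be pictured with a toy example. The sketch below is only an illustration under stated assumptions: it uses raw pixel values instead of learned embeddings, concatenates along the feature axis (the response does not say whether concatenation is per-token or along the sequence), and both function names are hypothetical. It stops before any state-space layer.

```python
def patchify(grid, patch):
    """Split a 2-D list into non-overlapping patch x patch tiles, one flat token per tile."""
    rows, cols = len(grid), len(grid[0])
    if rows % patch or cols % patch:
        raise ValueError("grid size must be a multiple of the patch size")
    tokens = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            tokens.append([grid[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens


def fuse_tokens(image_tokens, blast_tokens):
    """Feature-wise concatenation: image and blast features of the same patch are joined."""
    if len(image_tokens) != len(blast_tokens):
        raise ValueError("both modalities must yield the same number of tokens")
    return [img + blast for img, blast in zip(image_tokens, blast_tokens)]
```

The fused sequence keeps one token per spatial patch, so a downstream sequence model would see image and blast features for a location at the same step; that requires the blast field to be resampled onto the image grid first.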

standing simulated objections not resolved

Demonstrating performance gains on additional distinct blast events, as no suitable public multimodal datasets for other incidents are currently available.

Circularity Check

0 steps flagged

No significant circularity; empirical performance claim rests on independent evaluation rather than definitional reduction.

full rationale

The paper introduces a Mamba-based multimodal architecture that fuses generated multi-scale blast-loading fields with optical imagery for structural damage assessment. The central result is an empirical performance gain on the 2020 Beirut dataset relative to prior SOTA methods. No quoted equations or sections reduce the reported improvement to a fitted parameter, self-referential definition, or load-bearing self-citation chain. Blast-field generation is described as an external input derived from explosion metadata, and the network training plus test-set comparison supplies falsifiable external evidence rather than tautological equivalence. The single-event evaluation raises generalizability questions but does not constitute circularity under the specified criteria.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The paper relies on standard deep-learning training assumptions and the premise that blast-loading simulations provide useful auxiliary signals; no new physical entities or ad-hoc constants are introduced in the abstract.

free parameters (1)

Mamba network hyperparameters
Standard training choices such as learning rate and layer dimensions are required but not detailed in the abstract.

axioms (1)

domain assumption: Blast-loading simulations at multiple scales can be accurately computed and aligned with optical imagery for damage prediction.
Invoked by the multimodal fusion design described in the abstract.

