DamageArbiter: A Multimodal Arbitration Framework for Disaster Damage Assessment from Street-View Imagery

Bing Zhou; Hao Tian; Heng Cai; Kani Fu; Lei Zou; Siqin Wang; Wenjing Gong; Yifan Yang; Zongrong Li

arxiv: 2603.14837 · v2 · pith:M2GS7B5Dnew · submitted 2026-03-16 · 💻 cs.CV

keywords: damagearbiter, accuracy, models, assessment, damage, baseline, disaster, model

Analyzing street-view imagery with computer vision models offers a promising approach for rapid, hyperlocal disaster damage assessment, but existing approaches typically rely on black-box pre-trained vision models, which lack interpretability and reliability. This study proposes DamageArbiter, a multimodal disagreement-driven arbitration framework designed to improve the accuracy and reliability of street-view-based damage assessment. DamageArbiter leverages the complementary strengths of unimodal and multimodal models and employs a lightweight logistic regression meta-classifier to arbitrate cases in which model predictions disagree. Using 2,556 post-disaster street-view images, paired with manually generated or large language model (LLM)-generated text descriptions, we systematically compared DamageArbiter with fine-tuned unimodal (image-only and text-only) models and CLIP-based multimodal models in terms of classification performance and overconfidence errors. Results show that DamageArbiter improved accuracy to 75.85% and the Matthews correlation coefficient (MCC) to 0.6188, compared with the best-performing text-only baseline (63.07% accuracy, 0.4126 MCC), image-only baseline (74.33% accuracy, 0.5947 MCC), and CLIP baseline (74.22% accuracy, 0.5915 MCC). The overconfidence analysis further reveals that DamageArbiter substantially reduced the overconfidence error from 70.58% for the best-performing baseline, the image-only ViT model, to 16.45%. Overall, this study demonstrates that accuracy alone is insufficient for evaluating disaster damage classification models and highlights the importance of measuring overconfidence errors as part of model reliability assessment. DamageArbiter thus offers a more reliable framework for rapid, hyperlocal disaster damage assessment from street-view imagery.
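
The disagreement-driven arbitration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-class setup, the function names (`arbitrate`, `toy_meta`), and the feature layout (concatenated class probabilities from each base model) are all assumptions made for illustration. In particular, `toy_meta` is a hand-written stand-in for the trained logistic regression meta-classifier, which the abstract mentions but does not specify.

```python
from typing import Callable, List, Sequence

Probs = Sequence[float]  # one base model's class-probability vector


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def arbitrate(
    base_probs: Sequence[Probs],
    meta_classifier: Callable[[List[float]], int],
) -> int:
    """Return the agreed label, or defer to the meta-classifier on disagreement."""
    votes = [argmax(p) for p in base_probs]
    if len(set(votes)) == 1:
        # All base models agree: accept their shared prediction as-is.
        return votes[0]
    # Base models disagree: build a feature vector from every model's
    # probabilities and let the meta-classifier decide (assumed layout).
    features = [x for p in base_probs for x in p]
    return meta_classifier(features)


def toy_meta(features: List[float], n_classes: int = 3) -> int:
    """Hypothetical stand-in for a trained logistic regression meta-classifier.

    It simply sums each class's probability across the base models.
    """
    sums = [sum(features[k::n_classes]) for k in range(n_classes)]
    return argmax(sums)


if __name__ == "__main__":
    image_only = [0.6, 0.3, 0.1]  # hypothetical image-only model output
    text_only = [0.1, 0.2, 0.7]   # hypothetical text-only model output
    clip = [0.2, 0.1, 0.7]        # hypothetical CLIP-based model output
    print(arbitrate([image_only, text_only, clip], toy_meta))
```

The gate keeps the meta-classifier out of the loop whenever the base models already agree, so only contested cases are arbitrated. In practice the stand-in would be replaced by a classifier fitted on held-out base-model outputs.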

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RAPID: A Reproducible Multi-Agent Pipeline for Interpretable Disaster Damage Assessment from Satellite and Street-View Imagery
cs.CV 2026-06 unverdicted novelty 6.0

RAPID is a multi-agent pipeline for zero-shot interpretable damage assessment and reporting from cross-view satellite and street-view imagery across multiple disaster types.