Rawat, DisasterQA: A benchmark for assessing the performance of LLMs in disaster response, arXiv preprint arXiv:2410.20707 (2024)

· 2024 · arXiv 2410.20707

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments

cs.CV · 2026-06-04 · unverdicted · novelty 7.0

DisasterBench is a new multi-stage multimodal reasoning benchmark for UAV disaster response with 14 scenes and 9 tasks; the accompanying 2B DisasterVL model outperforms open-source MLLMs and approaches GPT-4o efficiency.

RAPID: A Reproducible Multi-Agent Pipeline for Interpretable Disaster Damage Assessment from Satellite and Street-View Imagery

cs.CV · 2026-06-20 · unverdicted · novelty 6.0

RAPID is a multi-agent pipeline for zero-shot interpretable damage assessment and reporting from cross-view satellite and street-view imagery across multiple disaster types.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

citing papers explorer

Showing 3 of 3 citing papers.

DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments cs.CV · 2026-06-04 · unverdicted · none · ref 44
RAPID: A Reproducible Multi-Agent Pipeline for Interpretable Disaster Damage Assessment from Satellite and Street-View Imagery cs.CV · 2026-06-20 · unverdicted · none · ref 29
A Survey of Scaling in Large Language Model Reasoning cs.AI · 2025-04-02 · unverdicted · none · ref 163
