Experiments show that while ALLMs excel on standard answerable tasks, they suffer from a pronounced forced-choice bias, often answering when they should abstain.
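The excerpt does not say how forced-choice bias is scored. As a minimal sketch, assuming each model prediction is either an option label or a dedicated abstention marker, one simple proxy is the share of unanswerable items on which the model still picks an option. The names `ABSTAIN`, `Prediction`, and `forced_choice_rate` are hypothetical and are not taken from AQUA-Bench.

```python
from dataclasses import dataclass

# Hypothetical abstention marker; the benchmark's real response format is not given here.
ABSTAIN = "ABSTAIN"


@dataclass
class Prediction:
    answerable: bool  # ground truth: does the item have a valid answer?
    output: str       # model output: an option label such as "B", or ABSTAIN


def forced_choice_rate(preds: list[Prediction]) -> float:
    """Fraction of unanswerable items where the model chose an option instead of abstaining."""
    unanswerable = [p for p in preds if not p.answerable]
    if not unanswerable:
        return 0.0  # no unanswerable items, so no opportunity for forced choice
    forced = sum(1 for p in unanswerable if p.output != ABSTAIN)
    return forced / len(unanswerable)
```

For example, with two unanswerable items where the model answers one and abstains on the other, `forced_choice_rate` returns 0.5; answerable items do not affect the score.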

CONCLUSION

We present AQUA-Bench, a benchmark for evaluating unanswerability in audio question answering through three scenarios: Absent Answer Detection, Incompatible Answer Set

1 Pith paper cites this work.

representative citing papers

AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering

eess.AS · 2026-01-18 · unverdicted · novelty 7.0 · ref 5

AQUA-Bench evaluates audio QA models on three unanswerability scenarios: missing correct answers, mismatched choice sets, and questions irrelevant to the audio.