Introduces the first benchmark for over-refusal in large audio language models using 3,000 pseudo-harmful audio samples and evaluates 12 models across six families, finding widespread over-refusal.
arXiv preprint arXiv:2505.23473 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
SEAR trains one LLM via adversarial process rewards to explore harmful reasoning paths but flip to safe outputs, reducing over-refusal while preserving safety.
citing papers explorer
-
AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?
Introduces the first benchmark for over-refusal in large audio language models using 3,000 pseudo-harmful audio samples and evaluates 12 models across six families, finding widespread over-refusal.
-
Addressing Over-Refusal in LLMs with Competing Rewards
SEAR trains one LLM via adversarial process rewards to explore harmful reasoning paths but flip to safe outputs, reducing over-refusal while preserving safety.