pith. sign in

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it
abstract

Large Audio-Language Models (LALMs) are increasingly integrated into daily applications, yet their generative biases remain underexplored. Existing speech fairness benchmarks rely on synthetic speech and Multiple-Choice Questions (MCQs), both offering a fragmented view of fairness. We propose VIBE, a framework that evaluates generative bias through open-ended tasks such as personalized recommendations, using real-world human recordings. Unlike MCQs, our method allows stereotypical associations to manifest organically without predefined options, making it easily extensible to new tasks. Evaluating 11 state-of-the-art LALMs reveals systematic biases in realistic scenarios. We find that gender cues often trigger larger distributional shifts than accent cues, indicating that current LALMs reproduce social stereotypes.

citation-role summary

background 1 dataset 1

citation-polarity summary

fields

eess.AS 3

years

2026 3

representative citing papers

citing papers explorer

Showing 3 of 3 citing papers.