arXiv preprint arXiv:2508.21376 , year=

Ahelm: A holistic evaluation of audio-language models , author= · 2025 · arXiv 2508.21376

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

cs.SD · 2026-04-16 · unverdicted · novelty 8.0

VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.

Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI

eess.AS · 2026-05-02 · accept · novelty 7.0

The paper delivers a unified framework for fairness in speech technologies by formalizing seven definitions, organizing research into three paradigms, diagnosing pipeline-specific biases, and mapping mitigations to those sources.

Can Large Audio Language Models Ignore Multilingual Distractors? An Evaluation of Their Selective Auditory Attention Capabilities

eess.AS · 2026-05-17 · unverdicted · novelty 6.0

Introduces the MUSA benchmark and evaluates LALMs showing that strong single-speaker performance fails to ensure robust selective attention under multilingual interference, with errors from source confusion and unresolved attribution after separation.

AudioMosaic: Contrastive Masked Audio Representation Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

eess.AS · 2026-04-19 · unverdicted · novelty 6.0

VIBE evaluates generative biases in large audio-language models with real-world speech and open-ended tasks, showing that gender cues produce larger distributional shifts than accent cues across 11 tested models.

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

cs.SD · 2025-09-09 · unverdicted · novelty 6.0

AU-Harness introduces an efficient unified evaluation framework for audio LLMs featuring batch optimizations, multi-turn dialogue support, and standardized protocols for fair comparisons.

citing papers explorer

Showing 6 of 6 citing papers.

VoxSafeBench: Not Just What Is Said, but Who, How, and Where cs.SD · 2026-04-16 · unverdicted · none · ref 10
VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI eess.AS · 2026-05-02 · accept · none · ref 227
The paper delivers a unified framework for fairness in speech technologies by formalizing seven definitions, organizing research into three paradigms, diagnosing pipeline-specific biases, and mapping mitigations to those sources.
Can Large Audio Language Models Ignore Multilingual Distractors? An Evaluation of Their Selective Auditory Attention Capabilities eess.AS · 2026-05-17 · unverdicted · none · ref 18
Introduces the MUSA benchmark and evaluates LALMs showing that strong single-speaker performance fails to ensure robust selective attention under multilingual interference, with errors from source confusion and unresolved attribution after separation.
AudioMosaic: Contrastive Masked Audio Representation Learning cs.LG · 2026-05-14 · unverdicted · none · ref 6
AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech eess.AS · 2026-04-19 · unverdicted · none · ref 25
VIBE evaluates generative biases in large audio-language models with real-world speech and open-ended tasks, showing that gender cues produce larger distributional shifts than accent cues across 11 tested models.
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs cs.SD · 2025-09-09 · unverdicted · none · ref 1
AU-Harness introduces an efficient unified evaluation framework for audio LLMs featuring batch optimizations, multi-turn dialogue support, and standardized protocols for fair comparisons.

arXiv preprint arXiv:2508.21376 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer