Air-bench: Benchmarking large audio-language models via generative comprehension

Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou · 2024 · arXiv 2402.07729

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

representative citing papers

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models

cs.SD · 2026-01-06 · unverdicted · novelty 7.0

TWNM framework equips audio-language models with spatial scene analysis via FOA simulation and metadata-grounded training, reaching 70.8% accuracy on a new ASA benchmark.

VoiceBench: Benchmarking LLM-Based Voice Assistants

cs.CL · 2024-10-22 · unverdicted · novelty 7.0

VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.

Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

eess.AS · 2026-05-20 · unverdicted · novelty 5.0

DuplexSLA is a dual-stream three-channel full-duplex model that synchronizes continuous user audio, discrete assistant audio, and rate-limited action text for native turn-taking and in-conversation tool calling.

citing papers explorer

Showing 4 of 4 citing papers.

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models cs.SD · 2026-01-06 · unverdicted · none · ref 21
TWNM framework equips audio-language models with spatial scene analysis via FOA simulation and metadata-grounded training, reaching 70.8% accuracy on a new ASA benchmark.
VoiceBench: Benchmarking LLM-Based Voice Assistants cs.CL · 2024-10-22 · unverdicted · none · ref 104
VoiceBench is the first benchmark for multi-faceted evaluation of LLM voice assistants using real and synthetic spoken instructions with speaker, environmental, and content variations.
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time cs.LG · 2026-05-03 · unverdicted · none · ref 36
LIME reduces hallucinations in multimodal LLMs by using LRP to boost perceptual modality contributions through inference-time KV updates.
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action eess.AS · 2026-05-20 · unverdicted · none · ref 39
DuplexSLA is a dual-stream three-channel full-duplex model that synchronizes continuous user audio, discrete assistant audio, and rate-limited action text for native turn-taking and in-conversation tool calling.

Air-bench: Benchmarking large audio-language models via generative comprehension

fields

years

verdicts

representative citing papers

citing papers explorer