Diffa-2: A practical diffusion large language model for general audio understanding
2 Pith papers cite this work. Polarity classification is still being indexed.
Pith papers citing it: 2
Fields: cs.SD (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- RAIL: Rethinking Auditory Intelligence in Large Audio-Language Models with a CHC-Grounded Benchmark
  Introduces RAIL, a CHC-grounded benchmark with five core auditory capabilities to assess LALMs beyond task-centric metrics, showing uneven model performance.
- Audio Interaction Model
  Audio-Interaction unifies offline and online audio tasks into one streaming model via the SoundFlow framework and a new 2.6M-item streaming corpus, enabling real-time instruction following and proactive responses.