Diffa-2: A practical diffusion large language model for general audio understanding

Jiaming Zhou, Xuxin Cheng, Shiwan Zhao, Yuhang Jia, Cao Liu, Ke Zeng, Xunliang Cai, Yong Qin · 2026 · arXiv 2601.23161

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RAIL: Rethinking Auditory Intelligence in Large Audio-Language Models with a CHC-Grounded Benchmark

cs.SD · 2026-06-09 · unverdicted · novelty 6.0

Introduces RAIL, a CHC-grounded benchmark with five core auditory capabilities to assess LALMs beyond task-centric metrics, showing uneven model performance.

Audio Interaction Model

cs.SD · 2026-06-03 · unverdicted · novelty 6.0

Audio-Interaction unifies offline and online audio tasks into one streaming model via the SoundFlow framework and a new 2.6M-item streaming corpus, enabling real-time instruction following and proactive responses.

citing papers explorer


RAIL: Rethinking Auditory Intelligence in Large Audio-Language Models with a CHC-Grounded Benchmark
cs.SD · 2026-06-09 · unverdicted · none · ref 38
Audio Interaction Model
cs.SD · 2026-06-03 · unverdicted · none · ref 42
