Ultraeval-audio: A unified framework for comprehensive evaluation of audio foundation models,

Qundong Shi, Jie Zhou, Biyuan Lin, et al · 2026 · arXiv 2601.01373

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

cs.CL · 2026-05-31 · unverdicted · novelty 7.0

PolySpeech-100 is a new benchmark for native-level speech comprehension across 110 linguistic variants that evaluates 22 models and reports E2E advantages on dialects, robustness gaps on low-resource languages, and degradation from Chain-of-Thought prompting.

Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation

eess.AS · 2026-06-29 · unverdicted · novelty 6.0

PRIME-Speech adds low-latency speech output to frozen S2T LLMs by synchronizing a causal post-decoder with intermediate hidden states and using mixed conditioning plus turn-level KV-cache packing, preserving original S2T performance across translation, QA, and dialogue tasks.

AudioKV: KV Cache Eviction in Efficient Large Audio Language Models

cs.SD · 2026-04-08 · unverdicted · novelty 5.0

AudioKV prioritizes audio-critical attention heads identified via ASR analysis and applies spectral score smoothing to evict KV cache tokens, achieving high compression with minimal accuracy loss in LALMs.

citing papers explorer

Showing 3 of 3 citing papers.

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects cs.CL · 2026-05-31 · unverdicted · none · ref 69
PolySpeech-100 is a new benchmark for native-level speech comprehension across 110 linguistic variants that evaluates 22 models and reports E2E advantages on dialects, robustness gaps on low-resource languages, and degradation from Chain-of-Thought prompting.
Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation eess.AS · 2026-06-29 · unverdicted · none · ref 49
PRIME-Speech adds low-latency speech output to frozen S2T LLMs by synchronizing a causal post-decoder with intermediate hidden states and using mixed conditioning plus turn-level KV-cache packing, preserving original S2T performance across translation, QA, and dialogue tasks.
AudioKV: KV Cache Eviction in Efficient Large Audio Language Models cs.SD · 2026-04-08 · unverdicted · none · ref 20
AudioKV prioritizes audio-critical attention heads identified via ASR analysis and applies spectral score smoothing to evict KV cache tokens, achieving high compression with minimal accuracy loss in LALMs.

Ultraeval-audio: A unified framework for comprehensive evaluation of audio foundation models,

fields

years

verdicts

representative citing papers

citing papers explorer