Ultraeval-audio: A unified framework for comprehensive evaluation of audio foundation models

2026 · arXiv:2601.01373

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

cs.CL · 2026-05-31 · unverdicted · novelty 7.0

PolySpeech-100 is a new benchmark for native-level speech comprehension across 110 linguistic variants that evaluates 22 models and reports E2E advantages on dialects, robustness gaps on low-resource languages, and degradation from Chain-of-Thought prompting.

Preserving Speech-to-Text LLM Capabilities in Speech-to-Speech Generation

eess.AS · 2026-06-29 · unverdicted · novelty 6.0

PRIME-Speech adds low-latency speech output to frozen S2T LLMs by synchronizing a causal post-decoder with intermediate hidden states and using mixed conditioning plus turn-level KV-cache packing, preserving original S2T performance across translation, QA, and dialogue tasks.

AudioKV: KV Cache Eviction in Efficient Large Audio Language Models

cs.SD · 2026-04-08 · unverdicted · novelty 5.0

AudioKV prioritizes audio-critical attention heads identified via ASR analysis and applies spectral score smoothing to evict KV cache tokens, achieving high compression with minimal accuracy loss in LALMs.

Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

cs.CL · 2026-07-02 · unverdicted · novelty 4.0

JSTIP interleaves speech and text sequences during pretraining on 38k hours of ASR data to improve entity accuracy over ASR-only and simple joint-training baselines while matching performance from domain text.

citing papers explorer

AudioKV: KV Cache Eviction in Efficient Large Audio Language Models

cs.SD · 2026-04-08 · unverdicted · none · ref 20
