Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J

URL https://github · 2024 · arXiv 2402.13236

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

eess.AS · 2026-04-28 · unverdicted · novelty 7.0

Semantic-level and verification-based uncertainty methods outperform token-level baselines for audio reasoning in ALLMs, but their relative performance on hallucination and unanswerable-question benchmarks is model- and task-dependent.

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

eess.AS · 2025-09-30 · unverdicted · novelty 7.0

Game-Time Benchmark shows spoken language models handle basic tasks but degrade sharply under temporal constraints like tempo adherence and synchronized responses.

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

MoVE uses specialized LoRA expert adapters and a soft router to translate non-verbal vocalizations in S2ST, reproducing them in 76% of cases versus at most 14% for baselines while scoring highest on naturalness and emotional fidelity.

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · novelty 3.0

A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

cs.LG · 2026-05-07

citing papers explorer

Showing 5 of 5 citing papers.

Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models eess.AS · 2026-04-28 · unverdicted · none · ref 16
Semantic-level and verification-based uncertainty methods outperform token-level baselines for audio reasoning in ALLMs, but their relative performance on hallucination and unanswerable-question benchmarks is model- and task-dependent.
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models eess.AS · 2025-09-30 · unverdicted · none · ref 11
Game-Time Benchmark shows spoken language models handle basic tasks but degrade sharply under temporal constraints like tempo adherence and synchronized responses.
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation cs.CL · 2026-04-19 · unverdicted · none · ref 26
MoVE uses specialized LoRA expert adapters and a soft router to translate non-verbal vocalizations in S2ST, reproducing them in 76% of cases versus at most 14% for baselines while scoring highest on naturalness and emotional fidelity.
On The Landscape of Spoken Language Models: A Comprehensive Survey cs.CL · 2025-04-11 · unverdicted · none · ref 46
A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization cs.LG · 2026-05-07 · unreviewed · ref 53

Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu J

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer