Title resolution pending

Tu Anh Nguyen, Wei · 2023 · DOI 10.21437/interspeech.2023-1905

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

cs.CL · 2025-12-18 · unverdicted · novelty 7.0

Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

cs.SD · 2026-06-03 · unverdicted · novelty 6.0

CleanCodec reframes audio tokenization as a selective information bottleneck to encode only perceptually important features at 12.5 tokens per second, outperforming prior codecs in efficiency, speaker similarity, and intelligibility.

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

cs.SD · 2025-02-07 · unverdicted · novelty 6.0

Unified no-reference models assess audio aesthetics across speech, music, and sound via four perceptual axes and achieve performance comparable or superior to human mean opinion scores.

citing papers explorer

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

cs.CL · 2025-12-18 · unverdicted · none · ref 65

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

cs.SD · 2026-06-03 · unverdicted · none · ref 40

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

cs.SD · 2025-02-07 · unverdicted · none · ref 25
