Snac: Multi-scale neural audio codec. arXiv preprint arXiv:2410.14411

Snac: Multi-scale neural audio codec · 2024 · arXiv 2410.14411

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

cs.SD · 2026-06-30 · unverdicted · novelty 7.0

FlexiSLM is the first spoken language model supporting dynamic and controllable frame rates on speech input and output, outperforming fixed-rate 7B models at high quality and enabling faster inference at lower rates like 6.25 Hz.

DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

eess.AS · 2026-06-28 · unverdicted · novelty 7.0

DTM-Codec achieves better reconstruction quality and intelligibility than fixed-frame-rate neural speech codecs at matched total bitrate via dynamic token masking and Path Length Equalization for variable frame rates.

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

eess.AS · 2026-04-29 · unverdicted · novelty 7.0

Semantic priors from HuBERT and Whisper improve speech codec intelligibility up to 6 kbps but show diminishing returns beyond that, with a bitrate-aware regulation strategy balancing semantic consistency and naturalness.

Step-Audio 2 Technical Report

cs.CL · 2025-07-22 · unverdicted · novelty 6.0

Step-Audio 2 integrates a latent audio encoder, reasoning-centric reinforcement learning, and discrete audio token generation into language modeling to deliver state-of-the-art performance on audio understanding and conversational benchmarks.

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · novelty 3.0

A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.

citing papers explorer

Step-Audio 2 Technical Report

cs.CL · 2025-07-22 · unverdicted · none · ref 60

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · none · ref 41
