SNAC: Multi-Scale Neural Audio Codec. arXiv preprint arXiv:2410.14411

2024 · arXiv 2410.14411

4 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection

eess.AS · 2026-06-28 · unverdicted · novelty 7.0

DTM-Codec achieves better reconstruction quality and intelligibility than fixed-frame-rate neural speech codecs at matched total bitrate via dynamic token masking and Path Length Equalization for variable frame rates.

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

eess.AS · 2026-04-29 · unverdicted · novelty 7.0

Semantic priors from HuBERT and Whisper improve speech codec intelligibility up to 6 kbps but show diminishing returns beyond that, with a bitrate-aware regulation strategy balancing semantic consistency and naturalness.

Step-Audio 2 Technical Report

cs.CL · 2025-07-22 · unverdicted · novelty 6.0

Step-Audio 2 integrates a latent audio encoder, reasoning-centric reinforcement learning, and discrete audio token generation into language modeling to deliver state-of-the-art performance on audio understanding and conversational benchmarks.

On The Landscape of Spoken Language Models: A Comprehensive Survey

cs.CL · 2025-04-11 · unverdicted · novelty 3.0

A literature survey that organizes spoken language models by architecture, training, and evaluation choices and identifies key challenges and future directions.

citing papers explorer


DTM-Codec: Dynamic Token Masking for VFR Speech Coding with Efficient Boundary Selection eess.AS · 2026-06-28 · unverdicted · none · ref 32
SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding eess.AS · 2026-04-29 · unverdicted · none · ref 16
Step-Audio 2 Technical Report cs.CL · 2025-07-22 · unverdicted · none · ref 60
On The Landscape of Spoken Language Models: A Comprehensive Survey cs.CL · 2025-04-11 · unverdicted · none · ref 41
