Soundstream: An end-to-end neural audio codec

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi · 2021 · arXiv 2107.03312

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

ENSEMBITS: an alphabet of protein conformational ensembles

cs.LG · 2026-05-13 · unverdicted · novelty 8.0 · 2 refs

Ensembits is the first tokenizer of protein conformational ensembles that outperforms static tokenizers on RMSF prediction and matches them on function and mutation tasks while using less pretraining data.

Codec-Robust Attacks on Audio LLMs

cs.SD · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

eess.SP · 2026-05-20 · unverdicted · novelty 6.0

AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.

Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation

cs.IR · 2026-04-14 · unverdicted · novelty 6.0

A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.

FAST: Efficient Action Tokenization for Vision-Language-Action Models

cs.RO · 2025-01-16 · unverdicted · novelty 6.0

FAST applies discrete cosine transform to robot action sequences for efficient tokenization, enabling autoregressive VLAs to succeed on high-frequency dexterous tasks and scale to 10k hours of data while matching diffusion VLA performance with up to 5x faster training.

F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation

cs.SD · 2026-06-04 · unverdicted · novelty 5.0

F3-Tokenizer adapts audio autoencoder latents with noise-regularized bottleneck (channel normalization and stochastic perturbation) and a representation encoder (RQ-MTP plus frozen-LLM supervision) to support both high-dimensional understanding representations and normalized continuous generation ta

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

eess.AS · 2026-05-30 · unverdicted · novelty 5.0

A continuous-token model with shared Haar wavelet coefficients reports 39.92 dB audio, 29.37 dB image, and 23.93 dB video PSNR on three datasets and shows energy-based selection outperforms uniform selection by roughly 16 dB.

Woosh: A Sound Effects Foundation Model

cs.SD · 2026-04-02 · accept · novelty 5.0

Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement

cs.SD · 2026-06-01 · unverdicted · novelty 4.0

EntangleCodec unifies semantic and acoustic audio tokenization via caption alignment and flow-matching decoding, reporting competitive reconstruction, +7.4% gains on MMAR understanding, and 0.6B-parameter ALMs surpassing 13B-parameter continuous baselines.

Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model

cs.SD · 2026-05-20 · unverdicted · novelty 4.0

The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.

Telephony Voice Agent for Banking Services

cs.HC · 2026-06-27 · unverdicted · novelty 2.0

Implementation of a telephony voice agent for banking services using Dialogflow CX supporting queries, authentication, and live agent handoff.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Codec-Robust Attacks on Audio LLMs cs.SD · 2026-05-19 · unverdicted · none · ref 55 · 2 links
CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.
F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation cs.SD · 2026-06-04 · unverdicted · none · ref 22
F3-Tokenizer adapts audio autoencoder latents with noise-regularized bottleneck (channel normalization and stochastic perturbation) and a representation encoder (RQ-MTP plus frozen-LLM supervision) to support both high-dimensional understanding representations and normalized continuous generation ta
Woosh: A Sound Effects Foundation Model cs.SD · 2026-04-02 · accept · none · ref 21
Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.
EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement cs.SD · 2026-06-01 · unverdicted · none · ref 36
EntangleCodec unifies semantic and acoustic audio tokenization via caption alignment and flow-matching decoding, reporting competitive reconstruction, +7.4% gains on MMAR understanding, and 0.6B-parameter ALMs surpassing 13B-parameter continuous baselines.
Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model cs.SD · 2026-05-20 · unverdicted · none · ref 23
The paper introduces Musical Attention, an attention variant that incorporates eight musical features including metadata to generate more coherent and varied music than standard or strided attention baselines.

Soundstream: An end-to-end neural audio codec

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer