hub

arXiv preprint arXiv:2305.02765 , year=

Dongchao Yang, Songxiang Liu, Rongjie Huang, Jinchuan Tian, Chao Weng, Yuexian Zou · 2023 · arXiv 2305.02765

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ

cs.SD · 2026-05-22 · unverdicted · novelty 7.0

AffectCodec applies block-diagonal projections in residual FSQ to explicitly allocate bits to emotion and acoustic subspaces, combined with emotion conditioning, yielding better emotion preservation at low bitrates with competitive acoustic quality.

Optimising Neural Speech Codecs for 300bps Communication using Reinforcement Learning

cs.SD · 2026-05-19 · unverdicted · novelty 7.0

ClariCodec achieves 3.55% WER on LibriSpeech test-clean at 300 bps by RL fine-tuning the encoder for intelligibility, yielding a 23% relative WER reduction while preserving perceptual quality.

AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling

cs.SD · 2026-05-11 · unverdicted · novelty 7.0

AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

eess.AS · 2026-04-29 · unverdicted · novelty 7.0

Semantic priors from HuBERT and Whisper improve speech codec intelligibility up to 6 kbps but show diminishing returns beyond that, with a bitrate-aware regulation strategy balancing semantic consistency and naturalness.

Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale

cs.IR · 2026-04-26 · unverdicted · novelty 7.0

AdaSID adaptively regulates semantic ID overlaps in multimodal recommendations to improve retrieval performance, codebook utilization, and downstream metrics like GMV.

DASB - Discrete Audio and Speech Benchmark

cs.SD · 2024-06-20 · unverdicted · novelty 7.0

DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.

Two-Dimensional Quantization for Geometry-Aware Audio Coding

cs.SD · 2025-12-01 · unverdicted · novelty 6.0

Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.

SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization

cs.SD · 2025-05-30 · unverdicted · novelty 6.0

SwitchCodec introduces Residual Experts Vector Quantization and a multi-tiered STFT discriminator to achieve PESQ 2.87 and ViSQOL 4.27 at 2.67 kbps while halving training time via post-training.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

eess.AS · 2024-10-09 · unverdicted · novelty 5.0

F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.

On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation

cs.SD · 2026-04-14

citing papers explorer

Showing 11 of 11 citing papers.

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ cs.SD · 2026-05-22 · unverdicted · none · ref 23
AffectCodec applies block-diagonal projections in residual FSQ to explicitly allocate bits to emotion and acoustic subspaces, combined with emotion conditioning, yielding better emotion preservation at low bitrates with competitive acoustic quality.
Optimising Neural Speech Codecs for 300bps Communication using Reinforcement Learning cs.SD · 2026-05-19 · unverdicted · none · ref 16
ClariCodec achieves 3.55% WER on LibriSpeech test-clean at 300 bps by RL fine-tuning the encoder for intelligibility, yielding a 23% relative WER reduction while preserving perceptual quality.
AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling cs.SD · 2026-05-11 · unverdicted · none · ref 59
AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization cs.LG · 2026-05-07 · unverdicted · none · ref 56
PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.
SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding eess.AS · 2026-04-29 · unverdicted · none · ref 4
Semantic priors from HuBERT and Whisper improve speech codec intelligibility up to 6 kbps but show diminishing returns beyond that, with a bitrate-aware regulation strategy balancing semantic consistency and naturalness.
Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale cs.IR · 2026-04-26 · unverdicted · none · ref 35
AdaSID adaptively regulates semantic ID overlaps in multimodal recommendations to improve retrieval performance, codebook utilization, and downstream metrics like GMV.
DASB - Discrete Audio and Speech Benchmark cs.SD · 2024-06-20 · unverdicted · none · ref 45
DASB is a new benchmark for discrete audio tokens showing semantic tokens outperform acoustic ones but discrete representations remain less robust than continuous features across domains.
Two-Dimensional Quantization for Geometry-Aware Audio Coding cs.SD · 2025-12-01 · unverdicted · none · ref 75
Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization cs.SD · 2025-05-30 · unverdicted · none · ref 5
SwitchCodec introduces Residual Experts Vector Quantization and a multi-tiered STFT discriminator to achieve PESQ 2.87 and ViSQOL 4.27 at 2.67 kbps while halving training time via post-training.
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching eess.AS · 2024-10-09 · unverdicted · none · ref 151
F5-TTS generates natural speech from text via flow matching on DiT with simple text padding, ConvNeXt refinement, and sway sampling, trained on 100K hours multilingual data.
On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation cs.SD · 2026-04-14 · unreviewed · ref 15

arXiv preprint arXiv:2305.02765 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer