Speech resynthesis from discrete disentangled self-supervised representations

Speech resynthesis from discrete disentangled self-supervised representations · 2021 · arXiv 2104.00355

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

eess.AS · 2022-10-24 · accept · novelty 7.0

EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

cs.SD · 2026-06-11 · unverdicted · novelty 6.0

Self-guidance adds a lightweight feature-mapping loss to align decoder manifolds in VQ-VAE speech codecs, raising reconstruction metrics and allowing 4x codebook reduction with no fidelity loss.

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora

cs.SD · 2026-04-13 · unverdicted · novelty 6.0

MimicLM achieves better naturalness in zero-shot voice imitation by autoregressively modeling pseudo-parallel data with synthetic sources and real targets, plus interleaved text-audio guidance and preference alignment.

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

cs.CL · 2025-09-26 · unverdicted · novelty 6.0

StableToken introduces a multi-branch architecture with bit-wise voting to create noise-robust semantic speech tokens, achieving lower Unit Edit Distance and better SpeechLLM robustness than prior single-path tokenizers.

Privacy-preserving Prosody Representation Learning

eess.AS · 2026-05-29 · unverdicted · novelty 5.0

A self-supervised prosody encoder with speaker disentanglement strategies outperforms raw prosody and HuBERT baselines on pitch reconstruction and prosodic event detection while achieving strong speaker separation.

A Synonymous Variational Perspective on the Rate-Distortion-Perception Tradeoff

cs.IT · 2026-04-16

citing papers explorer

Showing 4 of 4 citing papers after filters.

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment · cs.SD · 2026-06-11 · unverdicted · none · ref 75
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora · cs.SD · 2026-04-13 · unverdicted · none · ref 5
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs · cs.CL · 2025-09-26 · unverdicted · none · ref 61
Privacy-preserving Prosody Representation Learning · eess.AS · 2026-05-29 · unverdicted · none · ref 74
