WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing, 16(6):1505–1518

Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, et al. · 2022

4 Pith papers cite this work.

representative citing papers

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ

cs.SD · 2026-05-22 · unverdicted · novelty 7.0 · ref 5

AffectCodec applies block-diagonal projections in residual finite scalar quantization (FSQ) to explicitly allocate bits to emotion and acoustic subspaces. Combined with emotion conditioning, this yields better emotion preservation at low bitrates while keeping acoustic quality competitive.

AST: Adaptive, Seamless, and Training-Free Precise Speech Editing

cs.SD · 2026-04-17 · unverdicted · novelty 7.0 · ref 29

AST performs seamless, training-free speech editing by recomposing latents in pre-trained TTS models with adaptive weak fact guidance. It also introduces a new dataset and the WDTW metric, and claims a 70% WER reduction and better temporal consistency.

3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

cs.CV · 2026-02-11 · unverdicted · novelty 6.0 · ref 34

3DXTalker unifies identity modeling, lip synchronization, emotional expression, and head-pose dynamics in audio-driven 3D avatars. It does so through 2D-to-3D curation, amplitude and emotion audio cues, and a flow-matching transformer with prompt control.

emg2speech: Synthesizing speech from electromyography using self-supervised speech models

cs.SD · 2025-10-28 · conditional · novelty 6.0 · ref 20

EMG signals from orofacial muscles are mapped by a linear transformation into a self-supervised speech representation space, enabling direct audio synthesis. This is demonstrated on an ALS patient during silent articulation.
