OpenVoice: Versatile instant voice cloning

Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun, "OpenVoice: Versatile instant voice cloning," arXiv preprint arXiv:2312.01479, 2023.

9 Pith papers cite this work.

citation-role summary: baseline 1, dataset 1, method 1

citation-polarity summary

baseline 1 use dataset 1 use method 1

representative citing papers

Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

cs.SD · 2026-05-12 · unverdicted · novelty 7.0

Poly-SVC converts singing voices from polyphonic recordings while keeping melody, lyrics, and harmonies by combining CQT-based pitch extraction with a conditional flow matching diffusion decoder.

V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

V.O.I.C.E is a new taxonomy that organizes synthetic voice risks into five categories and shows how they interact with exposure, visibility, and legal context using empirical incident data.

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

eess.AS · 2026-04-14 · unverdicted · novelty 7.0

X-VC achieves zero-shot streaming voice conversion through one-step conversion in codec space, using a dual-conditioning acoustic converter and role-assignment training on generated paired data.

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

cs.GR · 2026-01-29 · unverdicted · novelty 7.0

JUST-DUB-IT adapts a joint audio-visual diffusion model via LoRA to generate high-quality dubbed videos with translated audio and lip-synced facial motion.

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

eess.AS · 2025-10-22 · unverdicted · novelty 7.0

EchoFake is a new replay-aware dataset combining zero-shot TTS deepfakes and physical replay recordings to improve generalization of speech deepfake detection models over existing lab-focused datasets.

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora

cs.SD · 2026-04-13 · unverdicted · novelty 6.0

MimicLM achieves better naturalness in zero-shot voice imitation by autoregressively modeling pseudo-parallel data (synthetic sources paired with real targets), combined with interleaved text-audio guidance and preference alignment.

Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR

cs.CL · 2026-04-15 · unverdicted · novelty 5.0

Combining LLM-based elderly-contextual paraphrasing with TTS synthesis using elderly speakers reduces word error rates in elderly ASR by up to 58% relative to standard Whisper baselines on English and Korean datasets.

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

cs.SD · 2024-12-13 · unverdicted · novelty 5.0

CosyVoice 2 delivers human-parity naturalness and near-lossless streaming speech synthesis by combining finite-scalar quantization, a streamlined pre-trained LLM, and chunk-aware causal flow matching on large multilingual data.

Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

cs.HC · 2026-04-26

citing papers explorer

Showing 9 of 9 citing papers.

Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling cs.SD · 2026-05-12 · unverdicted · none · ref 21
V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data cs.CR · 2026-04-25 · unverdicted · none · ref 66
X-VC: Zero-shot Streaming Voice Conversion in Codec Space eess.AS · 2026-04-14 · unverdicted · none · ref 32
JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion cs.GR · 2026-01-29 · unverdicted · none · ref 21
EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection eess.AS · 2025-10-22 · unverdicted · none · ref 27
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora cs.SD · 2026-04-13 · unverdicted · none · ref 6
Elderly-Contextual Data Augmentation via Speech Synthesis for Elderly ASR cs.CL · 2026-04-15 · unverdicted · none · ref 32
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models cs.SD · 2024-12-13 · unverdicted · none · ref 58
Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching cs.HC · 2026-04-26 · unreviewed · ref 9