Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

2026 · cs.SD · arXiv 2605.12310

1 Pith paper cites this work. Polarity classification is still being indexed.

abstract

Singing Voice Conversion (SVC) aims to transform a source singing voice into the voice of a target singer while preserving the lyrics and melody. Most existing SVC methods rely on F0 extractors to capture the lead melody from clean vocals. However, no existing method can reliably extract clean vocals from accompanied recordings without leaving residual harmonies behind. In this paper, we propose Poly-SVC, a zero-shot, cross-lingual singing voice conversion system designed to handle residual harmonies. Poly-SVC is composed of three key components: a Constant-Q Transform (CQT)-based pitch extractor that preserves both the lead melody and the residual harmony, a random sampler that reduces interfering information from the CQT, and a diffusion decoder based on Conditional Flow Matching (CFM) that fuses pitch, content, and timbre features into natural-sounding polyphonic outputs. Experiments show that Poly-SVC outperforms the baseline models in naturalness, timbre similarity, and harmony reconstruction on both harmony-rich and single-melody recordings.
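The abstract's motivation, that a single F0 track keeps only the lead melody while a CQT-based representation can keep the residual harmony too, can be illustrated with a toy example. This is a minimal sketch, not the Poly-SVC pitch extractor: it assumes an idealized frame given as (frequency, amplitude) pairs for the sung fundamentals (no overtones, no real audio analysis), and the CQT layout (lowest bin at C1, 12 bins per octave), the amplitude threshold, and the helper names `single_f0` and `cqt_pitch_bins` are all hypothetical choices made for illustration. CQT bin centers follow the standard geometric spacing f_k = f_min * 2^(k / B).

```python
import math

# Hypothetical CQT layout for illustration only (not the paper's configuration):
# lowest bin centered at C1 (~32.70 Hz), 12 bins per octave (one per semitone).
F_MIN_HZ = 32.70319566257483
BINS_PER_OCTAVE = 12


def cqt_bin_frequency(k, f_min=F_MIN_HZ, bins_per_octave=BINS_PER_OCTAVE):
    """Center frequency of CQT bin k: f_k = f_min * 2**(k / bins_per_octave)."""
    return f_min * 2.0 ** (k / bins_per_octave)


def nearest_cqt_bin(freq_hz, f_min=F_MIN_HZ, bins_per_octave=BINS_PER_OCTAVE):
    """Index of the CQT bin whose center is closest to freq_hz on a log scale."""
    return round(bins_per_octave * math.log2(freq_hz / f_min))


def single_f0(partials):
    """Toy monophonic F0 extractor: keep only the strongest fundamental."""
    return max(partials, key=lambda p: p[1])[0]


def cqt_pitch_bins(partials, threshold=0.1):
    """Toy CQT-style pitch representation: keep every bin above a threshold."""
    return sorted({nearest_cqt_bin(f) for f, amp in partials if amp >= threshold})


# One idealized frame: lead melody on A4 plus a quieter residual harmony on F4.
frame = [(440.0, 1.0), (349.23, 0.4)]

print(single_f0(frame))       # the F0 track sees only the lead (440.0 Hz)
print(cqt_pitch_bins(frame))  # the CQT-style bins keep both pitches
```

Under these assumptions the single-F0 view reduces the frame to 440.0 Hz and drops the harmony, while the CQT-style view keeps two active bins (41 for F4 and 45 for A4). A real CQT would also show overtone bins for each voice, which is one reason a downstream component might need to suppress interfering information.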

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling

cs.SD · 2026-05-12 · unverdicted · novelty 7.0

Poly-SVC converts singing voices from polyphonic recordings while keeping melody, lyrics, and harmonies by combining CQT-based pitch extraction with a conditional flow matching diffusion decoder.
