Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards

· 2026 · cs.SD · arXiv 2602.00560

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Imperceptible text-based speech editing modifies spoken content through transcript manipulation while preserving acoustic continuity. Prior acoustic-space approaches suffer from content-style entanglement, causing unstable generation and boundary artifacts. We introduce a framework guided by the principle of "Edit Content, Preserve Acoustics". Editing is conducted in a stable semantic space, while acoustic realization is handled by a Flow Matching decoder. To ensure perceptual consistency, we propose Self-Consistency Rewards Group Relative Policy Optimization, which leverages a pre-trained Text-to-Speech model as an implicit critic, together with intelligibility and duration constraints. Experiments demonstrate consistent improvements over state-of-the-art autoregressive and non-autoregressive baselines in intelligibility, robustness, and perceptual quality.

representative citing papers

CosyEdit2: Speech-Editing-Oriented Reinforcement Learning Unlocks Better Zero-Shot TTS

cs.SD · 2026-05-25 · unverdicted · novelty 6.0

A two-stage post-training pipeline of SFT followed by editing-oriented GRPO on unpaired data improves speech editing consistency and zero-shot TTS quality.

citing papers explorer

Showing 1 of 1 citing paper.

CosyEdit2: Speech-Editing-Oriented Reinforcement Learning Unlocks Better Zero-Shot TTS cs.SD · 2026-05-25 · unverdicted · none · ref 2 · internal anchor
A two-stage post-training pipeline of SFT followed by editing-oriented GRPO on unpaired data improves speech editing consistency and zero-shot TTS quality.

Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards

fields

years

verdicts

representative citing papers

citing papers explorer