
DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism to control the utility-privacy trade-off, operating at fixed design points. We propose DiffAnon, a diffusion-based anonymization method with classifier-free guidance (CFG) that provides explicit, continuous inference-time control over prosody preservation. DiffAnon refines acoustic detail over semantic embeddings of an RVQ codec, enabling smooth interpolation between anonymization strength and prosodic fidelity within a single model. To the best of our knowledge, it is the first voice anonymization framework to provide structured, interpolatable inference-time prosody control. Experiments demonstrate structured trade-off behavior, achieving strong utility while maintaining competitive privacy across controllable operating points.
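The abstract does not give DiffAnon's guidance equation, so the following is only a minimal sketch of the generic classifier-free guidance rule, included to illustrate how a single scalar can give continuous inference-time control. The function name `cfg_combine`, the parameter `guidance_scale`, and the reading of the conditional branch as "prosody-conditioned" are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def cfg_combine(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional denoiser outputs (generic CFG).

    Hypothetical reading for this sketch:
      eps_cond   -- prediction made with the prosody condition supplied
      eps_uncond -- prediction made with the condition dropped
    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    values in between interpolate, and values above 1 extrapolate.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # toy conditional prediction
    uncond = [0.0, 0.0]  # toy unconditional prediction
    for w in (0.0, 0.5, 1.0):
        print(w, cfg_combine(cond, uncond, w))
```

Under this reading, sweeping `guidance_scale` at inference time would move a single trained model between weaker and stronger prosody conditioning, which is the kind of interpolation the abstract describes. How DiffAnon maps the scale to anonymization strength is not stated here.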

fields

eess.AS 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

eess.AS · 2026-04-29 · unverdicted · novelty 7.0

DiffAnon introduces the first diffusion model for voice anonymization that supplies structured, continuous, inference-time control over prosody preservation via classifier-free guidance on RVQ semantic embeddings.

citing papers explorer

Showing 1 of 1 citing paper.

  • DiffAnon: Diffusion-based Prosody Control for Voice Anonymization eess.AS · 2026-04-29 · unverdicted · none · ref 1 · internal anchor

    DiffAnon introduces the first diffusion model for voice anonymization that supplies structured, continuous, inference-time control over prosody preservation via classifier-free guidance on RVQ semantic embeddings.