Speaker Anonymization Using X-vector and Neural Waveform Models

Fuming Fang; Isao Echizen; Jean-Francois Bonastre; Junichi Yamagishi; Massimiliano Todisco; Nicholas Evans; Xin Wang

arXiv 1905.13561v1 · pith:OILHIXME · submitted 2019-05-30 · eess.AS cs.CL cs.LG cs.SD stat.ML

keywords: speaker, speech, data, identity, anonymized, approach, identities, used

Abstract

The social media revolution has produced a plethora of web services to which users can easily upload and share multimedia documents. Despite the popularity and convenience of such services, the sharing of such inherently personal data, including speech data, raises obvious security and privacy concerns. In particular, a user's speech data may be acquired and used with speech synthesis systems to produce high-quality speech utterances which reflect the same user's speaker identity. These utterances may then be used to attack speaker verification systems. One solution to mitigate these concerns involves the concealing of speaker identities before the sharing of speech data. For this purpose, we present a new approach to speaker anonymization. The idea is to extract linguistic and speaker identity features from an utterance and then to use these with neural acoustic and waveform models to synthesize anonymized speech. The original speaker identity, in the form of timbre, is suppressed and replaced with that of an anonymous pseudo identity. The approach exploits state-of-the-art x-vector speaker representations. These are used to derive anonymized pseudo speaker identities through the combination of multiple, random speaker x-vectors. Experimental results show that the proposed approach is effective in concealing speaker identities. It increases the equal error rate of a speaker verification system while maintaining high quality, anonymized speech.
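The abstract says pseudo speaker identities are derived "through the combination of multiple, random speaker x-vectors." A minimal sketch of that idea follows, assuming the combination is a plain element-wise average of x-vectors drawn at random without replacement from a pool of other speakers. The function name `make_pseudo_xvector`, the pool layout `(num_speakers, dim)`, and the use of averaging are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np


def make_pseudo_xvector(pool: np.ndarray, num_selected: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: build a pseudo-speaker x-vector.

    Assumption: "combination" means averaging `num_selected` x-vectors
    sampled uniformly at random (without replacement) from `pool`,
    where each row of `pool` is one speaker's x-vector.
    """
    if pool.ndim != 2:
        raise ValueError("pool must have shape (num_speakers, dim)")
    if not 1 <= num_selected <= pool.shape[0]:
        raise ValueError("num_selected must be between 1 and the pool size")

    # Draw distinct speaker indices so no x-vector is counted twice.
    idx = rng.choice(pool.shape[0], size=num_selected, replace=False)

    # Element-wise mean over the selected speakers gives the pseudo identity.
    return pool[idx].mean(axis=0)


if __name__ == "__main__":
    # Toy pool of 50 random 512-dimensional vectors standing in for real
    # x-vectors (the dimension here is an assumption for illustration).
    pool = np.random.default_rng(0).standard_normal((50, 512))
    pseudo = make_pseudo_xvector(pool, num_selected=10,
                                 rng=np.random.default_rng(1))
    print(pseudo.shape)
```

Averaging many random speakers tends to blur any single speaker's characteristics, which is the intuition behind using a combination rather than borrowing one real speaker's x-vector. A fuller system might first restrict the pool (for example, to speakers dissimilar to the source) before sampling; this sketch omits any such step.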

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization
cs.CL 2026-06 unverdicted novelty 6.0

InfoShield uses TimeAwareMINE to minimize mutual information between speech representations and sensitive attributes, cutting gender inference from 92.6% to 55.5% and age inference from 55.7% to 30.3% while dropping d...