Perceptual implications of automatic anonymization in pathological speech
Pith reviewed 2026-05-22 17:54 UTC · model grok-4.3
The pith
Automatic anonymization of pathological speech is readily detectable and lowers perceived quality, yet leaves clinical severity ratings largely intact.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a study of 180 speakers spanning four pathological groups and two control groups, ten listeners detected automatic anonymization with 91% zero-shot and 93% few-shot accuracy. Perceived quality fell by 30 points on a 0-100 scale, reordering how the groups ranked on quality. Clinical severity ratings stayed in near-perfect agreement for Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87 to 0.94), with no recording shifting by more than one grade. Perceptual results were decoupled from the computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous.
What carries the argument
A structured human listening protocol that includes zero-shot and few-shot discrimination, quality rating, and blinded clinical severity assessment by a phoniatrician.
If this is right
- Severity ratings by clinicians remain reliable for dysarthria, dysglossia, and dysphonia after anonymization.
- Quality perception decreases and the relative ordering of disorder groups changes.
- Native language influences how easily anonymization is detected but not the quality drop.
- Listener expertise influences quality degradation but not detection rates.
- Disorder-stratified evaluation is needed because perceptual and computational measures do not align.
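One way to check whether perceptual and computational measures align is to stratify detection accuracy by disorder group and correlate it with a per-group privacy score. The sketch below is a minimal Python illustration, not the paper's analysis code: the function names `accuracy_by_group` and `pearson_r`, the trial format, the group labels, and all numeric values are assumptions made up for the example.

```python
from collections import defaultdict
from math import sqrt


def accuracy_by_group(trials):
    """Return detection accuracy per group from (group, is_correct) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, is_correct in trials:
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical listener trials: (group label, anonymization correctly detected?).
trials = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", True), ("group_b", True), ("group_b", True),
    ("group_c", True), ("group_c", False), ("group_c", False), ("group_c", True),
]
# Hypothetical per-group computational privacy scores (placeholders, not study data).
privacy_score = {"group_a": 0.30, "group_b": 0.20, "group_c": 0.40}

accuracy = accuracy_by_group(trials)
groups = sorted(accuracy)
r = pearson_r([privacy_score[g] for g in groups], [accuracy[g] for g in groups])
print(accuracy, round(r, 3))
```

A coefficient near zero across disorder groups would be consistent with the decoupling the paper reports. With only a handful of groups, such a coefficient is descriptive rather than conclusive.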
Where Pith is reading between the lines
- Developers of anonymization systems should test with actual clinicians and patients rather than relying solely on algorithms.
- Future work could explore whether adjusting anonymization intensity per disorder improves the balance between privacy and usability.
- Similar evaluations in other languages or with voice disorders not tested here might show different patterns.
Load-bearing premise
The results obtained with one specific anonymization method and ten listeners will hold for other anonymization approaches and for larger groups of clinicians and patients.
What would settle it
Finding a different anonymization technique where perceptual conspicuousness directly corresponds to the computational privacy score, or observing clinical severity ratings that shift by more than one grade in a blinded evaluation.
Original abstract
Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers from CLP, Dysarthria, Dysglossia, Dysphonia, and adult and child controls. Each original recording and its automatically-anonymized counterpart was evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization at 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 ppts on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes; speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.
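The severity-agreement statistic quoted in the abstract, quadratic-weighted Cohen's kappa, penalizes each disagreement by the squared distance between grades, so a one-grade shift costs far less than a three-grade shift. The following is a minimal Python sketch of that statistic for a 4-point scale. The function name, the 0-3 grade coding, and the toy ratings are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, n_grades=4):
    """Quadratic-weighted Cohen's kappa for grades coded 0..n_grades-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating sequences must be non-empty and of equal length")
    n = len(ratings_a)
    observed = Counter(zip(ratings_a, ratings_b))  # joint counts of (grade_a, grade_b)
    marg_a = Counter(ratings_a)                    # marginal counts, first rating
    marg_b = Counter(ratings_b)                    # marginal counts, second rating
    max_dist = (n_grades - 1) ** 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = (i - j) ** 2 / max_dist       # 0 on the diagonal, 1 at maximum distance
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * (marg_a[i] / n) * (marg_b[j] / n)
    if weighted_expected == 0.0:
        return float("nan")  # undefined when both ratings use one identical grade
    return 1.0 - weighted_observed / weighted_expected


# Toy severity grades (hypothetical): original vs. anonymized recordings.
original = [0, 1, 2, 3]
anonymized = [0, 1, 2, 2]  # one recording shifts by a single grade
print(quadratic_weighted_kappa(original, anonymized))
```

Because the weights grow quadratically, the statistic stays high when disagreements are limited to adjacent grades, which matches the abstract's report that no recording shifted by more than one grade.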
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates perceptual effects of automatic anonymization on pathological speech from 180 German speakers across CLP, Dysarthria, Dysglossia, Dysphonia, and controls. Using 10 listeners (native/non-native, clinical/signal-processing expertise), it reports 91-93% detection accuracy (zero- and few-shot), a 30-point drop in perceived quality, preserved clinical severity ratings (kappa 0.87-0.94 in three disorders), listener-attribute double dissociations, and a decoupling between computational privacy metrics and perceptual conspicuousness, arguing that disorder-stratified, listener-stratified, clinician-validated evaluation is required as the minimum standard for clinical licensing.
Significance. If the decoupling and preservation findings hold, the work supplies concrete evidence that computational privacy metrics can misalign with human perception in disordered speech, supporting more rigorous, stratified protocols for ethical data release. The structured multi-task protocol, statistical tests, and explicit double dissociation between language and expertise effects are strengths; the near-perfect severity agreement in selected disorders is a useful clinical anchor.
major comments (2)
- [Results (decoupling paragraph)] Results section on decoupling: the claim that 'the pathology with the strongest computational anonymization was the least perceptually conspicuous' and the ensuing call for disorder-stratified evaluation as minimum standard rest on a single unspecified anonymization pipeline; without cross-technique replication or explicit confirmation that the privacy metric is pathology-invariant, the generalization does not follow from the data.
- [Methods (listener protocol)] Methods and listener cohort: subgroup splits for native-language and expertise effects are necessarily small (n=10 total), and no power analysis or pre-registration of exclusion rules is referenced; this limits support for the reported p=0.008 disorder variation and the double-dissociation interpretation as generalizable findings.
minor comments (2)
- [Abstract] Abstract and results: the 30-point quality drop is reported on a 0-100 scale but the exact rating instrument and anchoring are not restated, making direct comparison to prior work harder.
- [Introduction or Methods] The manuscript refers to 'the standard computational privacy metric' without a brief equation or citation in the main text; adding this would clarify the cross-disorder comparison.
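In speaker anonymization work, a common computational privacy metric is the equal error rate (EER) of an automatic speaker verification attacker, where a higher EER means the attacker is less able to re-identify speakers. Assuming that is the metric meant here (the paper excerpts quoted on this page mention EER, but the main-text definition is not reproduced), a threshold-sweep approximation can be sketched in Python as follows. The function name and the toy scores are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine_scores: similarity scores for same-speaker trials.
    impostor_scores: similarity scores for different-speaker trials.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = float("inf")
    eer = None
    for threshold in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy attacker scores (hypothetical values, not study data).
genuine = [0.62, 0.71, 0.55, 0.80]
impostor = [0.40, 0.58, 0.35, 0.66]
print(equal_error_rate(genuine, impostor))
```

Computing such a score separately per disorder group is what would make the cross-disorder comparison requested in this comment explicit.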
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript. We address each major comment below and have made revisions to strengthen the paper where appropriate.
- Referee: [Results (decoupling paragraph)] Results section on decoupling: the claim that 'the pathology with the strongest computational anonymization was the least perceptually conspicuous' and the ensuing call for disorder-stratified evaluation as minimum standard rest on a single unspecified anonymization pipeline; without cross-technique replication or explicit confirmation that the privacy metric is pathology-invariant, the generalization does not follow from the data.
  Authors: We appreciate this observation. The anonymization pipeline is described in detail in the Methods (Section 2.3), using a standard voice-conversion approach whose implementation details and hyperparameters are provided. The computational privacy metric follows the protocol established in prior anonymization literature. We agree that the observed decoupling is specific to the evaluated pipeline and that broader claims would benefit from multi-technique replication. We have revised the Results and Discussion to explicitly qualify the finding as pertaining to this pipeline, to note that the privacy metric's pathology-invariance was not independently verified here, and to frame the call for disorder-stratified evaluation as a minimum standard motivated by the present data while recommending cross-technique studies for future work.
  Revision: partial
- Referee: [Methods (listener protocol)] Methods and listener cohort: subgroup splits for native-language and expertise effects are necessarily small (n=10 total), and no power analysis or pre-registration of exclusion rules is referenced; this limits support for the reported p=0.008 disorder variation and the double-dissociation interpretation as generalizable findings.
  Authors: We acknowledge the constraints of the modest listener sample (n=10). The cohort was assembled to capture the four listener attributes of interest while remaining feasible for a multi-task perceptual protocol. The reported p=0.008 reflects the omnibus test across the full listener group; the double dissociation emerges from the distinct patterns of modulation by language versus expertise. Because the study was exploratory rather than hypothesis-confirmatory, pre-registration was not undertaken. We will expand the Discussion with an explicit limitations subsection that (i) states the sample-size limitation, (ii) includes post-hoc power estimates for the key statistical tests, and (iii) calls for larger, pre-registered replications to establish generalizability.
  Revision: yes
Circularity Check
No significant circularity: direct empirical listener study
The paper reports results from a structured human evaluation protocol involving 180 speakers and 10 listeners across discrimination, quality, and clinical severity tasks. All central claims (detection rates, quality drops, disorder-specific variation, decoupling from a computational privacy metric, and the call for stratified evaluation) rest on measured listener responses and standard statistical tests rather than any derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, ansatzes, or uniqueness theorems appear; the study is self-contained against its own data collection and does not reduce any result to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human listeners can perform reliable zero-shot and few-shot discrimination and rating tasks on short speech samples.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Listeners demonstrated consistently high discrimination accuracy... repeated-measures ANOVA... Pearson correlation coefficients... EER vs. Turing (Zero-shot) r = -0.020
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Anonymization consistently reduced perceived quality... one-way ANOVA p=0.0046
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.