Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials
Pith reviewed 2026-05-09 23:42 UTC · model grok-4.3
The pith
Acoustic metamaterials alter sound waves to anonymize voiceprints before any microphone can capture them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EchoMask is the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By combining frequency-selective interference to disrupt voiceprint features, an acoustic-field model for stability under movement, and reconfigurable structures that produce time-varying interference, the system prevents capture of clean voiceprints through compromised devices. Experiments demonstrate that it raises the miss-match rate above 90 percent while maintaining high speech intelligibility, and the entire solution is low-cost, power-free, and 3D-printable.
What carries the argument
Reconfigurable acoustic metamaterials that generate frequency-selective and time-varying interference patterns to disrupt voiceprint features before sound reaches the microphone.
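As a rough illustration of what "frequency-selective interference" means in general, the sketch below models a direct sound path plus one delayed copy. This is a textbook two-path model and not the EchoMask structure, whose geometry is not given here; the function names, the 10 cm path difference used in the usage note, and the equal-amplitude default are all assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def two_path_gain(freq_hz, extra_path_m, rel_amplitude=1.0):
    """Magnitude of a direct path summed with one delayed copy (toy model)."""
    delay_s = extra_path_m / SPEED_OF_SOUND
    phase = -2.0 * math.pi * freq_hz * delay_s
    return abs(1.0 + rel_amplitude * cmath.exp(1j * phase))


def first_notch_hz(extra_path_m):
    """Lowest frequency at which the extra path equals half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * extra_path_m)
```

With a hypothetical extra path of 0.10 m, `first_notch_hz(0.10)` falls near 1715 Hz, where the two paths nearly cancel, while at twice that frequency they add. A small change in path length moves the notch, which is why the stability-under-movement premise matters.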
If this is right
- Voiceprint matching against recordings of speakers who use the metamaterial structures fails even if every recording device is compromised.
- No software updates or hardware changes to microphones are needed to achieve anonymization in public or sensitive spaces.
- Low-cost 3D-printed physical barriers can provide biometric protection that works independently of any digital system.
- Speech remains usable for normal communication since the interference preserves intelligibility.
Where Pith is reading between the lines
- The same physical interference approach could be tested on other acoustic biometrics such as emotion detection from voice.
- Reconfigurable metamaterials might be adapted into wearable or portable forms for individual use in varied settings.
- Physical-layer methods like this could reduce dependence on software-only privacy tools that require trusted hardware.
- Combining the structures with everyday objects such as conference tables or podiums might enable widespread passive deployment.
Load-bearing premise
The interference patterns stay stable enough under normal speaker movement and remain unpredictable enough that attackers cannot learn or cancel them.
What would settle it
A demonstration that an attacker using multiple microphones and signal reconstruction can achieve voiceprint matching success rates well above 10 percent on speech processed by EchoMask.
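One way to make "well above 10 percent" concrete is an exact one-sided binomial test against a 10 percent baseline, which corresponds to the reported miss-match rate of over 90 percent. The sketch below is only an illustration: the function names, the significance level `alpha`, and any trial counts passed in are assumptions, not values from the paper.

```python
from math import comb


def binomial_tail(successes, trials, p0):
    """P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1.0 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def exceeds_baseline(successes, trials, p0=0.10, alpha=0.01):
    """True if the match count would be unlikely under a true rate of p0."""
    return binomial_tail(successes, trials, p0) < alpha
```

For example, 30 successful matches in 100 attack trials would be flagged as exceeding the baseline, whereas 10 in 100 would not.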
Original abstract
Voiceprints are widely used for authentication; however, they are easily captured in public settings and cannot be revoked once leaked. Existing anonymization systems operate inside recording devices, which makes them ineffective when microphones or software are untrusted, as in conference rooms, lecture halls, and interviews. We present EchoMask, the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By modifying sound waves before they reach the microphone, EchoMask prevents attackers from capturing clean voiceprints through compromised devices. Our design combines three key innovations: frequency-selective interference to disrupt voiceprint features while preserving speech intelligibility, an acoustic-field model to ensure stability under speaker movement, and reconfigurable structures that create time-varying interference to prevent learning or canceling a fixed acoustic pattern. EchoMask is low-cost, power-free, and 3D-printable, requiring no machine learning, software support, or microphone modification. Experiments conducted across eight microphones in diverse environments demonstrate that EchoMask increases the Miss-match Rate, i.e., the fraction of failed voiceprint matching attempts, to over 90%, while maintaining high speech intelligibility.
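The abstract defines the Miss-match Rate as the fraction of failed voiceprint matching attempts. A minimal sketch of that computation follows, assuming (this is not stated in the abstract) that each attempt yields a similarity score and that a match fails when the score falls below a decision threshold; the function name and the threshold are hypothetical.

```python
def miss_match_rate(scores, threshold):
    """Fraction of matching attempts whose score falls below the threshold."""
    if not scores:
        raise ValueError("need at least one matching attempt")
    # Assumption: a score below the threshold counts as a failed match.
    failed = sum(1 for s in scores if s < threshold)
    return failed / len(scores)
```

Under this assumed scoring scheme, a result above 0.9 would correspond to the "over 90%" figure quoted above.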
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents EchoMask, a physical-layer voiceprint anonymization system using acoustic metamaterials to modify sound waves before they reach the microphone. It claims three innovations: frequency-selective interference that disrupts voiceprint features while preserving intelligibility, an acoustic-field model ensuring stability under speaker movement, and reconfigurable structures generating time-varying interference to prevent attackers from learning or canceling fixed patterns. The system is low-cost, power-free, and 3D-printable with no ML, software, or microphone modifications required. Experiments across eight microphones in diverse environments are reported to achieve over 90% miss-match rate (fraction of failed voiceprint matching attempts) while maintaining high speech intelligibility.
Significance. If the performance claims and underlying assumptions hold, this would constitute a meaningful advance in voice authentication privacy by shifting anonymization to the physical layer, rendering it effective against untrusted or compromised recording devices in public settings. The passive, hardware-only design and lack of dependence on software or ML distinguish it from prior approaches and could enable practical deployment. Strengths include the attention to the fact that leaked voiceprints cannot be revoked and the low cost of the design; however, significance hinges on rigorous validation of movement stability and resistance to adaptive, multi-session attacks.
major comments (3)
- [Experiments] Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.
- [Design] Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.
- [Evaluation] Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.
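The first major comment asks for error bars on the miss-match rate. One common choice for a proportion is the Wilson score interval, sketched below; this is a suggestion for illustration rather than the method the authors use or plan to use, and the function name and default `z` are assumptions.

```python
from math import sqrt


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    spread = sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials))
    half_width = z * spread / denom
    return centre - half_width, centre + half_width
```

With a hypothetical 92 failed matches out of 100 attempts, the interval spans roughly 0.85 to 0.96, which shows how a point estimate "over 90%" can still be compatible with a true rate below 90 percent at modest sample sizes.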
minor comments (1)
- [Abstract] Abstract: the description of 'high speech intelligibility' would be strengthened by naming the specific metric (e.g., word error rate or MOS) used to quantify it.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.
- Referee: Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.
  Authors: We agree that the presentation of results in the Experiments section can be strengthened with additional statistical details and comparisons. In the revised manuscript, we will add error bars to the reported miss-match rates, include baseline comparisons against representative software-based voiceprint anonymization methods, and explicitly describe the data exclusion criteria applied. These changes will enable a clearer assessment of the results across the eight microphones and environments.
  Revision: yes
- Referee: Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.
  Authors: We acknowledge the need for quantitative validation of the acoustic-field model's robustness to movement. We will augment the Design section with a sensitivity analysis that includes quantitative error bounds and validation against recorded movement trajectories from our experiments. This will clarify the positional tolerances under which the frequency-selective interference remains effective.
  Revision: yes
- Referee: Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.
  Authors: We agree that direct evaluation against adaptive, multi-session attacks is important to substantiate the reconfigurability claims. In the revised manuscript, we will add experiments involving multi-session recordings and adaptive attacker models (including averaging and retraining across sessions) to demonstrate that the time-varying interference prevents effective pattern learning or cancellation.
  Revision: yes
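The averaging attack mentioned in the third exchange can be illustrated with a toy simulation. It assumes an idealized attacker who can observe the per-band gain (in dB) applied in each session, estimates the pattern as the per-band mean, and tries to invert it on the newest session. The gain range, band count, and function name are hypothetical, and the model says nothing about what a stronger attacker or the real structure would do.

```python
import random
from statistics import mean


def residual_after_averaging(num_sessions, num_bands, reconfigure, seed=0):
    """Mean absolute per-band error (dB) left after inverting the averaged pattern."""
    rng = random.Random(seed)
    # Hypothetical per-band attenuation pattern between -12 dB and 0 dB.
    fixed = [rng.uniform(-12.0, 0.0) for _ in range(num_bands)]
    sessions = []
    for _ in range(num_sessions):
        if reconfigure:
            # A new random pattern per session stands in for reconfiguration.
            pattern = [rng.uniform(-12.0, 0.0) for _ in range(num_bands)]
        else:
            pattern = fixed
        sessions.append(pattern)
    # Attacker's estimate: per-band mean over all observed sessions.
    estimate = [mean(s[b] for s in sessions) for b in range(num_bands)]
    target = sessions[-1]
    return mean(abs(t - e) for t, e in zip(target, estimate))
```

In this toy model a fixed pattern is recovered exactly, so the residual is zero, while per-session reconfiguration leaves a residual of a few dB. Whether that residual is enough to defeat voiceprint matching is exactly what the requested multi-session experiments would need to show.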
Circularity Check
No circularity: claims rest on physical design and empirical measurements, not self-referential derivations
The paper describes a hardware system (EchoMask) using acoustic metamaterials for physical-layer anonymization, with three stated innovations (frequency-selective interference, acoustic-field model for movement stability, and reconfigurable time-varying patterns). Success is asserted via experiments measuring miss-match rates (>90%) and intelligibility across eight microphones in varied environments. No equations, predictions, or first-principles derivations are presented in the provided text that reduce by construction to fitted parameters, self-citations, or renamed inputs. The central results are experimental outcomes rather than analytic claims that loop back to their own assumptions. This matches the default expectation of no significant circularity for an engineering/experimental paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The acoustic-field model accurately predicts system stability under speaker movement.
- domain assumption: Time-varying interference prevents attackers from learning or canceling a fixed acoustic pattern.
invented entities (1)
- EchoMask reconfigurable metamaterial structure: no independent evidence