
arxiv: 2604.20116 · v1 · submitted 2026-04-22 · 💻 cs.SD

Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials

Pith reviewed 2026-05-09 23:42 UTC · model grok-4.3

classification 💻 cs.SD
keywords voiceprint anonymization · acoustic metamaterials · physical-layer security · speech privacy · biometric protection · real-time anonymization · 3D-printable devices · sound wave interference

The pith

Acoustic metamaterials alter sound waves to anonymize voiceprints before any microphone can capture them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EchoMask as a physical system that disrupts voice biometrics at the source rather than relying on software inside recording devices. It places 3D-printable metamaterial structures near the speaker to generate interference that targets the acoustic features used for voiceprint matching while leaving ordinary speech understandable. The design includes an acoustic model to handle speaker movement and reconfigurable elements that vary the interference over time so attackers cannot easily learn or cancel a fixed pattern. Tests across eight microphones in different environments show the system raises the rate of failed voiceprint matches above 90 percent. Because the approach requires no power, software, machine learning, or device modifications, it offers protection even when microphones or recording software cannot be trusted.

Core claim

EchoMask is the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By combining frequency-selective interference to disrupt voiceprint features, an acoustic-field model for stability under movement, and reconfigurable structures that produce time-varying interference, the system prevents capture of clean voiceprints through compromised devices. Experiments demonstrate that it raises the miss-match rate above 90 percent while maintaining high speech intelligibility, and the entire solution is low-cost, power-free, and 3D-printable.

What carries the argument

Reconfigurable acoustic metamaterials that generate frequency-selective and time-varying interference patterns to disrupt voiceprint features before sound reaches the microphone.
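
As rough intuition for what frequency-selective interference means, here is a minimal sketch of a generic two-path model: a direct wave plus one delayed, scaled copy. This is textbook wave superposition and not the paper's metamaterial design; the function name `two_path_gain`, the 1 ms delay, and the unit amplitude are assumptions chosen only for illustration.

```python
import cmath
import math


def two_path_gain(freq_hz: float, delay_s: float, amplitude: float = 1.0) -> float:
    """Magnitude of a direct wave plus one delayed copy at a single frequency.

    Hypothetical helper: evaluates |1 + a * exp(-j * 2*pi * f * tau)|.
    """
    phase = -2.0 * math.pi * freq_hz * delay_s
    return abs(1.0 + amplitude * cmath.exp(1j * phase))


# Assumed 1 ms path delay: the two paths cancel where f * tau is a half-integer
# and reinforce where f * tau is an integer.
DELAY_S = 1e-3
for freq in (250.0, 500.0, 1000.0):
    print(f"{freq:6.0f} Hz -> gain {two_path_gain(freq, DELAY_S):.3f}")
```

Under these assumed values the 500 Hz component cancels while the 1000 Hz component is reinforced, which is the sense in which a fixed geometry can be selective in frequency.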

If this is right

  • Voiceprint-based authentication fails against speakers using the metamaterial structures even if all recording devices are compromised.
  • No software updates or hardware changes to microphones are needed to achieve anonymization in public or sensitive spaces.
  • Low-cost 3D-printed physical barriers can provide biometric protection that works independently of any digital system.
  • Speech remains usable for normal communication since the interference preserves intelligibility.
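
Intelligibility claims like the last bullet are commonly checked by running a speech recognizer on the processed audio and scoring word error rate (WER). This summary does not state which metric the paper uses, so the sketch below only illustrates standard WER via word-level edit distance; `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


# Made-up transcripts: one substituted word out of five.
print(word_error_rate("the cat sat on mat", "the cat sit on mat"))
```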

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same physical interference approach could be tested on other acoustic biometrics such as emotion detection from voice.
  • Reconfigurable metamaterials might be adapted into wearable or portable forms for individual use in varied settings.
  • Physical-layer methods like this could reduce dependence on software-only privacy tools that require trusted hardware.
  • Combining the structures with everyday objects such as conference tables or podiums might enable widespread passive deployment.

Load-bearing premise

The interference patterns stay stable enough under normal speaker movement and remain unpredictable enough that attackers cannot learn or cancel them.

What would settle it

A demonstration that an attacker using multiple microphones and signal reconstruction can achieve voiceprint matching success rates well above 10 percent on speech processed by EchoMask.
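
To see why pooling recordings is the natural test, consider a toy simulation, offered as a sketch under strong assumptions rather than a model of EchoMask. It treats a voiceprint as a fixed feature vector and each session's interference as independent zero-mean noise added to it; real acoustic interference need not be additive, independent, or zero-mean in feature space. All names and parameters (`simulate`, 64 dimensions, noise level, 50 sessions) are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def simulate(sessions: int, dim: int = 64, noise_std: float = 2.0, seed: int = 0):
    """Return (similarity from one session, similarity after averaging all)."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # toy "voiceprint"
    # Assumption: each session adds independent zero-mean perturbations.
    observed = [
        [c + rng.gauss(0.0, noise_std) for c in clean] for _ in range(sessions)
    ]
    averaged = [sum(column) / sessions for column in zip(*observed)]
    return cosine(clean, observed[0]), cosine(clean, averaged)


single, pooled = simulate(sessions=50)
print(f"single session: {single:.2f}, averaged over 50 sessions: {pooled:.2f}")
```

In this toy setting averaging recovers most of the clean vector. Whether anything similar holds for the physical system is exactly what the proposed experiment would have to establish.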

Figures

Figures reproduced from arXiv: 2604.20116 by Xiaojiang Chen, Zhanyong Tang, Zheng Wang, Zhiyuan Ning.

Figure 1: A deployment scenario of ECHOMASK for voiceprint anonymization.
  Adjacent page text: "… placing a passive metamaterial outside the microphone, we distort identity-bearing components of speech before capture while preserving intelligibility. The compact, power-free design can be attached to off-the-shelf microphones, making it suitable for public and shared recording environments. However, utilizing metamaterials for voiceprint a…"
Figure 2: Prototypes for typical microphones (a) and mobile …
Figure 4: (a) Anonymization acoustic field model and (b) …
Figure 5: (a) Gains at -90° to 90° with M1 only. (b) Top: Gains at 0° to 90° (M1, M2); Bottom: Gains at -90° to 0° (M1, M2). (c) Gains at -90° to 90° with M1, M2, and M3.
  Plot residue from a neighboring figure: (a) randomized interference structure driven by u1 variation; (b) x-axis Frequency (Hz), 515 to 545, with curves for 8 mm to 16 mm and frequency labels from 518 Hz to 540 Hz.
Figure 7: Functional variants of system: compatible with con …
Figure 10: Impact of different devices on anonymization per …
Figure 11: Impact of speaker differences on anonymization …
Figure 12: Impact of volume on anonymization performance …
Figure 14: (a) Anonymization efficiency of ECHOMASK and (b) the subjective auditory of the anonymized audio.
  Adjacent page text: "5.1.2 Efficiency and perceptual quality. Processing efficiency. Real-time performance is essential for deployment in live settings such as talks, meetings, and online conferences. We evaluate efficiency using the Real-time Coefficient (RTC) [30, 74] on audio samples of varying lengths and content. Because ECHO…"
Figure 15: Impact of the system on ASR performance (cover …
Figure 16: (a) Anonymization performance without the dy …
Figure 18: (a) Anonymization under varying noise levels, (b) …
Figure 19: Impact of different wind speeds on E …
Original abstract

Voiceprints are widely used for authentication; however, they are easily captured in public settings and cannot be revoked once leaked. Existing anonymization systems operate inside recording devices, which makes them ineffective when microphones or software are untrusted, as in conference rooms, lecture halls, and interviews. We present EchoMask, the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By modifying sound waves before they reach the microphone, EchoMask prevents attackers from capturing clean voiceprints through compromised devices. Our design combines three key innovations: frequency-selective interference to disrupt voiceprint features while preserving speech intelligibility, an acoustic-field model to ensure stability under speaker movement, and reconfigurable structures that create time-varying interference to prevent learning or canceling a fixed acoustic pattern. EchoMask is low-cost, power-free, and 3D-printable, requiring no machine learning, software support, or microphone modification. Experiments conducted across eight microphones in diverse environments demonstrate that EchoMask increases the Miss-match Rate, i.e., the fraction of failed voiceprint matching attempts, to over 90%, while maintaining high speech intelligibility.
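
The abstract defines the Miss-match Rate as the fraction of failed voiceprint matching attempts. A minimal sketch of that arithmetic follows, assuming each attempt yields a similarity score and counts as failed when the score falls below a decision threshold. The scoring rule, the threshold, and the sample scores are assumptions for illustration, not values from the paper.

```python
from typing import Sequence


def miss_match_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of matching attempts whose score falls below the threshold."""
    if not scores:
        raise ValueError("at least one matching attempt is required")
    failed = sum(1 for score in scores if score < threshold)
    return failed / len(scores)


# Made-up similarity scores for five matching attempts.
example_scores = [0.12, 0.31, 0.08, 0.74, 0.22]
print(miss_match_rate(example_scores, threshold=0.5))
```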

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents EchoMask, a physical-layer voiceprint anonymization system using acoustic metamaterials to modify sound waves before they reach the microphone. It claims three innovations: frequency-selective interference that disrupts voiceprint features while preserving intelligibility, an acoustic-field model ensuring stability under speaker movement, and reconfigurable structures generating time-varying interference to prevent attackers from learning or canceling fixed patterns. The system is low-cost, power-free, and 3D-printable with no ML, software, or microphone modifications required. Experiments across eight microphones in diverse environments are reported to achieve over 90% miss-match rate (fraction of failed voiceprint matching attempts) while maintaining high speech intelligibility.

Significance. If the performance claims and underlying assumptions hold, this would constitute a meaningful advance in voice authentication privacy by shifting anonymization to the physical layer, rendering it effective against untrusted or compromised recording devices in public settings. The passive, hardware-only design and lack of dependence on software or ML distinguish it from prior approaches and could enable practical deployment. Strengths include the emphasis on revocability and low cost; however, significance hinges on rigorous validation of movement stability and resistance to adaptive, multi-session attacks.

major comments (3)
  1. [Experiments] Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.
  2. [Design] Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.
  3. [Evaluation] Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.
minor comments (1)
  1. [Abstract] Abstract: the description of 'high speech intelligibility' would be strengthened by naming the specific metric (e.g., word error rate or MOS) used to quantify it.
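
The error bars requested in major comment 1 could take several forms. One common choice for a rate such as the miss-match rate is a Wilson score interval, sketched below. The helper name `wilson_interval` and the counts (90 failed matches out of 100 attempts) are hypothetical and are not data from the paper.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    p = failures / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts: 90 failed matches in 100 attempts.
low, high = wilson_interval(90, 100)
print(f"miss-match rate 0.90, 95% interval [{low:.3f}, {high:.3f}]")
```

With counts this small the interval spans several percentage points, which is why a bare ">90%" is hard to assess without the number of trials per microphone and environment.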

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.

Point-by-point responses
  1. Referee: Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.

    Authors: We agree that the presentation of results in the Experiments section can be strengthened with additional statistical details and comparisons. In the revised manuscript, we will add error bars to the reported miss-match rates, include baseline comparisons against representative software-based voiceprint anonymization methods, and explicitly describe the data exclusion criteria applied. These changes will enable a clearer assessment of the results across the eight microphones and environments.

    revision: yes

  2. Referee: Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.

    Authors: We acknowledge the need for quantitative validation of the acoustic-field model's robustness to movement. We will augment the Design section with a sensitivity analysis that includes quantitative error bounds and validation against recorded movement trajectories from our experiments. This will clarify the positional tolerances under which the frequency-selective interference remains effective.

    revision: yes

  3. Referee: Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.

    Authors: We agree that direct evaluation against adaptive, multi-session attacks is important to substantiate the reconfigurability claims. In the revised manuscript, we will add experiments involving multi-session recordings and adaptive attacker models (including averaging and retraining across sessions) to demonstrate that the time-varying interference prevents effective pattern learning or cancellation.

    revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on physical design and empirical measurements, not self-referential derivations

full rationale

The paper describes a hardware system (EchoMask) using acoustic metamaterials for physical-layer anonymization, with three stated innovations (frequency-selective interference, acoustic-field model for movement stability, and reconfigurable time-varying patterns). Success is asserted via experiments measuring miss-match rates (>90%) and intelligibility across eight microphones in varied environments. No equations, predictions, or first-principles derivations are presented in the provided text that reduce by construction to fitted parameters, self-citations, or renamed inputs. The central results are experimental outcomes rather than analytic claims that loop back to their own assumptions. This matches the default expectation of no significant circularity for an engineering/experimental paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on domain assumptions about acoustic propagation and attacker limitations rather than explicit free parameters or new physical entities beyond the designed metamaterial structure.

axioms (2)
  • [domain assumption] The acoustic-field model accurately predicts system stability under speaker movement.
    Invoked to guarantee performance when the speaker is not stationary.
  • [domain assumption] Time-varying interference prevents attackers from learning or canceling a fixed acoustic pattern.
    Required for the security argument against adaptive adversaries.
invented entities (1)
  • EchoMask reconfigurable metamaterial structure [no independent evidence]
    purpose: To produce frequency-selective and time-varying acoustic interference that disrupts voiceprint features.
    New hardware design introduced by the paper; no independent external evidence provided in the abstract.

pith-pipeline@v0.9.0 · 5504 in / 1373 out tokens · 41506 ms · 2026-05-09T23:42:46.975879+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages

  [1] Google speech-to-text ai, https://cloud.google.com/speech-to-text. Last accessed: 2024-8-6
  [2] iflytek, https://console.xfyun.cn/services/ivp. Last accessed: 2026-2-01
  [3] Huawei, https://consumer.huawei.com/cn/phones/. Last accessed: 2025-1-20
  [4] Ecapa-tdnn, https://github.com/TaoRuijie/ECAPA-TDNN. Last accessed: 2026-2-01
  [5] se electronics v7, https://seelectronics.com/products/v7/. Last accessed: 2026-2-01
  [6] Apple, https://www.apple.com.cn/iphone/. Last accessed: 2026-1-14
  [7] Audio-technica at9930, https://www.audio-technica.com.hk/index.php?op=productdetails&pid=478&lang=eng. Last accessed: 2026-2-01
  [8] Comsol, https://www.comsol.com/. Last accessed: 2025-1-20
  [9] Google, https://www.google-mobile.cn/. Last accessed: 2025-1-20
  [10] Samsung, https://www.samsung.com/hk/smartphones/galaxy-s24/. Last accessed: 2026-1-14
  [11] Shure sv200, https://www.shure.com/en-ASIA/products/microphones/sv200. Last accessed: 2026-2-01
  [12] Behringer ta5212, https://www.sweelee.com.sg/products/behringer-ta5212-condenser-gooseneck-microphone. Last accessed: 2026-2-01

  [13] Shalbbya Ali, Safdar Tanweer, Syed Sibtain Khalid, and Naseem Rao. Mel frequency cepstral coefficient: a review. ICIDSSD, 2020
  [14] Noor Almaadeed, Amar Aggoun, and Abbes Amira. Text-independent speaker identification using vowel formants. Journal of Signal Processing Systems, 82(3):345–356, 2016
  [15] Badreddine Assouar, Bin Liang, Ying Wu, Yong Li, Jian-Chun Cheng, and Yun Jing. Acoustic metasurfaces. Nature Reviews Materials, 3(12):460–472, 2018
  [16] Durand R Begault and Leonard J Trejo. 3-d sound for virtual reality and multimedia. Technical report, 2000
  [17] Valiantsin Belyi and Woon-Seng Gan. Integrated psychoacoustic active noise control and masking. Applied Acoustics, 145:339–348, 2019
  [18] Bruno Bessette, Roch Lefebvre, Redwan Salami, Milan Jelinek, Janne Vainio, J Rotola-Pukkila, Hannu Mikkola, and Kari Järvinen. Techniques for high-quality acelp coding of wideband speech. In INTERSPEECH, pages 1997–2000, 2001
  [19] Andrew Boles and Paul Rad. Voice biometrics: Deep learning-based voiceprint authentication system. In 2017 12th system of systems engineering conference (SoSE), pages 1–6. IEEE, 2017
  [20] Pasquale Bottalico and Silvia Murgia. The effect of the frequency and energetic content of broadband noise on the lombard effect and speech intelligibility. In Acoustics, volume 5, pages 898–908. MDPI, 2023
  [21] S Bouketta and Y Bouchahm. Numerical evaluation of urban geometry’s control of wind movements in outdoor spaces during winter period. case of mediterranean climate. Renewable Energy, 146:1062–1069, 2020
  [22] W Owen Brimijoin, Alan W Boyd, and Michael A Akeroyd. The contribution of head movement to the externalization and internalization of sounds. PloS one, 8(12):e83068, 2013
  [23] Guangke Chen and Yedi Zhang. Songbsab: A dual prevention approach against singing voice conversion based illegal song covers. In 32nd Annual Network and Distributed System Security Symposium, 2025
  [24] Ming Cheng, Xingjian Diao, Shitong Cheng, and Wenjun Liu. Saic: Integration of speech anonymization and identity classification. In AI for Health Equity and Fairness: Leveraging AI to Address Social Determinants of Health, pages 295–306. Springer, 2024

  [25] Y Cheng, C Zhou, BG Yuan, DJ Wu, Q Wei, and XJ Liu. Ultra-sparse metasurface for high reflection of low-frequency sound based on artificial mie resonances. Nature materials, 14(10):1013–1019, 2015
  [26] Gokcen Yilmaz Dayanikli. Electromagnetic interference attacks on cyber-physical systems: Theory, demonstration, and defense. 2021
  [27] Ewelina Dec, Bożena Babiarz, and Robert Sekret. Analysis of temperature, air humidity and wind conditions for the needs of outdoor thermal comfort. In E3S Web of Conferences, volume 44, page 00028. EDP Sciences, 2018
  [28] Najim Dehak, Patrick J Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4):788–798, 2010
  [29] Jiangyi Deng, Yanjiao Chen, and Wenyuan Xu. Fencesitter: Black-box, content-agnostic, and synchronization-free enrollment-phase attacks on speaker recognition systems. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS ’22, page 755–767, New York, NY, USA, 2022. Association for Computing Machinery
  [30] Jiangyi Deng, Fei Teng, Yanjiao Chen, Xiaofu Chen, Zhaohui Wang, and Wenyuan Xu. V-Cloak: Intelligibility-, naturalness- & Timbre-Preserving Real-Time voice anonymization. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5181–5198, 2023
  [31] Wenrui Diao, Xiangyu Liu, Zhe Zhou, and Kehuan Zhang. Your voice assistant is mine: How to abuse speakers to steal information and control your phone. In Proceedings of the 4th ACM workshop on security and privacy in smartphones & mobile devices, pages 63–74, 2014
  [32] Catherine Diaz-Asper, Lars Ailo Bongo, and Brita Elvevåg. Navigating the tradeoff between personal privacy and data utility in speech anonymization for clinical research. npj Digital Medicine, 8(1):616, 2025
  [33] Jang Dongil, Kang Sanha, Kim Jinyoung, Kim Hyeonghoon, Lee Sinwoo, and Kim Bongjoong. Development and characterization of a flexible soundproofing metapanel for noise reduction. 2024
  [34] Christina Dörfling. Silicon listening. mems, near-ultrasound, and machine listening beyond ai. Sound Studies, 11(2):314–338, 2025
  [35] Sergio Esposito, Daniele Sgandurra, and Giampaolo Bella. Alexa versus alexa: Controlling smart speakers by self-issuing voice commands. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pages 1064–1078, 2022
  [36] Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia, and Ke Xu. Privacy leakage on dnns: A survey of model inversion attacks and defenses. arXiv preprint arXiv:2402.04013, 2024

  [37] Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Tianyu Du, and Shouling Ji. Enkidu: Universal frequential perturbation for real-time audio privacy protection against voice deepfakes. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 11638–11647, 2025
  [38] Ziyan Gao, Yu Lei, Zhanmiao Li, Jikun Yang, Bo Yu, Xiaoting Yuan, Zewei Hou, Jiawang Hong, and Shuxiang Dong. Artificial piezoelectric metamaterials. Progress in Materials Science, page 101434, 2025
  [39] Reza Ghaffarivardavagh, Jacob Nikolajczyk, Stephan Anderson, and Xin Zhang. Ultra-open acoustic metamaterial silencer based on fano-like interference. Physical Review B, 99(2):024302, 2019
  [40] Norezmi Jamal, Shahnoor Shanta, Farhanahani Mahmud, and MNAH Sha’abani. Automatic speech recognition (asr) based approach for speech therapy of aphasic patients: A review. In AIP Conference Proceedings, volume 1883, page 020028. AIP Publishing LLC, 2017
  [41] Mahdie Karbasi and Dorothea Kolossa. Asr-based speech intelligibility prediction: A review. Hearing Research, 426:108606, 2022
  [42] Shadi Khazaaleh, Georgios Korres, Mohammed Eid, Mahmoud Rasras, and Mohammed F Daqaq. Vulnerability of mems gyroscopes to targeted acoustic attacks. IEEE Access, 7:89534–89543, 2019
  [43] Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, and Takao Kobayashi. A wideband celp speech coder at 16 kbit/s based on mel-generalized cepstral analysis. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), volume 1, pages 161–164. IEEE, 1998
  [44] Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, and Takao Kobayashi. A 16 kb/s wideband celp-based speech coder using mel-generalized cepstral analysis. IEICE transactions on information and systems, 83(4):876–883, 2000
  [45] Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, and Eng Siong Chng. Ntu-npu system for voice privacy 2024 challenge. emotion, 1:2
  [46] Yunzhong Lei, Jiu Hui Wu, Libo Wang, Yao Huang, and Jiamin Niu. Deep sub-wavelength acoustic transmission enhancement and whisper via the monopole resonance in meta-cavities. Applied Acoustics, 203:109227, 2023
  [47] Angkana Lertpoompunya, Erol J Ozmeral, Nathan C Higgins, and David A Eddins. Head-orienting behaviors during simultaneous speech detection and localization. Frontiers in Psychology, 15:1425972, 2024
  [48] Zhuohang Li, Cong Shi, Yi Xie, Jian Liu, Bo Yuan, and Yingying Chen. Practical adversarial attacks against speaker recognition systems. In Proceedings of the 21st international workshop on mobile computing systems and applications, pages 9–14, 2020

  49. [49]

    Lv-auth: Lip motion fusion for voiceprint authentica- tion

    Wei Liu, Xiaojing Zhu, Qin Liu, Peng Li, and Man Zhou. Lv-auth: Lip motion fusion for voiceprint authentica- tion. InInternational Conference on Wireless Artificial Intelligent Computing Systems and Applications, pages 295–307. Springer, 2024

  50. [50]

    Mitigating inaudible ultrasound attacks on voice assistants with acoustic metamaterials.IEEE Access, 11:36464–36470, 2023

    Joshua S Lloyd, Cole G Ludwikowski, Cyrus Malik, and Chen Shen. Mitigating inaudible ultrasound attacks on voice assistants with acoustic metamaterials.IEEE Access, 11:36464–36470, 2023

  51. [51]

    Shaping reverberating sound fields with an ac- tively tunable metasurface.Proceedings of the National Academy of Sciences, 115(26):6638–6643, 2018

    Guancong Ma, Xiying Fan, Ping Sheng, and Mathias Fink. Shaping reverberating sound fields with an ac- tively tunable metasurface.Proceedings of the National Academy of Sciences, 115(26):6638–6643, 2018

  52. [52]

    Acoustic metamateri- als: From local resonances to broad horizons.Science advances, 2(2):e1501595, 2016

    Guancong Ma and Ping Sheng. Acoustic metamateri- als: From local resonances to broad horizons.Science advances, 2(2):e1501595, 2016

  53. [53]

    Mouth sounds: A review of acoustic applications and methodologies.Applied Sciences, 13(7):4331, 2023

    Norberto E Naal-Ruiz, Erick A Gonzalez-Rodriguez, Gustavo Navas-Reascos, Rebeca Romo-De Leon, Ale- jandro Solorio, Luz M Alonso-Valerdi, and David I Ibarra-Zarate. Mouth sounds: A review of acoustic applications and methodologies.Applied Sciences, 13(7):4331, 2023

  54. [54]

    Mobile 3d augmented-reality system for ultrasound ap- plications

    Cameron Lowell Palmer, Bjørn Olav Haugen, Eva Teg- nander, Sturla H Eik-Nes, Hans Torp, and Gabriel Kiss. Mobile 3d augmented-reality system for ultrasound ap- plications. In2015 IEEE International Ultrasonics Sym- posium (IUS), pages 1–4. IEEE, 2015

  55. [55]

    Speaker identity and voice qual- ity: Modeling human responses and automatic speaker recognition

    Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patri- cia A Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, and Abeer Alwan. Speaker identity and voice qual- ity: Modeling human responses and automatic speaker recognition. InInterspeech, pages 1044–1048, 2016

  56. [56]

    Introducing model inversion at- tacks on automatic speaker recognition.arXiv preprint arXiv:2301.03206, 2023

    Karla Pizzi, Franziska Boenisch, Ugur Sahin, and Kon- stantin Böttinger. Introducing model inversion at- tacks on automatic speaker recognition.arXiv preprint arXiv:2301.03206, 2023

  57. [57]

    Giorgio Presti, Nicola Degiorgi, Amedeo Fresia, Antonio Servetti, et al. Real-time psychoacoustic frequency masking compensation for audio signals with overlapping spectra. In Proceedings of the 21st Sound and Music Computing Conference, pages 439–444. SMC, 2024

  58. [58]

    Jianwei Qian, Haohua Du, Jiahui Hou, Linlin Chen, Taeho Jung, and Xiang-Yang Li. Speech sanitizer: Speech content desensitization and voice anonymization. IEEE Transactions on Dependable and Secure Computing, 18(6):2631–2642, 2019

  59. [59]

    Rozeha A Rashid, Nur Hija Mahalin, Mohd Adib Sarijari, and Ahmad Aizuddin Abdul Aziz. Security system using biometric technology: Design and implementation of voice recognition system (vrs). In 2008 international conference on computer and communication engineering, pages 898–902. IEEE, 2008

  60. [60]

    Zhiwen Ren, Yuehang Cheng, Mingji Chen, Xujin Yuan, and Daining Fang. A compact multifunctional metastructure for low-frequency broadband sound absorption and crash energy dissipation. Materials & Design, 215:110462, 2022

  61. [61]

    Douglas A Reynolds, Thomas F Quatieri, and Robert B Dunn. Speaker verification using adapted gaussian mixture models. Digital signal processing, 10(1-3):19–41, 2000

  62. [62]

    Jürgen Schnitzler. A 13.0 kbit/s wideband speech codec based on sb-acelp. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), volume 1, pages 157–160. IEEE, 1998

  63. [63]

    Lea Schönherr, Thorsten Eisenhofer, Steffen Zeiler, Thorsten Holz, and Dorothea Kolossa. Imperio: Robust over-the-air adversarial examples for automatic speech recognition systems. In Proceedings of the 36th Annual Computer Security Applications Conference, pages 843–855, 2020

  64. [64]

    Yi Shen, Monica L Folkerts, and Virgina M Richards. Head movements while recognizing speech arriving from behind. The Journal of the Acoustical Society of America, 141(2):EL108–EL114, 2017

  65. [65]

    Jiahao Shi and AH Akbarzadeh. Architected cellular piezoelectric metamaterials: Thermo-electro-mechanical properties. Acta Materialia, 163:91–121, 2019

  66. [66]

    David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur. X-vectors: Robust dnn embeddings for speaker recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5329–5333. IEEE, 2018

  67. [67]

    Robert J Summers, Peter J Bailey, and Brian Roberts. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception. Journal of the Association for Research in Otolaryngology, 13(2):269–280, 2012

  68. [68]

    Soroosh Tayebi Arasteh, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Tobias Weise, Kai Packhäuser, Maria Schuster, Elmar Noeth, Andreas Maier, and Seung Hee Yang. Addressing challenges in speaker anonymization to maintain utility while ensuring privacy of pathological speech. Communications Medicine, 4(1):182, 2024

  69. [69]

    Soroosh Tayebi Arasteh, Mahshad Lotfinia, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Mahtab Ranji, Juan Rafael Orozco-Arroyave, Maria Schuster, Andreas Maier, and Seung Hee Yang. Differential privacy enables fair and accurate ai-based analysis of speech disorders while protecting patient data. npj Artificial Intelligence, 1(1):37, 2025

  70. [70]

    Kun Wang, Meng Chen, Li Lu, Jingwen Feng, Qianniu Chen, Zhongjie Ba, Kui Ren, and Chun Chen. From one stolen utterance: Assessing the risks of voice cloning in the aigc era. In 2025 IEEE Symposium on Security and Privacy (SP), pages 4663–4681. IEEE, 2025

  71. [71]

    Ning Wang, PC Ching, Nengheng Zheng, and Tan Lee. Robust speaker recognition using denoised vocal source and vocal tract features. IEEE transactions on audio, speech, and language processing, 19(1):196–205, 2010

  72. [72]

    Yang Wang, Xiaoyu Wang, Huanyu Dong, Lele Ma, Yue Fu, and Lingxing Zhang. Spider web-inspired acoustic metamaterials with multi-band gaps for low-frequency elastic wave propagation control. Journal of Physics D: Applied Physics, 2025

  73. [73]

    Yuanda Wang, Hanqing Guo, Guangjing Wang, Bocheng Chen, and Qiben Yan. Vsmask: Defending against voice synthesis attack via real-time predictive perturbation. In Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, pages 239–250, 2023

  74. [74]

    Shilin Xiao, Xiaoyu Ji, Chen Yan, Zhicong Zheng, and Wenyuan Xu. Micpro: Microphone-based voice privacy protection. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1302–1316, 2023

  75. [75]

    Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, and Bo Yuan. Real-time, universal, and robust adversarial attacks against speaker recognition systems. In ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 1738–1742. IEEE, 2020

  76. [76]

    Jixun Yao, Hexin Liu, Eng Siong Chng, and Lei Xie. EASY: Emotion-aware Speaker Anonymization via Factorized Distillation. In Interspeech 2025, pages 3219–3223, 2025

  77. [77]

    Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, and Lei Xie. Musa: Multilingual speaker anonymization via serial disentanglement. IEEE Transactions on Audio, Speech and Language Processing, 2025

  78. [78]

    Jin Zhang, Wei Rui, Chengrong Ma, Ying Cheng, Xiaojun Liu, and Johan Christensen. Remote whispering metamaterial for non-radiative transceiving of ultra-weak sound. Nature Communications, 12(1):3670, 2021

  79. [79]

    Fang Zheng, LT Li, and Hui Zhang. Voiceprint recognition technology and its application status. Research on information security, 2(1):44–57, 2016

  80. [80]

    Hua-Hong Zhu, Qian-Hua He, Hong Tang, and Wei-Hua Cao. Voiceprint-biometric template design and authentication based on cloud computing security. In 2011 International Conference on Cloud and Service Computing, pages 302–308. IEEE, 2011

Showing first 80 references.