Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials

Xiaojiang Chen; Zhanyong Tang; Zheng Wang; Zhiyuan Ning

arxiv: 2604.20116 · v1 · submitted 2026-04-22 · 💻 cs.SD


Zhiyuan Ning, Zhanyong Tang, Xiaojiang Chen, Zheng Wang

Pith reviewed 2026-05-09 23:42 UTC · model grok-4.3

classification 💻 cs.SD

keywords voiceprint anonymization · acoustic metamaterials · physical-layer security · speech privacy · biometric protection · real-time anonymization · 3D-printable devices · sound wave interference


The pith

Acoustic metamaterials alter sound waves to anonymize voiceprints before any microphone can capture them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EchoMask as a physical system that disrupts voice biometrics at the source rather than relying on software inside recording devices. It places 3D-printable metamaterial structures near the speaker to generate interference that targets the acoustic features used for voiceprint matching while leaving ordinary speech understandable. The design includes an acoustic model to handle speaker movement and reconfigurable elements that vary the interference over time so attackers cannot easily learn or cancel a fixed pattern. Tests across eight microphones in different environments show the system raises the rate of failed voiceprint matches above 90 percent. Because the approach requires no power, software, machine learning, or device modifications, it offers protection even when microphones or recording software cannot be trusted.

Core claim

EchoMask is the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By combining frequency-selective interference to disrupt voiceprint features, an acoustic-field model for stability under movement, and reconfigurable structures that produce time-varying interference, the system prevents capture of clean voiceprints through compromised devices. Experiments demonstrate that it raises the miss-match rate above 90 percent while maintaining high speech intelligibility, and the entire solution is low-cost, power-free, and 3D-printable.

What carries the argument

Reconfigurable acoustic metamaterials that generate frequency-selective and time-varying interference patterns to disrupt voiceprint features before sound reaches the microphone.
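As a rough illustration of what "frequency-selective interference" means physically, the sketch below uses an idealized two-path model: the direct sound plus one delayed copy that travels a slightly longer route. This is a textbook simplification chosen for illustration, not the paper's metamaterial design; the function names, the 10 cm path difference, and the unit branch amplitude are all hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C


def two_path_gain(freq_hz, extra_path_m, branch_amplitude=1.0):
    """Magnitude of the direct sound plus one delayed, scaled copy."""
    phase = 2.0 * math.pi * freq_hz * extra_path_m / SPEED_OF_SOUND
    return abs(1.0 + branch_amplitude * cmath.exp(-1j * phase))


def first_notch_hz(extra_path_m):
    """Lowest frequency at which the two paths arrive in antiphase."""
    return SPEED_OF_SOUND / (2.0 * extra_path_m)


extra = 0.10  # hypothetical extra path length in metres
notch = first_notch_hz(extra)
for f in (500.0, notch, 3000.0):
    print(f"{f:7.1f} Hz -> gain {two_path_gain(f, extra):.3f}")
```

In this toy model the gain collapses near the notch frequency and stays close to its maximum elsewhere, which is the sense in which interference can suppress selected bands while leaving others mostly intact. Changing `extra_path_m` moves the notch, which hints at why geometry changes can retune the response.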

If this is right

Voiceprint-based authentication fails against speakers using the metamaterial structures even if all recording devices are compromised.
No software updates or hardware changes to microphones are needed to achieve anonymization in public or sensitive spaces.
Low-cost 3D-printed physical barriers can provide biometric protection that works independently of any digital system.
Speech remains usable for normal communication since the interference preserves intelligibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same physical interference approach could be tested on other acoustic biometrics such as emotion detection from voice.
Reconfigurable metamaterials might be adapted into wearable or portable forms for individual use in varied settings.
Physical-layer methods like this could reduce dependence on software-only privacy tools that require trusted hardware.
Combining the structures with everyday objects such as conference tables or podiums might enable widespread passive deployment.

Load-bearing premise

The interference patterns stay stable enough under normal speaker movement and remain unpredictable enough that attackers cannot learn or cancel them.

What would settle it

A demonstration that an attacker using multiple microphones and signal reconstruction can achieve voiceprint matching success rates well above 10 percent on speech processed by EchoMask.
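The averaging attack described here can be sketched with a toy model. Assume, purely for illustration, that the device applies a per-band gain pattern, and that an attacker averages the patterns seen in earlier sessions and then inverts that average on a new recording. The band count, gain range, and all function names are hypothetical, and a real attacker would work on waveforms from several microphones rather than on known gains.

```python
import random
import statistics

BANDS = 8  # hypothetical number of frequency bands


def random_pattern(rng):
    """Draw one per-band gain pattern with gains in [0.2, 1.0]."""
    return [rng.uniform(0.2, 1.0) for _ in range(BANDS)]


def residual_after_averaging(sessions):
    """Average all but the last session, then invert that estimate on the last.

    Returns the mean absolute deviation from unit gain that remains after the
    attacker's compensation (0.0 means the pattern was cancelled perfectly).
    """
    history, target = sessions[:-1], sessions[-1]
    estimate = [statistics.fmean(s[b] for s in history) for b in range(BANDS)]
    return statistics.fmean(abs(target[b] / estimate[b] - 1.0) for b in range(BANDS))


def simulate(num_sessions=50, seed=0):
    """Compare a fixed pattern with one that is redrawn every session."""
    rng = random.Random(seed)
    fixed = random_pattern(rng)
    fixed_sessions = [fixed] * num_sessions
    varying_sessions = [random_pattern(rng) for _ in range(num_sessions)]
    return (residual_after_averaging(fixed_sessions),
            residual_after_averaging(varying_sessions))


fixed_residual, varying_residual = simulate()
print(f"fixed pattern residual:   {fixed_residual:.4f}")
print(f"varying pattern residual: {varying_residual:.4f}")
```

In this model a fixed pattern is cancelled almost exactly, while a pattern redrawn each session leaves a substantial residual. Whether the same holds for the real device against a multi-microphone attacker is exactly the open question stated above.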

Figures

Figures reproduced from arXiv: 2604.20116 by Xiaojiang Chen, Zhanyong Tang, Zheng Wang, Zhiyuan Ning.

**Figure 1.** A deployment scenario of ECHOMASK for voiceprint anonymization. By placing a passive metamaterial outside the microphone, we distort identity-bearing components of speech before capture while preserving intelligibility. The compact, power-free design can be attached to off-the-shelf microphones, making it suitable for public and shared recording environments. However, utilizing metamaterials for voiceprint a…

**Figure 2.** Prototypes for typical microphones (a) and mobile…

**Figure 4.** (a) Anonymization acoustic field model and (b)…

**Figure 5.** (a) Gains at -90° to 90° with M1 only. (b) Top: gains at 0° to 90° (M1, M2); bottom: gains at -90° to 0° (M1, M2). (c) Gains at -90° to 90° with M1, M2, and M3.

[Plot residue captured from an adjacent figure: (a) randomized interference structure driven by variation of u1; (b) curves plotted against frequency (Hz), with legend entries from 8 mm to 16 mm and labeled frequencies between 518 Hz and 540 Hz.]

**Figure 7.** Functional variants of system: compatible with con…

**Figure 10.** Impact of different devices on anonymization per…

**Figure 11.** Impact of speaker differences on anonymization…

**Figure 12.** Impact of volume on anonymization performance

**Figure 14.** (a) Anonymization efficiency of ECHOMASK and (b) the subjective auditory of the anonymized audio.

Excerpt captured with the Figure 14 caption (Section 5.1.2, Efficiency and perceptual quality): Processing efficiency. Real-time performance is essential for deployment in live settings such as talks, meetings, and online conferences. We evaluate efficiency using the Realtime Coefficient (RTC) [30, 74] on audio samples of varying lengths and content. Because ECHO…

**Figure 15.** Impact of the system on ASR performance (cover…

**Figure 16.** (a) Anonymization performance without the dy…

**Figure 18.** (a) Anonymization under varying noise levels, (b)…

**Figure 19.** Impact of different wind speeds on E…
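The text captured alongside the Figure 14 caption mentions a Realtime Coefficient (RTC) but is cut off before defining it. The sketch below assumes the common real-time-factor convention, processing time divided by audio duration; that definition, the function names, and the stand-in processing step are assumptions, not details taken from the paper.

```python
import time


def realtime_coefficient(process, samples, sample_rate_hz):
    """Wall-clock processing time divided by audio duration (assumed definition)."""
    duration_s = len(samples) / sample_rate_hz
    start = time.perf_counter()
    process(samples)
    elapsed_s = time.perf_counter() - start
    return elapsed_s / duration_s


def halve_amplitude(samples):
    """Hypothetical stand-in for a software anonymizer."""
    return [0.5 * s for s in samples]


audio = [0.0] * (16000 * 2)  # two seconds of silence at 16 kHz
rtc = realtime_coefficient(halve_amplitude, audio, 16000)
print(f"RTC = {rtc:.4f} (values below 1 mean faster than real time)")
```

Under this convention a passive acoustic structure adds no computation at all, so any delay it introduces comes from sound propagation rather than processing.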

Original abstract

Voiceprints are widely used for authentication; however, they are easily captured in public settings and cannot be revoked once leaked. Existing anonymization systems operate inside recording devices, which makes them ineffective when microphones or software are untrusted, as in conference rooms, lecture halls, and interviews. We present EchoMask, the first practical physical-layer system for real-time voiceprint anonymization using acoustic metamaterials. By modifying sound waves before they reach the microphone, EchoMask prevents attackers from capturing clean voiceprints through compromised devices. Our design combines three key innovations: frequency-selective interference to disrupt voiceprint features while preserving speech intelligibility, an acoustic-field model to ensure stability under speaker movement, and reconfigurable structures that create time-varying interference to prevent learning or canceling a fixed acoustic pattern. EchoMask is low-cost, power-free, and 3D-printable, requiring no machine learning, software support, or microphone modification. Experiments conducted across eight microphones in diverse environments demonstrate that EchoMask increases the Miss-match Rate, i.e., the fraction of failed voiceprint matching attempts, to over 90%, while maintaining high speech intelligibility.
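The abstract defines the Miss-match Rate as the fraction of failed voiceprint matching attempts. A minimal sketch of that computation follows, assuming a verifier that compares speaker embeddings by cosine similarity against a fixed threshold. The scoring rule, the threshold, and the toy two-dimensional embeddings are illustrative assumptions; the paper's verification back ends may score differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def miss_match_rate(enrolled, probes, threshold):
    """Fraction of probe embeddings that fail to match the enrolled one."""
    if not probes:
        raise ValueError("need at least one probe")
    failures = sum(1 for p in probes if cosine_similarity(enrolled, p) < threshold)
    return failures / len(probes)


# Toy embeddings, made up for illustration only.
enrolled = [1.0, 0.0]
probes = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
print(miss_match_rate(enrolled, probes, threshold=0.8))
```

With these toy vectors only the first probe clears the threshold, so three of four attempts fail and the rate is 0.75.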

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EchoMask introduces a practical physical metamaterial device for real-time voiceprint anonymization before the mic, with a clean design but experimental claims that still need checks on movement stability and adaptive attacks.


This paper's main contribution is EchoMask, a physical device made from acoustic metamaterials that alters sound waves to anonymize voiceprints in real time. It targets scenarios where mics or recording software can't be trusted. The work stands out for integrating frequency-selective interference to scramble speaker features without hurting intelligibility, an acoustic-field model to handle speaker movement, and reconfigurable structures for time-varying interference that stops pattern learning. Being low-cost, power-free, and 3D-printable adds practical appeal, and no machine learning or device mods are required. Experiments with eight microphones in different environments reportedly push the miss-match rate above 90% while keeping speech understandable. If the data collection and analysis are solid, this is a useful step for physical privacy tools.

The soft spots center on the assumptions flagged in the stress test. The acoustic model must keep performance stable as the speaker moves, and the reconfiguration has to block adaptive attackers who might record multiple times and average or retrain. The provided abstract lacks specifics on testing movement paths or such attacks, so those need to be addressed clearly in the full paper to back the central claim. Other details like baselines and error bars would help too, but they seem secondary.

This is for specialists in acoustic privacy, biometric security, and metamaterial applications. Anyone exploring hardware defenses against voice biometrics would get ideas from it.

The paper merits peer review. The novelty and experimental angle make it worth a referee's time, even if revisions are needed on the evaluation of robustness.

Referee Report

3 major / 1 minor

Summary. The manuscript presents EchoMask, a physical-layer voiceprint anonymization system using acoustic metamaterials to modify sound waves before they reach the microphone. It claims three innovations: frequency-selective interference that disrupts voiceprint features while preserving intelligibility, an acoustic-field model ensuring stability under speaker movement, and reconfigurable structures generating time-varying interference to prevent attackers from learning or canceling fixed patterns. The system is low-cost, power-free, and 3D-printable with no ML, software, or microphone modifications required. Experiments across eight microphones in diverse environments are reported to achieve over 90% miss-match rate (fraction of failed voiceprint matching attempts) while maintaining high speech intelligibility.

Significance. If the performance claims and underlying assumptions hold, this would constitute a meaningful advance in voice authentication privacy by shifting anonymization to the physical layer, rendering it effective against untrusted or compromised recording devices in public settings. The passive, hardware-only design and lack of dependence on software or ML distinguish it from prior approaches and could enable practical deployment. Strengths include the emphasis on the irrevocability of leaked voiceprints and on low cost; however, significance hinges on rigorous validation of movement stability and resistance to adaptive, multi-session attacks.

major comments (3)

[Experiments] Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.
[Design] Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.
[Evaluation] Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.
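The request for error bars in the first major comment can be made concrete. One standard option for a rate such as the miss-match rate is a Wilson score interval on the underlying pass/fail counts, sketched below. The per-microphone counts are made-up placeholders, not results from the paper, and the function and variable names are hypothetical.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return center - half, center + half


# Placeholder counts: (failed matches, total attempts) per microphone.
hypothetical_counts = {"mic_A": (93, 100), "mic_B": (45, 50)}
for mic, (failed, total) in hypothetical_counts.items():
    low, high = wilson_interval(failed, total)
    print(f"{mic}: rate {failed / total:.2f}, interval [{low:.3f}, {high:.3f}]")
```

Reporting an interval per microphone and environment would show whether "above 90 percent" holds uniformly or only on average.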

minor comments (1)

[Abstract] Abstract: the description of 'high speech intelligibility' would be strengthened by naming the specific metric (e.g., word error rate or MOS) used to quantify it.
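For reference, word error rate, one of the metrics the referee names, is the word-level edit distance between a reference transcript and a recognizer's output, divided by the reference length. The sketch below is a generic implementation under that standard definition; it is not taken from the paper, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quack brown"))
```

The example has one substitution and one deletion against four reference words, giving 0.5. Comparing this figure with and without the device on the same recordings would quantify the intelligibility claim.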

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.


Referee: Experiments section: the central claim of >90% miss-match rate is presented without error bars, baseline comparisons to software-based anonymization methods, or explicit data exclusion criteria, preventing assessment of whether the data support the performance assertion across the eight microphones and environments.

Authors: We agree that the presentation of results in the Experiments section can be strengthened with additional statistical details and comparisons. In the revised manuscript, we will add error bars to the reported miss-match rates, include baseline comparisons against representative software-based voiceprint anonymization methods, and explicitly describe the data exclusion criteria applied. These changes will enable a clearer assessment of the results across the eight microphones and environments.
Revision: yes

Referee: Acoustic-field model (design section): the model is asserted to ensure stability under speaker movement, but no quantitative error bounds, sensitivity analysis, or validation against actual movement trajectories are provided; this directly bears on whether small positional changes invalidate the frequency-selective interference.

Authors: We acknowledge the need for quantitative validation of the acoustic-field model's robustness to movement. We will augment the Design section with a sensitivity analysis that includes quantitative error bounds and validation against recorded movement trajectories from our experiments. This will clarify the positional tolerances under which the frequency-selective interference remains effective.
Revision: yes

Referee: Evaluation of reconfigurability: the manuscript provides no evidence that experiments included multi-session recordings or adaptive attacker models (e.g., averaging or retraining across sessions); without this, the claim that time-varying interference prevents pattern learning or cancellation remains untested and load-bearing for the overall security argument.

Authors: We agree that direct evaluation against adaptive, multi-session attacks is important to substantiate the reconfigurability claims. In the revised manuscript, we will add experiments involving multi-session recordings and adaptive attacker models (including averaging and retraining across sessions) to demonstrate that the time-varying interference prevents effective pattern learning or cancellation.
Revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on physical design and empirical measurements, not self-referential derivations


The paper describes a hardware system (EchoMask) using acoustic metamaterials for physical-layer anonymization, with three stated innovations (frequency-selective interference, acoustic-field model for movement stability, and reconfigurable time-varying patterns). Success is asserted via experiments measuring miss-match rates (>90%) and intelligibility across eight microphones in varied environments. No equations, predictions, or first-principles derivations are presented in the provided text that reduce by construction to fitted parameters, self-citations, or renamed inputs. The central results are experimental outcomes rather than analytic claims that loop back to their own assumptions. This matches the default expectation of no significant circularity for an engineering/experimental paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on domain assumptions about acoustic propagation and attacker limitations rather than explicit free parameters or new physical entities beyond the designed metamaterial structure.

axioms (2)

Domain assumption: The acoustic-field model accurately predicts system stability under speaker movement.
Invoked to guarantee performance when the speaker is not stationary.
Domain assumption: Time-varying interference prevents attackers from learning or canceling a fixed acoustic pattern.
Required for the security argument against adaptive adversaries.

invented entities (1)

EchoMask reconfigurable metamaterial structure (no independent evidence)
Purpose: To produce frequency-selective and time-varying acoustic interference that disrupts voiceprint features.
New hardware design introduced by the paper; no independent external evidence provided in the abstract.

pith-pipeline@v0.9.0 · 5504 in / 1373 out tokens · 41506 ms · 2026-05-09T23:42:46.975879+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages

[1] Google speech-to-text ai, https://cloud.google.com/speech-to-text. Last accessed: 2024-8-6.
[2] iflytek, https://console.xfyun.cn/services/ivp. Last accessed: 2026-2-01.
[3] Huawei, https://consumer.huawei.com/cn/phones/. Last accessed: 2025-1-20.
[4] Ecapa-tdnn, https://github.com/TaoRuijie/ECAPA-TDNN. Last accessed: 2026-2-01.
[5] se electronics v7, https://seelectronics.com/products/v7/. Last accessed: 2026-2-01.
[6] Apple, https://www.apple.com.cn/iphone/. Last accessed: 2026-1-14.
[7] Audio-technica at9930, https://www.audio-technica.com.hk/index.php?op=productdetails&pid=478&lang=eng. Last accessed: 2026-2-01.
[8] Comsol, https://www.comsol.com/. Last accessed: 2025-1-20.
[9] Google, https://www.google-mobile.cn/. Last accessed: 2025-1-20.
[10] Samsung, https://www.samsung.com/hk/smartphones/galaxy-s24/. Last accessed: 2026-1-14.
[11] Shure sv200, https://www.shure.com/en-ASIA/products/microphones/sv200. Last accessed: 2026-2-01.
[12] Behringer ta5212, https://www.sweelee.com.sg/products/behringer-ta5212-condenser-gooseneck-microphone. Last accessed: 2026-2-01.
[13] Shalbbya Ali, Safdar Tanweer, Syed Sibtain Khalid, and Naseem Rao. Mel frequency cepstral coefficient: a review. ICIDSSD, 2020.
[14] Noor Almaadeed, Amar Aggoun, and Abbes Amira. Text-independent speaker identification using vowel formants. Journal of Signal Processing Systems, 82(3):345–356, 2016.
[15] Badreddine Assouar, Bin Liang, Ying Wu, Yong Li, Jian-Chun Cheng, and Yun Jing. Acoustic metasurfaces. Nature Reviews Materials, 3(12):460–472, 2018.
[16] Durand R Begault and Leonard J Trejo. 3-d sound for virtual reality and multimedia. Technical report, 2000.
[17] Valiantsin Belyi and Woon-Seng Gan. Integrated psychoacoustic active noise control and masking. Applied Acoustics, 145:339–348, 2019.
[18] Bruno Bessette, Roch Lefebvre, Redwan Salami, Milan Jelinek, Janne Vainio, J Rotola-Pukkila, Hannu Mikkola, and Kari Järvinen. Techniques for high-quality acelp coding of wideband speech. In INTERSPEECH, pages 1997–2000, 2001.
[19] Andrew Boles and Paul Rad. Voice biometrics: Deep learning-based voiceprint authentication system. In 2017 12th system of systems engineering conference (SoSE), pages 1–6. IEEE, 2017.
[20] Pasquale Bottalico and Silvia Murgia. The effect of the frequency and energetic content of broadband noise on the lombard effect and speech intelligibility. In Acoustics, volume 5, pages 898–908. MDPI, 2023.
[21] S Bouketta and Y Bouchahm. Numerical evaluation of urban geometry’s control of wind movements in outdoor spaces during winter period. case of mediterranean climate. Renewable Energy, 146:1062–1069, 2020.
[22] W Owen Brimijoin, Alan W Boyd, and Michael A Akeroyd. The contribution of head movement to the externalization and internalization of sounds. PloS one, 8(12):e83068, 2013.
[23] Guangke Chen and Yedi Zhang. Songbsab: A dual prevention approach against singing voice conversion based illegal song covers. In 32nd Annual Network and Distributed System Security Symposium, 2025.
[24] Ming Cheng, Xingjian Diao, Shitong Cheng, and Wenjun Liu. Saic: Integration of speech anonymization and identity classification. In AI for Health Equity and Fairness: Leveraging AI to Address Social Determinants of Health, pages 295–306. Springer, 2024.
[25] Y Cheng, C Zhou, BG Yuan, DJ Wu, Q Wei, and XJ Liu. Ultra-sparse metasurface for high reflection of low-frequency sound based on artificial mie resonances. Nature materials, 14(10):1013–1019, 2015.
[26] Gokcen Yilmaz Dayanikli. Electromagnetic interference attacks on cyber-physical systems: Theory, demonstration, and defense. 2021.
[27] Ewelina Dec, Bożena Babiarz, and Robert Sekret. Analysis of temperature, air humidity and wind conditions for the needs of outdoor thermal comfort. In E3S Web of Conferences, volume 44, page 00028. EDP Sciences, 2018.
[28] Najim Dehak, Patrick J Kenny, Réda Dehak, Pierre Dumouchel, and Pierre Ouellet. Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4):788–798, 2010.
[29] Jiangyi Deng, Yanjiao Chen, and Wenyuan Xu. Fencesitter: Black-box, content-agnostic, and synchronization-free enrollment-phase attacks on speaker recognition systems. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS ’22, pages 755–767, New York, NY, USA, 2022. Association for Computing Machinery.
[30] Jiangyi Deng, Fei Teng, Yanjiao Chen, Xiaofu Chen, Zhaohui Wang, and Wenyuan Xu. V-Cloak: Intelligibility-, naturalness- & Timbre-Preserving Real-Time voice anonymization. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5181–5198, 2023.
[31] Wenrui Diao, Xiangyu Liu, Zhe Zhou, and Kehuan Zhang. Your voice assistant is mine: How to abuse speakers to steal information and control your phone. In Proceedings of the 4th ACM workshop on security and privacy in smartphones & mobile devices, pages 63–74, 2014.
[32] Catherine Diaz-Asper, Lars Ailo Bongo, and Brita Elvevåg. Navigating the tradeoff between personal privacy and data utility in speech anonymization for clinical research. npj Digital Medicine, 8(1):616, 2025.
[33] Jang Dongil, Kang Sanha, Kim Jinyoung, Kim Hyeonghoon, Lee Sinwoo, and Kim Bongjoong. Development and characterization of a flexible soundproofing metapanel for noise reduction. 2024.
[34] Christina Dörfling. Silicon listening. mems, near-ultrasound, and machine listening beyond ai. Sound Studies, 11(2):314–338, 2025.
[35] Sergio Esposito, Daniele Sgandurra, and Giampaolo Bella. Alexa versus alexa: Controlling smart speakers by self-issuing voice commands. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pages 1064–1078, 2022.
[36] Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia, and Ke Xu. Privacy leakage on dnns: A survey of model inversion attacks and defenses. arXiv preprint arXiv:2402.04013, 2024.
[37] Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Tianyu Du, and Shouling Ji. Enkidu: Universal frequential perturbation for real-time audio privacy protection against voice deepfakes. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 11638–11647, 2025.
[38] Ziyan Gao, Yu Lei, Zhanmiao Li, Jikun Yang, Bo Yu, Xiaoting Yuan, Zewei Hou, Jiawang Hong, and Shuxiang Dong. Artificial piezoelectric metamaterials. Progress in Materials Science, page 101434, 2025.
[39]

Ultra-open acoustic metama- terial silencer based on fano-like interference.Physical Review B, 99(2):024302, 2019

Reza Ghaffarivardavagh, Jacob Nikolajczyk, Stephan Anderson, and Xin Zhang. Ultra-open acoustic metama- terial silencer based on fano-like interference.Physical Review B, 99(2):024302, 2019

work page 2019
[40]

Automatic speech recogni- tion (asr) based approach for speech therapy of aphasic patients: A review

Norezmi Jamal, Shahnoor Shanta, Farhanahani Mah- mud, and MNAH Sha’abani. Automatic speech recogni- tion (asr) based approach for speech therapy of aphasic patients: A review. InAIP Conference Proceedings, volume 1883, page 020028. AIP Publishing LLC, 2017. 15

work page 2017
[41]

Asr-based speech intelligibility prediction: A review.Hearing Research, 426:108606, 2022

Mahdie Karbasi and Dorothea Kolossa. Asr-based speech intelligibility prediction: A review.Hearing Research, 426:108606, 2022

work page 2022
[42]

Vulnera- bility of mems gyroscopes to targeted acoustic attacks

Shadi Khazaaleh, Georgios Korres, Mohammed Eid, Mahmoud Rasras, and Mohammed F Daqaq. Vulnera- bility of mems gyroscopes to targeted acoustic attacks. IEEE Access, 7:89534–89543, 2019

work page 2019
[43]

A wideband celp speech coder at 16 kbit/s based on mel-generalized cepstral analysis

Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, and Takao Kobayashi. A wideband celp speech coder at 16 kbit/s based on mel-generalized cepstral analysis. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), volume 1, pages 161–164. IEEE, 1998

work page 1998
[44]

A 16 kb/s wideband celp- based speech coder using mel-generalized cepstral anal- ysis.IEICE transactions on information and systems, 83(4):876–883, 2000

Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, and Takao Kobayashi. A 16 kb/s wideband celp- based speech coder using mel-generalized cepstral anal- ysis.IEICE transactions on information and systems, 83(4):876–883, 2000

work page 2000
[45]

Ntu-npu system for voice privacy 2024 challenge.emotion, 1:2

Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, and Eng Siong Chng. Ntu-npu system for voice privacy 2024 challenge.emotion, 1:2

work page 2024
[46]

Deep sub-wavelength acoustic transmission enhancement and whisper via the monopole resonance in meta-cavities.Applied Acoustics, 203:109227, 2023

Yunzhong Lei, Jiu Hui Wu, Libo Wang, Yao Huang, and Jiamin Niu. Deep sub-wavelength acoustic transmission enhancement and whisper via the monopole resonance in meta-cavities.Applied Acoustics, 203:109227, 2023

work page 2023
[47]

Head-orienting behaviors during simultaneous speech detection and localization

Angkana Lertpoompunya, Erol J Ozmeral, Nathan C Higgins, and David A Eddins. Head-orienting behaviors during simultaneous speech detection and localization. Frontiers in Psychology, 15:1425972, 2024

work page 2024
[48]

Practical adversarial attacks against speaker recognition systems

Zhuohang Li, Cong Shi, Yi Xie, Jian Liu, Bo Yuan, and Yingying Chen. Practical adversarial attacks against speaker recognition systems. InProceedings of the 21st international workshop on mobile computing systems and applications, pages 9–14, 2020

work page 2020
[49]

Lv-auth: Lip motion fusion for voiceprint authentica- tion

Wei Liu, Xiaojing Zhu, Qin Liu, Peng Li, and Man Zhou. Lv-auth: Lip motion fusion for voiceprint authentica- tion. InInternational Conference on Wireless Artificial Intelligent Computing Systems and Applications, pages 295–307. Springer, 2024

work page 2024
[50] Joshua S Lloyd, Cole G Ludwikowski, Cyrus Malik, and Chen Shen. Mitigating inaudible ultrasound attacks on voice assistants with acoustic metamaterials. IEEE Access, 11:36464–36470, 2023.
[51] Guancong Ma, Xiying Fan, Ping Sheng, and Mathias Fink. Shaping reverberating sound fields with an actively tunable metasurface. Proceedings of the National Academy of Sciences, 115(26):6638–6643, 2018.
[52] Guancong Ma and Ping Sheng. Acoustic metamaterials: From local resonances to broad horizons. Science advances, 2(2):e1501595, 2016.
[53] Norberto E Naal-Ruiz, Erick A Gonzalez-Rodriguez, Gustavo Navas-Reascos, Rebeca Romo-De Leon, Alejandro Solorio, Luz M Alonso-Valerdi, and David I Ibarra-Zarate. Mouth sounds: A review of acoustic applications and methodologies. Applied Sciences, 13(7):4331, 2023.
[54] Cameron Lowell Palmer, Bjørn Olav Haugen, Eva Tegnander, Sturla H Eik-Nes, Hans Torp, and Gabriel Kiss. Mobile 3d augmented-reality system for ultrasound applications. In 2015 IEEE International Ultrasonics Symposium (IUS), pages 1–4. IEEE, 2015.
[55] Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia A Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, and Abeer Alwan. Speaker identity and voice quality: Modeling human responses and automatic speaker recognition. In Interspeech, pages 1044–1048, 2016.
[56] Karla Pizzi, Franziska Boenisch, Ugur Sahin, and Konstantin Böttinger. Introducing model inversion attacks on automatic speaker recognition. arXiv preprint arXiv:2301.03206, 2023.
[57] Giorgio Presti, Nicola Degiorgi, Amedeo Fresia, Antonio Servetti, et al. Real-time psychoacoustic frequency masking compensation for audio signals with overlapping spectra. In Proceedings of the 21st Sound and Music Computing Conference, pages 439–444. SMC, 2024.
[58] Jianwei Qian, Haohua Du, Jiahui Hou, Linlin Chen, Taeho Jung, and Xiang-Yang Li. Speech sanitizer: Speech content desensitization and voice anonymization. IEEE Transactions on Dependable and Secure Computing, 18(6):2631–2642, 2019.
[59] Rozeha A Rashid, Nur Hija Mahalin, Mohd Adib Sarijari, and Ahmad Aizuddin Abdul Aziz. Security system using biometric technology: Design and implementation of voice recognition system (vrs). In 2008 international conference on computer and communication engineering, pages 898–902. IEEE, 2008.
[60] Zhiwen Ren, Yuehang Cheng, Mingji Chen, Xujin Yuan, and Daining Fang. A compact multifunctional metastructure for low-frequency broadband sound absorption and crash energy dissipation. Materials & Design, 215:110462, 2022.
[61] Douglas A Reynolds, Thomas F Quatieri, and Robert B Dunn. Speaker verification using adapted gaussian mixture models. Digital signal processing, 10(1-3):19–41, 2000.
[62] Jürgen Schnitzler. A 13.0 kbit/s wideband speech codec based on sb-acelp. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), volume 1, pages 157–160. IEEE, 1998.
[63] Lea Schönherr, Thorsten Eisenhofer, Steffen Zeiler, Thorsten Holz, and Dorothea Kolossa. Imperio: Robust over-the-air adversarial examples for automatic speech recognition systems. In Proceedings of the 36th Annual Computer Security Applications Conference, pages 843–855, 2020.
[64] Yi Shen, Monica L Folkerts, and Virginia M Richards. Head movements while recognizing speech arriving from behind. The Journal of the Acoustical Society of America, 141(2):EL108–EL114, 2017.
[65] Jiahao Shi and AH Akbarzadeh. Architected cellular piezoelectric metamaterials: Thermo-electro-mechanical properties. Acta Materialia, 163:91–121, 2019.
[66] David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur. X-vectors: Robust dnn embeddings for speaker recognition. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5329–5333. IEEE, 2018.
[67] Robert J Summers, Peter J Bailey, and Brian Roberts. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception. Journal of the Association for Research in Otolaryngology, 13(2):269–280, 2012.
[68] Soroosh Tayebi Arasteh, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Tobias Weise, Kai Packhäuser, Maria Schuster, Elmar Noeth, Andreas Maier, and Seung Hee Yang. Addressing challenges in speaker anonymization to maintain utility while ensuring privacy of pathological speech. Communications Medicine, 4(1):182, 2024.
[69] Soroosh Tayebi Arasteh, Mahshad Lotfinia, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Mahtab Ranji, Juan Rafael Orozco-Arroyave, Maria Schuster, Andreas Maier, and Seung Hee Yang. Differential privacy enables fair and accurate ai-based analysis of speech disorders while protecting patient data. npj Artificial Intelligence, 1(1):37, 2025.
[70] Kun Wang, Meng Chen, Li Lu, Jingwen Feng, Qianniu Chen, Zhongjie Ba, Kui Ren, and Chun Chen. From one stolen utterance: Assessing the risks of voice cloning in the aigc era. In 2025 IEEE Symposium on Security and Privacy (SP), pages 4663–4681. IEEE, 2025.
[71] Ning Wang, PC Ching, Nengheng Zheng, and Tan Lee. Robust speaker recognition using denoised vocal source and vocal tract features. IEEE transactions on audio, speech, and language processing, 19(1):196–205, 2010.
[72] Yang Wang, Xiaoyu Wang, Huanyu Dong, Lele Ma, Yue Fu, and Lingxing Zhang. Spider web-inspired acoustic metamaterials with multi-band gaps for low-frequency elastic wave propagation control. Journal of Physics D: Applied Physics, 2025.
[73] Yuanda Wang, Hanqing Guo, Guangjing Wang, Bocheng Chen, and Qiben Yan. Vsmask: Defending against voice synthesis attack via real-time predictive perturbation. In Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, pages 239–250, 2023.
[74] Shilin Xiao, Xiaoyu Ji, Chen Yan, Zhicong Zheng, and Wenyuan Xu. Micpro: Microphone-based voice privacy protection. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1302–1316, 2023.
[75] Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, and Bo Yuan. Real-time, universal, and robust adversarial attacks against speaker recognition systems. In ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 1738–1742. IEEE, 2020.
[76] Jixun Yao, Hexin Liu, Eng Siong Chng, and Lei Xie. EASY: Emotion-aware Speaker Anonymization via Factorized Distillation. In Interspeech 2025, pages 3219–3223, 2025.
[77] Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, and Lei Xie. Musa: Multilingual speaker anonymization via serial disentanglement. IEEE Transactions on Audio, Speech and Language Processing, 2025.
[78] Jin Zhang, Wei Rui, Chengrong Ma, Ying Cheng, Xiaojun Liu, and Johan Christensen. Remote whispering metamaterial for non-radiative transceiving of ultra-weak sound. Nature Communications, 12(1):3670, 2021.
[79] Fang Zheng, LT Li, and Hui Zhang. Voiceprint recognition technology and its application status. Research on information security, 2(1):44–57, 2016.
[80] Hua-Hong Zhu, Qian-Hua He, Hong Tang, and Wei-Hua Cao. Voiceprint-biometric template design and authentication based on cloud computing security. In 2011 International Conference on Cloud and Service Computing, pages 302–308. IEEE, 2011.

[45] [45]

Ntu-npu system for voice privacy 2024 challenge.emotion, 1:2

Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, and Eng Siong Chng. Ntu-npu system for voice privacy 2024 challenge.emotion, 1:2

work page 2024

[46] [46]

Deep sub-wavelength acoustic transmission enhancement and whisper via the monopole resonance in meta-cavities.Applied Acoustics, 203:109227, 2023

Yunzhong Lei, Jiu Hui Wu, Libo Wang, Yao Huang, and Jiamin Niu. Deep sub-wavelength acoustic transmission enhancement and whisper via the monopole resonance in meta-cavities.Applied Acoustics, 203:109227, 2023

work page 2023

[47] [47]

Head-orienting behaviors during simultaneous speech detection and localization

Angkana Lertpoompunya, Erol J Ozmeral, Nathan C Higgins, and David A Eddins. Head-orienting behaviors during simultaneous speech detection and localization. Frontiers in Psychology, 15:1425972, 2024

work page 2024

[48] [48]

Practical adversarial attacks against speaker recognition systems

Zhuohang Li, Cong Shi, Yi Xie, Jian Liu, Bo Yuan, and Yingying Chen. Practical adversarial attacks against speaker recognition systems. InProceedings of the 21st international workshop on mobile computing systems and applications, pages 9–14, 2020

work page 2020

[49] [49]

Lv-auth: Lip motion fusion for voiceprint authentica- tion

Wei Liu, Xiaojing Zhu, Qin Liu, Peng Li, and Man Zhou. Lv-auth: Lip motion fusion for voiceprint authentica- tion. InInternational Conference on Wireless Artificial Intelligent Computing Systems and Applications, pages 295–307. Springer, 2024

work page 2024

[50] [50]

Mitigating inaudible ultrasound attacks on voice assistants with acoustic metamaterials.IEEE Access, 11:36464–36470, 2023

Joshua S Lloyd, Cole G Ludwikowski, Cyrus Malik, and Chen Shen. Mitigating inaudible ultrasound attacks on voice assistants with acoustic metamaterials.IEEE Access, 11:36464–36470, 2023

work page 2023

[51] [51]

Shaping reverberating sound fields with an ac- tively tunable metasurface.Proceedings of the National Academy of Sciences, 115(26):6638–6643, 2018

Guancong Ma, Xiying Fan, Ping Sheng, and Mathias Fink. Shaping reverberating sound fields with an ac- tively tunable metasurface.Proceedings of the National Academy of Sciences, 115(26):6638–6643, 2018

work page 2018

[52] [52]

Acoustic metamateri- als: From local resonances to broad horizons.Science advances, 2(2):e1501595, 2016

Guancong Ma and Ping Sheng. Acoustic metamateri- als: From local resonances to broad horizons.Science advances, 2(2):e1501595, 2016

work page 2016

[53] [53]

Mouth sounds: A review of acoustic applications and methodologies.Applied Sciences, 13(7):4331, 2023

Norberto E Naal-Ruiz, Erick A Gonzalez-Rodriguez, Gustavo Navas-Reascos, Rebeca Romo-De Leon, Ale- jandro Solorio, Luz M Alonso-Valerdi, and David I Ibarra-Zarate. Mouth sounds: A review of acoustic applications and methodologies.Applied Sciences, 13(7):4331, 2023

work page 2023

[54] [54]

Mobile 3d augmented-reality system for ultrasound ap- plications

Cameron Lowell Palmer, Bjørn Olav Haugen, Eva Teg- nander, Sturla H Eik-Nes, Hans Torp, and Gabriel Kiss. Mobile 3d augmented-reality system for ultrasound ap- plications. In2015 IEEE International Ultrasonics Sym- posium (IUS), pages 1–4. IEEE, 2015

work page 2015

[55] [55]

Speaker identity and voice qual- ity: Modeling human responses and automatic speaker recognition

Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patri- cia A Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, and Abeer Alwan. Speaker identity and voice qual- ity: Modeling human responses and automatic speaker recognition. InInterspeech, pages 1044–1048, 2016

work page 2016

[56] [56]

Introducing model inversion at- tacks on automatic speaker recognition.arXiv preprint arXiv:2301.03206, 2023

Karla Pizzi, Franziska Boenisch, Ugur Sahin, and Kon- stantin Böttinger. Introducing model inversion at- tacks on automatic speaker recognition.arXiv preprint arXiv:2301.03206, 2023

work page arXiv 2023

[57] [57]

Real-time psychoacoustic frequency masking compensation for audio signals with overlap- ping spectra

Giorgio Presti, Nicola Degiorgi, Amedeo Fresia, Anto- nio Servetti, et al. Real-time psychoacoustic frequency masking compensation for audio signals with overlap- ping spectra. InProceedings of the 21st Sound and Music Computing Conference, pages 439–444. SMC, 2024

work page 2024

[58] [58]

Speech sanitizer: Speech content desensitization and voice anonymiza- tion.IEEE Transactions on Dependable and Secure Computing, 18(6):2631–2642, 2019

Jianwei Qian, Haohua Du, Jiahui Hou, Linlin Chen, Taeho Jung, and Xiang-Yang Li. Speech sanitizer: Speech content desensitization and voice anonymiza- tion.IEEE Transactions on Dependable and Secure Computing, 18(6):2631–2642, 2019

work page 2019

[59] [59]

Security system using biometric technology: Design and implementation of voice recognition system (vrs)

Rozeha A Rashid, Nur Hija Mahalin, Mohd Adib Sari- jari, and Ahmad Aizuddin Abdul Aziz. Security system using biometric technology: Design and implementation of voice recognition system (vrs). In2008 international conference on computer and communication engineer- ing, pages 898–902. IEEE, 2008

work page 2008

[60] [60]

A compact multifunctional metas- tructure for low-frequency broadband sound absorption and crash energy dissipation.Materials & Design, 215:110462, 2022

Zhiwen Ren, Yuehang Cheng, Mingji Chen, Xujin Yuan, and Daining Fang. A compact multifunctional metas- tructure for low-frequency broadband sound absorption and crash energy dissipation.Materials & Design, 215:110462, 2022

work page 2022

[61] [61]

Speaker verification using adapted gaussian mix- ture models.Digital signal processing, 10(1-3):19–41, 2000

Douglas A Reynolds, Thomas F Quatieri, and Robert B Dunn. Speaker verification using adapted gaussian mix- ture models.Digital signal processing, 10(1-3):19–41, 2000. 16

work page 2000

[62] [62]

A 13.0 kbit/s wideband speech codec based on sb-acelp

Jürgen Schnitzler. A 13.0 kbit/s wideband speech codec based on sb-acelp. InProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Sig- nal Processing, ICASSP’98 (Cat. No. 98CH36181), vol- ume 1, pages 157–160. IEEE, 1998

work page 1998

[63] [63]

Imperio: Robust over-the-air adversarial examples for automatic speech recognition systems

Lea Schönherr, Thorsten Eisenhofer, Steffen Zeiler, Thorsten Holz, and Dorothea Kolossa. Imperio: Robust over-the-air adversarial examples for automatic speech recognition systems. InProceedings of the 36th An- nual Computer Security Applications Conference, pages 843–855, 2020

work page 2020

[64] [64]

Head movements while recognizing speech arriving from behind.The Journal of the Acoustical Society of America, 141(2):EL108–EL114, 2017

Yi Shen, Monica L Folkerts, and Virgina M Richards. Head movements while recognizing speech arriving from behind.The Journal of the Acoustical Society of America, 141(2):EL108–EL114, 2017

work page 2017

[65] [65]

Architected cellular piezoelectric metamaterials: Thermo-electro- mechanical properties.Acta Materialia, 163:91–121, 2019

Jiahao Shi and AH Akbarzadeh. Architected cellular piezoelectric metamaterials: Thermo-electro- mechanical properties.Acta Materialia, 163:91–121, 2019

work page 2019

[66] [66]

X-vectors: Ro- bust dnn embeddings for speaker recognition

David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur. X-vectors: Ro- bust dnn embeddings for speaker recognition. In2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5329–5333. IEEE, 2018

work page 2018

[67] [67]

Effects of the rate of formant-frequency variation on the grouping of formants in speech perception.Jour- nal of the Association for Research in Otolaryngology, 13(2):269–280, 2012

Robert J Summers, Peter J Bailey, and Brian Roberts. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception.Jour- nal of the Association for Research in Otolaryngology, 13(2):269–280, 2012

work page 2012

[68] [68]

Addressing challenges in speaker anonymization to maintain utility while ensuring privacy of pathological speech.Communications Medicine, 4(1):182, 2024

Soroosh Tayebi Arasteh, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Tobias Weise, Kai Pack- häuser, Maria Schuster, Elmar Noeth, Andreas Maier, and Seung Hee Yang. Addressing challenges in speaker anonymization to maintain utility while ensuring privacy of pathological speech.Communications Medicine, 4(1):182, 2024

work page 2024

[69] Soroosh Tayebi Arasteh, Mahshad Lotfinia, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Mahtab Ranji, Juan Rafael Orozco-Arroyave, Maria Schuster, Andreas Maier, and Seung Hee Yang. Differential privacy enables fair and accurate AI-based analysis of speech disorders while protecting patient data. npj Artificial Intelligence, 1(1):37, 2025

[70] Kun Wang, Meng Chen, Li Lu, Jingwen Feng, Qianniu Chen, Zhongjie Ba, Kui Ren, and Chun Chen. From one stolen utterance: Assessing the risks of voice cloning in the AIGC era. In 2025 IEEE Symposium on Security and Privacy (SP), pages 4663–4681. IEEE, 2025

[71] Ning Wang, PC Ching, Nengheng Zheng, and Tan Lee. Robust speaker recognition using denoised vocal source and vocal tract features. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):196–205, 2010

[72] Yang Wang, Xiaoyu Wang, Huanyu Dong, Lele Ma, Yue Fu, and Lingxing Zhang. Spider web-inspired acoustic metamaterials with multi-band gaps for low-frequency elastic wave propagation control. Journal of Physics D: Applied Physics, 2025

[73] Yuanda Wang, Hanqing Guo, Guangjing Wang, Bocheng Chen, and Qiben Yan. VSMask: Defending against voice synthesis attack via real-time predictive perturbation. In Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, pages 239–250, 2023

[74] Shilin Xiao, Xiaoyu Ji, Chen Yan, Zhicong Zheng, and Wenyuan Xu. MicPro: Microphone-based voice privacy protection. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1302–1316, 2023

[75] Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, and Bo Yuan. Real-time, universal, and robust adversarial attacks against speaker recognition systems. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1738–1742. IEEE, 2020

[76] Jixun Yao, Hexin Liu, Eng Siong Chng, and Lei Xie. EASY: Emotion-aware Speaker Anonymization via Factorized Distillation. In Interspeech 2025, pages 3219–3223, 2025

[77] Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, and Lei Xie. MUSA: Multi-lingual speaker anonymization via serial disentanglement. IEEE Transactions on Audio, Speech and Language Processing, 2025

[78] Jin Zhang, Wei Rui, Chengrong Ma, Ying Cheng, Xiaojun Liu, and Johan Christensen. Remote whispering metamaterial for non-radiative transceiving of ultra-weak sound. Nature Communications, 12(1):3670, 2021

[79] Fang Zheng, LT Li, and Hui Zhang. Voiceprint recognition technology and its application status. Research on Information Security, 2(1):44–57, 2016

[80] Hua-Hong Zhu, Qian-Hua He, Hong Tang, and Wei-Hua Cao. Voiceprint-biometric template design and authentication based on cloud computing security. In 2011 International Conference on Cloud and Service Computing, pages 302–308. IEEE, 2011