arxiv: 2605.02990 · v2 · submitted 2026-05-04 · 💻 cs.CR

ChaRVoC: A Challenge-Response Voice Cancelable Authentication System

Pith reviewed 2026-05-08 18:21 UTC · model grok-4.3

classification 💻 cs.CR
keywords cancelable biometrics · voice authentication · challenge-response · template security · revocability · unlinkability · HashGray-XOR · liveness detection

The pith

ChaRVoC uses a hash and graycode scheme to create revocable, non-invertible voice templates protected by secret keys and dynamic challenges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ChaRVoC, a voice authentication system that combines inherent voice features with user-memorized secret keys and system-generated challenges. This three-factor setup aims to block replay attacks through liveness checks, allow template revocation by key changes, and prevent recovery of original data. The central mechanism is the HashGray-XOR scheme, which applies a cryptographic hash followed by an unrecoverable graycode transformation to produce templates that the authors prove cannot be inverted. If correct, the approach would let voice biometrics function like changeable passwords while preserving recognition accuracy. Evaluations against other cancelable methods on VoxCeleb1, TIMIT, and VOiCES datasets support that performance holds alongside the new security properties of cancelability and unlinkability.

Core claim

ChaRVoC integrates voice biometrics, user-memorized secret keys enabling template revocability, and dynamic system-generated challenges providing liveness detection. The novel HashGray-XOR scheme combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. The system achieves both cancelability and unlinkability properties while maintaining recognition performance comparable to existing methods on VoxCeleb1, TIMIT, and VOiCES datasets.

What carries the argument

The HashGray-XOR scheme, which applies a cryptographic hash function to voice features combined with a secret key and then performs an unrecoverable graycode-based transformation to generate non-invertible templates.
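
The description above does not fix the order of the hash, Gray-code, and XOR steps, so the following is only a minimal sketch under stated assumptions, not the authors' construction. It assumes the voice features are already binarized to bytes, that the hash is SHA-256 over the key concatenated with the features, that the Gray code is the standard binary-reflected one, and that the XOR mask is derived from the challenge. The names `to_gray` and `hashgray_xor_template` are hypothetical. The sketch also ignores how a real system tolerates variation between utterances.

```python
import hashlib

def to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def hashgray_xor_template(features: bytes, secret_key: bytes, challenge: bytes) -> bytes:
    # Assumed step 1: hash the secret key together with the binarized features.
    # Plain concatenation is illustrative only.
    digest = hashlib.sha256(secret_key + features).digest()
    # Assumed step 2: Gray-code the digest, read as one 256-bit integer.
    gray = to_gray(int.from_bytes(digest, "big"))
    # Assumed step 3: XOR with a 256-bit mask derived from the challenge.
    mask = int.from_bytes(hashlib.sha256(challenge).digest(), "big")
    return (gray ^ mask).to_bytes(32, "big")

features = bytes([0b10110010, 0b01101100])  # toy binarized features
t_old = hashgray_xor_template(features, b"key-v1", b"challenge-1")
t_new = hashgray_xor_template(features, b"key-v2", b"challenge-1")
```

In this toy setup the same features produce different templates under `key-v1` and `key-v2`, which is the behavior that revocation by key change relies on.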

If this is right

  • Templates can be revoked by changing the user's secret key without needing new voice enrollment.
  • Dynamic challenges ensure each authentication attempt is unique, blocking recorded replay attacks.
  • Unlinkability prevents matching of templates from the same user across separate authentication systems.
  • Recognition accuracy remains competitive with prior cancelable methods such as WTA, IoM, and RoE on standard voice datasets.
  • The three-factor design reduces the impact of any single compromise in voice, key, or template storage.
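
The replay-blocking bullet can be illustrated with a generic nonce-based challenge-response pattern. This is a sketch of the general technique, not the ChaRVoC protocol: the paper's challenges may be spoken prompts rather than random bytes, and the `Verifier` class, the `respond` function, and the use of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

class Verifier:
    """Toy verifier that accepts each issued challenge at most once."""

    def __init__(self, stored_template: bytes):
        self.stored_template = stored_template
        self.pending = set()

    def issue_challenge(self) -> bytes:
        challenge = secrets.token_bytes(16)  # fresh random challenge per attempt
        self.pending.add(challenge)
        return challenge

    def verify(self, challenge: bytes, response: bytes) -> bool:
        if challenge not in self.pending:
            return False  # unknown or already-used challenge: treat as a replay
        self.pending.discard(challenge)
        expected = hmac.new(self.stored_template, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

def respond(template: bytes, challenge: bytes) -> bytes:
    """Client side: bind the response to the specific challenge."""
    return hmac.new(template, challenge, hashlib.sha256).digest()

template = hashlib.sha256(b"toy template").digest()
verifier = Verifier(template)
c = verifier.issue_challenge()
r = respond(template, c)
first = verifier.verify(c, r)     # fresh challenge, matching response
replayed = verifier.verify(c, r)  # same pair presented a second time
```

The first attempt is accepted and the replayed pair is rejected, because the challenge is consumed on first use and a recorded response does not match any later challenge.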

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other biometric types where revocability is needed without retraining.
  • If the non-invertibility proof holds under standard cryptographic assumptions, it strengthens arguments for deploying cancelable biometrics at scale.
  • Integration with mobile devices could use device-stored challenges to add friction against remote attacks.

Load-bearing premise

The graycode-based transformation cannot be reversed to recover the original voice data or secret key even when the hash output and transformation rules are known.

What would settle it

An algorithm or procedure that successfully recovers the original voice features or secret key from a stored ChaRVoC template would show the HashGray-XOR scheme is invertible.

Figures

Figures reproduced from arXiv: 2605.02990 by Dinh-Thuc Nguyen, Hoang C. Ta, Hong-Hanh Nguyen-Le, Nhien-An Le-Khac, Phuc-Khang Vo-Hoang.

Figure 1: An overview of our Challenge-Response Voice Cancelable Voice Authen…
Figure 2: Distributions of mated samples and non-mated samples.
Original abstract

In this work, we present a Challenge-Response Voice Cancelable authentication system, called ChaRVoC, which provides protection against replay attacks, revocability issues, and template compromise. Our approach integrates three security factors: (1) inherent voice biometric characteristics, (2) user-memorized secret keys enabling template revocability, and (3) dynamic system-generated challenges providing liveness detection. Specifically, we introduce a novel HashGray-XOR scheme which combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. We compare our methods with existing cancelable biometric methods (WTA, IoM, RoE) on VoxCeleb1, TIMIT, and VOiCES datasets to show the recognition performance of our proposed system. We also show that our system achieves both cancelability and unlinkability properties.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes ChaRVoC, a challenge-response voice cancelable authentication system integrating voice biometrics, user-memorized secret keys for revocability, and dynamic challenges for liveness detection to address replay attacks and template compromise. It introduces a HashGray-XOR scheme combining a cryptographic hash with a graycode-based transformation, claiming this yields templates that are mathematically proven non-invertible. Performance is compared to WTA, IoM, and RoE on VoxCeleb1, TIMIT, and VOiCES datasets, with additional claims of cancelability and unlinkability.

Significance. If the non-invertibility claim holds under a formal security model and the empirical results demonstrate competitive accuracy with proper statistical controls, the work could meaningfully advance cancelable biometrics for voice by adding challenge-response liveness without sacrificing revocability or unlinkability.

major comments (2)
  1. [Abstract] Abstract: The central claim that the HashGray-XOR scheme produces 'secured templates that are mathematically proven to be non-invertible' lacks any derivation, reduction, or proof sketch. Binary-reflected Gray codes are bijective permutations invertible in linear time via successive XOR with right-shifted copies; therefore the graycode layer adds no one-wayness beyond the hash preimage resistance and secret-key XOR. An explicit security model (e.g., in the random-oracle or standard-model setting) showing that template inversion remains hard even after removal of the graycode step is required.
  2. [Evaluation section] Evaluation (performance comparison): The abstract states comparisons on VoxCeleb1, TIMIT, and VOiCES but supplies no concrete metrics (EER, accuracy, AUC), error bars, dataset splits, or exclusion criteria. Without these, it is impossible to verify whether the claimed recognition performance is statistically distinguishable from the baselines or robust to the three-factor integration.
minor comments (2)
  1. [Abstract] Clarify the exact ordering of hash, graycode, and XOR operations and whether the graycode is applied to the feature vector or the hash output; this affects both invertibility analysis and implementation reproducibility.
  2. Provide formal definitions of cancelability and unlinkability (e.g., via indistinguishability or distance-preserving properties) and explicit experimental protocols demonstrating these properties rather than informal assertions.
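
The first major comment rests on the fact that a binary-reflected Gray code is a bijection that is cheap to invert. The short sketch below demonstrates this for the standard code only; whether the paper's "unrecoverable" graycode transformation differs from the standard one is not stated here, so that equivalence is an assumption. The function names are illustrative.

```python
def to_gray(n: int) -> int:
    """Encode: each Gray bit is the XOR of two adjacent binary bits."""
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    """Decode by XOR-ing g with all of its right-shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

codes = [to_gray(i) for i in range(8)]  # [0, 1, 3, 2, 6, 7, 5, 4]
round_trip_ok = all(from_gray(to_gray(i)) == i for i in range(1 << 12))
```

Because `from_gray` undoes `to_gray` for every input, anyone who can invert the Gray-coded value can invert the value underneath it, so any one-wayness has to come from the other layers.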

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps strengthen the security analysis and clarity of our results. We address each major comment below, indicating the specific revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the HashGray-XOR scheme produces 'secured templates that are mathematically proven to be non-invertible' lacks any derivation, reduction, or proof sketch. Binary-reflected Gray codes are bijective permutations invertible in linear time via successive XOR with right-shifted copies; therefore the graycode layer adds no one-wayness beyond the hash preimage resistance and secret-key XOR. An explicit security model (e.g., in the random-oracle or standard-model setting) showing that template inversion remains hard even after removal of the graycode step is required.

    Authors: We agree that the Gray-code transformation is a bijective permutation and does not itself contribute one-wayness; the non-invertibility claim rests on the combination of the cryptographic hash preimage resistance and the secret-key XOR. The Gray-code step is used for diffusion to support unlinkability and cancelability. We will add a dedicated security analysis section that provides an explicit security model in the random-oracle setting. This will include a proof sketch showing that template inversion remains hard (under standard hash assumptions) even if the Gray-code layer is removed, together with a formal definition of the adversary and the advantage bound. revision: yes

  2. Referee: [Evaluation section] Evaluation (performance comparison): The abstract states comparisons on VoxCeleb1, TIMIT, and VOiCES but supplies no concrete metrics (EER, accuracy, AUC), error bars, dataset splits, or exclusion criteria. Without these, it is impossible to verify whether the claimed recognition performance is statistically distinguishable from the baselines or robust to the three-factor integration.

    Authors: The full evaluation section already reports EER, accuracy, and AUC values for ChaRVoC versus WTA, IoM, and RoE on the three datasets, along with dataset splits. However, we acknowledge that the abstract omits these numbers and that the evaluation section would benefit from additional statistical controls. We will revise the abstract to include the key EER figures, and we will expand the evaluation section to report error bars from multiple random seeds, explicit train/test splits, exclusion criteria for utterances, and results of statistical significance tests (e.g., paired t-tests) against the baselines. revision: yes
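
For readers unfamiliar with the metric discussed in the second response, the sketch below shows one simple way to approximate the equal error rate (EER) from mated and non-mated comparison scores. It assumes higher scores mean greater similarity, and the scores are made-up toy values, not results from the paper.

```python
def equal_error_rate(mated, non_mated):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Returns (eer, threshold) at the threshold where the false accept rate
    and false reject rate are closest.
    """
    best = None
    for t in sorted(set(mated) | set(non_mated)):
        frr = sum(s < t for s in mated) / len(mated)           # genuine rejected
        far = sum(s >= t for s in non_mated) / len(non_mated)  # impostor accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

mated = [0.91, 0.85, 0.78, 0.66, 0.95]      # toy same-speaker scores
non_mated = [0.12, 0.35, 0.41, 0.70, 0.22]  # toy different-speaker scores
eer, threshold = equal_error_rate(mated, non_mated)
```

For these toy scores the sweep settles on a threshold of 0.70 with an EER of 0.2. Error bars of the kind the authors promise would come from repeating such a computation over multiple seeds or splits.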

Circularity Check

0 steps flagged

No significant circularity; claims rest on novel construction and standard assumptions.

full rationale

The paper introduces the HashGray-XOR scheme as a new combination of cryptographic hash and graycode transformation, asserting non-invertibility via mathematical proof. No load-bearing steps reduce by construction to fitted parameters, self-citations, or prior inputs; the abstract and description present the scheme as self-contained, relying on the claimed properties of the construction itself plus standard crypto primitives without evident renaming, smuggling, or definitional loops. Security properties are asserted as newly achieved rather than derived from the inputs by equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on standard cryptographic assumptions and the unrecoverability of the introduced transformation, with no free parameters or new physical entities specified.

axioms (2)
  • [standard math] Cryptographic hash functions are one-way and secure.
    Invoked as the basis for the HashGray-XOR scheme to secure templates.
  • [domain assumption] The graycode-based transformation is unrecoverable.
    Stated in the abstract as enabling non-invertible templates.
invented entities (1)
  • HashGray-XOR scheme [no independent evidence]
    purpose: To generate secured non-invertible voice templates combining hash and graycode.
    Newly proposed method whose properties are claimed but not independently evidenced outside the paper.

pith-pipeline@v0.9.0 · 5475 in / 1427 out tokens · 105482 ms · 2026-05-08T18:21:49.288100+00:00 · methodology
