ChaRVoC: A Challenge-Response Voice Cancelable Authentication System
Pith reviewed 2026-05-08 18:21 UTC · model grok-4.3
The pith
ChaRVoC uses a hash and graycode scheme to create revocable, non-invertible voice templates protected by secret keys and dynamic challenges.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ChaRVoC integrates voice biometrics, user-memorized secret keys enabling template revocability, and dynamic system-generated challenges providing liveness detection. The novel HashGray-XOR scheme combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. The system achieves both cancelability and unlinkability properties while maintaining recognition performance comparable to existing methods on VoxCeleb1, TIMIT, and VOiCES datasets.
What carries the argument
The HashGray-XOR scheme, which XORs a cryptographic hash of the user's secret key, H(k), with an unrecoverable graycode-based transformation of the voice features, T(v), to generate templates claimed to be non-invertible.
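A minimal sketch may help fix the shape of this computation. It follows the formula quoted further down this page, t = H(k) ⊕ T(v), under assumptions that are not from the paper: SHA-256 stands in for H, a plain per-byte binary-reflected Gray code stands in for T, and the voice features are taken to be already quantized to 32 bytes. The names `gray_encode_byte` and `make_template` are hypothetical. The paper describes its T as unrecoverable and does not specify it here, so this is an illustration of the data flow, not the authors' construction.

```python
import hashlib

def gray_encode_byte(b: int) -> int:
    """Binary-reflected Gray code of an 8-bit value."""
    return b ^ (b >> 1)

def make_template(key: bytes, features: bytes) -> bytes:
    """Illustrative t = H(k) XOR T(v) for 32-byte quantized features."""
    if len(features) != 32:
        raise ValueError("this sketch expects exactly 32 feature bytes")
    h = hashlib.sha256(key).digest()                    # H(k): assumed SHA-256
    t_v = bytes(gray_encode_byte(b) for b in features)  # T(v): stand-in only
    return bytes(a ^ b for a, b in zip(h, t_v))

# Hypothetical inputs, chosen only to make the example runnable.
features = bytes(range(32))
template = make_template(b"user-secret", features)
print(template.hex())
```

With this stand-in T, anyone who learns the key can recompute H(k), XOR it away, and decode the Gray code, which is why the exact definition of T matters for the non-invertibility claim.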
If this is right
- Templates can be revoked by changing the user's secret key without needing new voice enrollment.
- Dynamic challenges ensure each authentication attempt is unique, blocking recorded replay attacks.
- Unlinkability prevents matching of templates from the same user across separate authentication systems.
- Recognition accuracy remains competitive with prior cancelable methods such as WTA, IoM, and RoE on standard voice datasets.
- The three-factor design reduces the impact of any single compromise in voice, key, or template storage.
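The revocation and unlinkability bullets above can be illustrated with a toy experiment: the same features enrolled under two different keys should yield templates that differ in roughly half their bits. The sketch assumes the template form t = H(k) ⊕ T(v) with SHA-256 as H and a per-byte Gray code as a stand-in for T. Both choices, and the names `make_template` and `hamming_fraction`, are illustrative assumptions rather than the paper's implementation, and a bit-difference check is not a substitute for a formal unlinkability evaluation.

```python
import hashlib

def make_template(key: bytes, features: bytes) -> bytes:
    """Illustrative t = H(k) XOR T(v); T is a per-byte Gray code stand-in."""
    h = hashlib.sha256(key).digest()             # assumed hash function
    t_v = bytes(b ^ (b >> 1) for b in features)  # assumed transformation
    return bytes(a ^ b for a, b in zip(h, t_v))

def hamming_fraction(x: bytes, y: bytes) -> float:
    """Fraction of differing bits between two equal-length byte strings."""
    diff = sum(bin(a ^ b).count("1") for a, b in zip(x, y))
    return diff / (8 * len(x))

features = bytes(range(32))                 # same hypothetical voice features
old = make_template(b"key-2024", features)  # template treated as compromised
new = make_template(b"key-2025", features)  # template reissued under a new key
print(f"bit difference after revocation: {hamming_fraction(old, new):.2f}")
```

No new voice enrollment is involved: only the key changes between the two calls.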
Where Pith is reading between the lines
- The method could extend to other biometric types where revocability is needed without retraining.
- If the non-invertibility proof holds under standard cryptographic assumptions, it strengthens arguments for deploying cancelable biometrics at scale.
- Integration with mobile devices could use device-stored challenges to add friction against remote attacks.
Load-bearing premise
The graycode-based transformation cannot be reversed to recover the original voice data or secret key even when the hash output and transformation rules are known.
What would settle it
An algorithm or procedure that successfully recovers the original voice features or secret key from a stored ChaRVoC template would show the HashGray-XOR scheme is invertible.
Original abstract
In this work, we present a Challenge-Response Voice Cancelable authentication system, called ChaRVoC, which provides protection against replay attacks, revocability issues, and template compromise. Our approach integrates three security factors: (1) inherent voice biometric characteristics, (2) user-memorized secret keys enabling template revocability, and (3) dynamic system-generated challenges providing liveness detection. Specifically, we introduce a novel HashGray-XOR scheme which combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. We compare our methods with existing cancelable biometric methods (WTA, IoM, RoE) on VoxCeleb1, TIMIT, and VOiCES datasets to show the recognition performance of our proposed system. We also show that our system achieves both cancelability and unlinkability properties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ChaRVoC, a challenge-response voice cancelable authentication system integrating voice biometrics, user-memorized secret keys for revocability, and dynamic challenges for liveness detection to address replay attacks and template compromise. It introduces a HashGray-XOR scheme combining a cryptographic hash with a graycode-based transformation, claiming this yields templates that are mathematically proven non-invertible. Performance is compared to WTA, IoM, and RoE on VoxCeleb1, TIMIT, and VOiCES datasets, with additional claims of cancelability and unlinkability.
Significance. If the non-invertibility claim holds under a formal security model and the empirical results demonstrate competitive accuracy with proper statistical controls, the work could meaningfully advance cancelable biometrics for voice by adding challenge-response liveness without sacrificing revocability or unlinkability.
major comments (2)
- [Abstract] Abstract: The central claim that the HashGray-XOR scheme produces 'secured templates that are mathematically proven to be non-invertible' lacks any derivation, reduction, or proof sketch. Binary-reflected Gray codes are bijective permutations invertible in linear time via successive XOR with right-shifted copies; therefore the graycode layer adds no one-wayness beyond the hash preimage resistance and secret-key XOR. An explicit security model (e.g., in the random-oracle or standard-model setting) showing that template inversion remains hard even after removal of the graycode step is required.
- [Evaluation section] Evaluation (performance comparison): The abstract states comparisons on VoxCeleb1, TIMIT, and VOiCES but supplies no concrete metrics (EER, accuracy, AUC), error bars, dataset splits, or exclusion criteria. Without these, it is impossible to verify whether the claimed recognition performance is statistically distinguishable from the baselines or robust to the three-factor integration.
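The invertibility point in the first major comment is easy to check for the standard binary-reflected Gray code. The sketch below encodes with `n ^ (n >> 1)` and decodes by XOR-ing the code word with all of its right shifts. It says nothing about the paper's own graycode-based transformation, whose definition is not given on this page; the function names are illustrative.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing g with all of its right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Round-trip check over every 12-bit value: the map is a bijection.
assert all(gray_decode(gray_encode(n)) == n for n in range(1 << 12))
assert len({gray_encode(n) for n in range(1 << 12)}) == 1 << 12

print(gray_encode(5), gray_decode(gray_encode(5)))  # prints "7 5"
```

Decoding takes one pass over the bits of the code word, so a plain Gray-code layer offers no resistance to inversion on its own.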
minor comments (2)
- [Abstract] Clarify the exact ordering of hash, graycode, and XOR operations and whether the graycode is applied to the feature vector or the hash output; this affects both invertibility analysis and implementation reproducibility.
- Provide formal definitions of cancelability and unlinkability (e.g., via indistinguishability or distance-preserving properties) and explicit experimental protocols demonstrating these properties rather than informal assertions.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps strengthen the security analysis and clarity of our results. We address each major comment below, indicating the specific revisions planned for the next version of the manuscript.
- Referee: [Abstract] Abstract: The central claim that the HashGray-XOR scheme produces 'secured templates that are mathematically proven to be non-invertible' lacks any derivation, reduction, or proof sketch. Binary-reflected Gray codes are bijective permutations invertible in linear time via successive XOR with right-shifted copies; therefore the graycode layer adds no one-wayness beyond the hash preimage resistance and secret-key XOR. An explicit security model (e.g., in the random-oracle or standard-model setting) showing that template inversion remains hard even after removal of the graycode step is required.
  Authors: We agree that the Gray-code transformation is a bijective permutation and does not itself contribute one-wayness; the non-invertibility claim rests on the combination of the cryptographic hash preimage resistance and the secret-key XOR. The Gray-code step is used for diffusion to support unlinkability and cancelability. We will add a dedicated security analysis section that provides an explicit security model in the random-oracle setting. This will include a proof sketch showing that template inversion remains hard (under standard hash assumptions) even if the Gray-code layer is removed, together with a formal definition of the adversary and the advantage bound.
  Revision: yes
- Referee: [Evaluation section] Evaluation (performance comparison): The abstract states comparisons on VoxCeleb1, TIMIT, and VOiCES but supplies no concrete metrics (EER, accuracy, AUC), error bars, dataset splits, or exclusion criteria. Without these, it is impossible to verify whether the claimed recognition performance is statistically distinguishable from the baselines or robust to the three-factor integration.
  Authors: The full evaluation section already reports EER, accuracy, and AUC values for ChaRVoC versus WTA, IoM, and RoE on the three datasets, along with dataset splits. However, we acknowledge that the abstract omits these numbers and that the evaluation section would benefit from additional statistical controls. We will revise the abstract to include the key EER figures, and we will expand the evaluation section to report error bars from multiple random seeds, explicit train/test splits, exclusion criteria for utterances, and results of statistical significance tests (e.g., paired t-tests) against the baselines.
  Revision: yes
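For readers unfamiliar with the equal error rate (EER) named in the exchange above, the following sketch shows one common way to approximate it from verification scores: sweep a decision threshold and report the point where the false accept rate (FAR) and false reject rate (FRR) are closest. The function name and the scores are made up for illustration; they are not results from the paper, and a real evaluation would typically interpolate the ROC curve rather than use this coarse sweep.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over all observed scores.

    Higher scores mean "more likely the same speaker". Returns the mean of
    FAR and FRR at the threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejects
        far = sum(s >= thr for s in impostor) / len(impostor)  # false accepts
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy scores for illustration only; not data from the paper.
genuine = [0.91, 0.85, 0.78, 0.66, 0.59]
impostor = [0.62, 0.48, 0.35, 0.30, 0.12]
print(equal_error_rate(genuine, impostor))
```

On these toy scores the two rates meet at a threshold of 0.62, where one of five genuine trials is rejected and one of five impostor trials is accepted.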
Circularity Check
No significant circularity; claims rest on novel construction and standard assumptions.
The paper introduces the HashGray-XOR scheme as a new combination of cryptographic hash and graycode transformation, asserting non-invertibility via mathematical proof. No load-bearing steps reduce by construction to fitted parameters, self-citations, or prior inputs; the abstract and description present the scheme as self-contained, relying on the claimed properties of the construction itself plus standard crypto primitives without evident renaming, smuggling, or definitional loops. Security properties are asserted as newly achieved rather than derived from the inputs by equivalence.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Cryptographic hash functions are one-way and secure.
- domain assumption: The graycode-based transformation is unrecoverable.
invented entities (1)
- HashGray-XOR scheme: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith.Cost (Jcost) washburn_uniqueness_aczel: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: t = f(k,v) = H(k) ⊕ T(v) ... H is the cryptographic hash function, T is the unrecoverable graycode-based function
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.