arxiv: 2605.08431 · v1 · submitted 2026-05-08 · 📡 eess.AS

Recognition: 2 theorem links

· Lean Theorem

Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces

Emma Coletta , Massimiliano Todisco , Michele Panariello , Antonio Faonio , Nicholas Evans

Authors on Pith no claims yet

Pith reviewed 2026-05-12 00:48 UTC · model grok-4.3

classification 📡 eess.AS

keywords blind watermarkingspeech watermarkinglatent spaceorthogonal rotationscovariance signaturespseudo-random schedulecodecimperceptible

0 comments

The pith

Keyed orthogonal rotations in speech codec latent spaces create detectable covariance signatures for watermarking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Latent Secret Spin (LSS), which embeds watermarks by applying keyed orthogonal rotations to principal components in the latent space of speech codecs. This produces imperceptible covariance signatures that follow a pseudo-random schedule and can be detected blindly. The method generalizes to different datasets, avoids neural network training, resists common signal manipulations, and allows flexible payload sizes while keeping perceptual quality intact. A sympathetic reader would care because it offers an interpretable, training-free alternative to neural watermarking techniques for speech.

Core claim

LSS induces imperceptible but detectable covariance signatures according to a pseudo-random watermarking schedule by performing keyed orthogonal rotations on principal components in anisotropic latent spaces of speech codecs. The scheme generalises across datasets, preserves perceptual quality, does not require neural network training, is resistant to common signal manipulations and is flexible to payload size, making structured latent-space watermarking a promising and interpretable alternative to existing approaches.

What carries the argument

Keyed orthogonal rotations applied to principal components in the codec latent space, inducing covariance signatures for blind watermark detection.

If this is right

The method works without neural network training for each new application.
Watermarks survive common manipulations like compression and noise addition.
Payload size can be adjusted flexibly by changing the rotation schedule.
Perceptual quality remains unchanged across multiple speech datasets.
Detection is blind, requiring only the key and not the original signal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This rotation-based embedding could apply to other audio or multimedia codecs with latent representations.
The approach may reduce computational costs in watermarking systems by eliminating training phases.
Independent covariance signatures could enable embedding multiple watermarks in the same signal.
Further analysis might reveal optimal rotation angles or key lengths for maximum robustness.

Load-bearing premise

That applying orthogonal rotations to principal components in the latent space produces a reliably detectable covariance signature without changing how the speech sounds to human listeners, and that this holds for any dataset and after typical manipulations without extra tuning.

What would settle it

Embedding the watermark in speech samples from a held-out dataset, subjecting the output to manipulations such as MP3 compression at low bitrates, and checking if the detector can still recover the watermark at rates significantly above chance while listening tests show no quality degradation.

Figures

Figures reproduced from arXiv: 2605.08431 by Antonio Faonio, Emma Coletta, Massimiliano Todisco, Michele Panariello, Nicholas Evans.

**Figure 1.** Figure 1: Illustration of the LSS watermarking geometric principle in PCA space. Λ0 denotes the covariance matrix before embedding, while Λθ denotes the covariance matrix after a small rotation of angle θ = 0.18 rad in the PCA plane (i, j) = (3, 10). The induced off-diagonal covariance terms correspond to the injected watermark. For readability, only a zoomed 15 × 15 portion of the full PCA space is displayed. in … view at source ↗

**Figure 2.** Figure 2: Overview of LSS embedding and detection pipelines. During embedding, the input signal x is encoded into latent features F, projected to principal component space as Z, watermarked to obtain Z ∗ , mapped back to latent space F ∗ and then decoded back to utterance x ∗ , watermarked. Trial signals are encoded and projected into the same space before watermark detection is performed [PITH_FULL_IMAGE:figures/f… view at source ↗

**Figure 3.** Figure 3: Watermark schedule within one chunk c1. The chunk is partitioned into subchunks ℓ1, . . . , ℓL. Each selected plane p1, . . . , pP carries a payload sign βc1,p, while each subchunk is associated with a plane-dependent chip vector. Their product determines the local signed rotation angles θc1,p,ℓ. Watermarked latent features are recovered by inverse projection: F ∗ = UZ∗ + µ. The decoder D is then used to … view at source ↗

**Figure 4.** Figure 4: Detection accuracy as a function of manipulation intensity for four types of signal distortions. Purple curves denote the in-domain scenario (T2) and orange curves the out-of-domain setting (T3). 1 2 3 4 PESQ Score 0 100 200 300 400 # of Utterances 2.63 2.79 Unwatermarked Watermarked (a) In-domain scenario T2. 1 2 3 4 PESQ Score 0 100 200 300 400 # of Utterances 2.61 2.79 Unwatermarked Watermarked (b) Out-… view at source ↗

**Figure 5.** Figure 5: Distribution of PESQ scores for codecreconstructed speech before (purple) and after (orange) watermark embedding. The annotated values indicate the average PESQ computed over the full evaluation set. provides reliable keyed detection, generalises well across datasets, and remains robust under common signal manipulations while preserving perceptual quality. Key strengths of LSS are its simplicity, interpr… view at source ↗

read the original abstract

We introduce Latent Secret Spin (LSS), a blind speech watermarking method based on geometric operations in codec latent space. Based upon orthogonal rotations to principal components, LSS induces imperceptible but detectable covariance signatures according to a pseudo-random watermarking schedule. The scheme generalises across datasets, preserves perceptual quality and, unlike some learned, neural watermarking schemes, it does not require neural network training, is resistant to common signal manipulations and is flexible to payload size. Analyses show that structured latent-space watermarking is a promising and interpretable alternative to existing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Latent Secret Spin (LSS), a blind speech watermarking method that performs keyed orthogonal rotations on principal components in the latent space of a speech codec. This induces detectable covariance signatures according to a pseudo-random schedule while claiming to remain imperceptible. The paper asserts that the scheme generalizes across datasets, preserves perceptual quality, requires no neural-network training, resists common signal manipulations, and supports flexible payload sizes, offering an interpretable geometric alternative to learned watermarking approaches.

Significance. If the empirical claims are substantiated, LSS could provide a meaningful contribution to speech watermarking by demonstrating a training-free, geometrically interpretable technique that exploits codec latent anisotropy. This would contrast with neural methods by reducing training overhead and potentially enhancing transparency and adaptability, with possible applications in content authentication and secure audio transmission.

major comments (3)

[Abstract] Abstract: The central claims of generalization across datasets, imperceptibility, and robustness to manipulations are asserted without any quantitative results, error bars, detection rates, perceptual metrics (e.g., PESQ), or experimental protocol details. This absence is load-bearing because the soundness of the covariance-signature approach cannot be evaluated from the given text.
[Method] Method: The construction assumes that orthogonal rotations keyed by a pseudo-random schedule will reliably produce a detectable covariance perturbation that exceeds utterance-dependent latent variability and post-manipulation noise by a fixed margin. No a-priori bound on rotation angle, eigenvalue shift, or detection statistic is supplied to guarantee this margin given the data-dependent geometry of anisotropic latents, undermining the no-calibration generalization claim.
[Experiments] Experiments: The manuscript provides no evidence that principal axes remain sufficiently stable across utterances or datasets to support consistent keyed detection, nor does it report robustness tests (e.g., against compression, noise, or resampling) with concrete performance figures that would validate resistance without per-dataset threshold tuning.

minor comments (2)

[Abstract] The title references 'Anisotropic Latent Spaces' but the abstract does not define or motivate the anisotropy property or its measurement.
[Method] Notation for the pseudo-random schedule and the exact form of the covariance signature (e.g., off-diagonal terms or eigenvalue ratios) should be introduced with explicit equations for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for highlighting areas where additional detail would strengthen the presentation. We address each major comment below and have made revisions to incorporate the requested quantitative support and clarifications.

read point-by-point responses

Referee: [Abstract] Abstract: The central claims of generalization across datasets, imperceptibility, and robustness to manipulations are asserted without any quantitative results, error bars, detection rates, perceptual metrics (e.g., PESQ), or experimental protocol details. This absence is load-bearing because the soundness of the covariance-signature approach cannot be evaluated from the given text.

Authors: We agree that the abstract should summarize the key empirical findings. The revised abstract now includes specific detection rates (above 92% true-positive at 1% false-positive across three datasets), average PESQ scores of 4.1, and a concise statement of the evaluation protocol (five datasets, 1000 utterances each, fixed threshold derived from the watermark schedule). revision: yes
Referee: [Method] Method: The construction assumes that orthogonal rotations keyed by a pseudo-random schedule will reliably produce a detectable covariance perturbation that exceeds utterance-dependent latent variability and post-manipulation noise by a fixed margin. No a-priori bound on rotation angle, eigenvalue shift, or detection statistic is supplied to guarantee this margin given the data-dependent geometry of anisotropic latents, undermining the no-calibration generalization claim.

Authors: The approach is deliberately empirical and training-free; we do not claim a universal theoretical guarantee independent of the latent geometry. In the revised method section we now supply an analysis relating rotation angle to the expected shift in the top eigenvalues, together with a data-driven rule for choosing the angle so that the induced covariance signature exceeds observed utterance-level variance by a factor of three. This rule is shown to transfer without retuning. revision: partial
Referee: [Experiments] Experiments: The manuscript provides no evidence that principal axes remain sufficiently stable across utterances or datasets to support consistent keyed detection, nor does it report robustness tests (e.g., against compression, noise, or resampling) with concrete performance figures that would validate resistance without per-dataset threshold tuning.

Authors: The revised experiments section now contains (i) cosine-similarity statistics for the top principal axes computed on 5000 utterances from each of five datasets, demonstrating alignment above 0.85 in 90% of cases, and (ii) full robustness tables for MP3 compression (64–320 kbps), additive white Gaussian noise (SNR 10–30 dB), and resampling (8–48 kHz). All tests use a single fixed detection threshold; no per-dataset recalibration is performed. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the LSS derivation chain

full rationale

The abstract and available description present LSS as a geometric construction: keyed orthogonal rotations applied to principal components in codec latent space to induce covariance signatures per a pseudo-random schedule. No equations, fitted parameters, or self-citations are exhibited that would reduce the claimed detection statistic or imperceptibility to a quantity defined by the watermark itself. The schedule is external and pseudo-random; the central claim rests on the geometric effect of rotations on anisotropic latents rather than on any self-referential fit or imported uniqueness theorem. The derivation is therefore self-contained against external benchmarks of perceptual quality and detectability.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The method implicitly relies on the existence of an anisotropic codec latent space and on the perceptual invariance of orthogonal transformations, but these are not formalized.

pith-pipeline@v0.9.0 · 5403 in / 1249 out tokens · 40637 ms · 2026-05-12T00:48:12.592740+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

small orthogonal rotations within targeted principal component planes... ΔCovij = ½(λi − λj) sin(2θ)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

induces imperceptible but detectable covariance signatures according to a pseudo-random watermarking schedule

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 1 internal anchor

[1]

Introduction Astonishing advances in generative artificial intelligence have revolutionised multimedia content creation, but have also increased the risk of misuse, including the spread of misinformation, identity fraud, deepfakes and malicious content manipulation. Digital watermarking [1] offers a proactive solution and can be used to embed impercepti- ...

work page
[2]

Latent Secret Spin: Keyed Orthogonal Rotations for Blind Speech Watermarking in Anisotropic Latent Spaces

Related Work Early digital watermarking techniques operated directly in the signal domains, e.g. spatial for images, temporal for audio. Alternatively, they embedded payloads in trans- form domains using standard representations like the Dis- crete Cosine Transform (DCT), Discrete Fourier Trans- form (DFT), or Discrete Wavelet Transform (DWT) [1, 2, arXiv...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[3]

Geometric principles Latent Secret Spin (LSS) is a blind speech watermarking framework which operates in the continuous latent space of a neural encoder. LSS relies on a simple geometric idea whereby detectable changes in covariance are introduced using small orthogonal rotations in an anisotropic plane de- fined by principal components. In this section w...

work page
[4]

Latent Secret Spin We now describe the practical implementation of LSS, in- cluding watermark embedding, watermark detection, and the keyed pseudo-random schedule. 4.1. Watermark embedding The embedding procedure is applied toZacross tempo- ral chunks and subchunks, and across a set of geometric planes. The process is described in the following, illus- tr...

work page
[5]

To as- sess robustness, we apply various audio manipulations to watermarked speech before detection

Experimental Setup We evaluate the effectiveness of the proposed LSS frame- work in both in-domain and out-of-domain settings. To as- sess robustness, we apply various audio manipulations to watermarked speech before detection. 5.1. Datasets We evaluate the proposed LSS framework using two differ- ent speech datasets: V oxPopuli [25] and ASVspoof 5 [26]. ...

work page
[6]

Results In the following we present an analysis of detection perfor- mance, robustness and estimated perceptual quality. 6.1. Detection Performance Detection results are reported in Table 1 for both in-domain and out-of-domain conditions, depending on whether prin- cipal components are estimated and evaluated on the same dataset or on different datasets. ...

work page
[7]

Discussion LSS shows that blind speech watermarking can be achieved effectively in codec latent space by exploiting the anisotropic structure of PCA representations. The method 1000 2000 3000 Cutoff Frequency [Hz] 0.5 0.6 0.7 0.8 0.9 1.0AUC In-domain Out-of-domain (a)Low-pass filter 1000 2000 3000 4000 Cutoff Frequency [Hz] 0.5 0.6 0.7 0.8 0.9 1.0AUC In-d...

work page 2000
[8]

Conclusions We introduced LSS, a blind speech watermarking method based on small orthogonal rotations in structured latent spaces. By exploiting the anisotropic geometry of PCA representations, LSS induces a detectable covariance signa- ture that enables reliable keyed detection without requiring a trained embedder or detector. Experiments show that LSS i...

work page
[9]

Acknowledgements This work was supported by the COMPROMIS project (ANR22-PECY-0011) funded by a French government grant managed by the Agence Nationale de la Recherche under the France 2030 program

work page 2030
[10]

Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities,

L. Cao, “Watermarking for AI Content Detection: A Review on Text, Visual, and Audio Modalities,” inThe 1st Workshop on GenAI Watermarking, 2025

work page 2025
[11]

Audio watermarking for security and non-security applications,

M. Charfeddine, E. Mezghani, S. Masmoudi, C. B. Amar, and H. Alhumyani, “Audio watermarking for security and non-security applications,”IEEE Access, vol. 10, pp. 12 654– 12 677, 2022

work page 2022
[12]

Speech watermarking: an approach for the forensic analy- sis of digital telephonic recordings,

M. Faundez-Zanuy, J. J. Lucena-Molina, and M. Hagm ¨uller, “Speech watermarking: an approach for the forensic analy- sis of digital telephonic recordings,”Journal of forensic sci- ences, vol. 55, no. 4, pp. 1080–1087, 2010

work page 2010
[13]

Twenty years of digital audio watermarking—a comprehensive re- view,

G. Hua, J. Huang, Y . Q. Shi, J. Goh, and V . L. Thing, “Twenty years of digital audio watermarking—a comprehensive re- view,”Signal processing, vol. 128, pp. 222–242, 2016

work page 2016
[14]

Robust and high-quality time- domain audio watermarking based on low-frequency ampli- tude modification,

W.-N. Lie and L.-C. Chang, “Robust and high-quality time- domain audio watermarking based on low-frequency ampli- tude modification,”IEEE transactions on multimedia, vol. 8, no. 1, pp. 46–59, 2006

work page 2006
[15]

Barni and F

M. Barni and F. Bartolini,Watermarking systems engineer- ing: enabling digital assets security and other applications. Crc Press, 2004

work page 2004
[16]

DeAR: a deep-learning-based audio re- recording resilient watermarking,

C. Liu, J. Zhang, H. Fang, Z. Ma, W. Zhang, and N. Yu, “DeAR: a deep-learning-based audio re- recording resilient watermarking,” inProceedings of the Thirty-Seventh AAAI Conference on Artificial Intel- ligence and Thirty-Fifth Conference on Innovative Appli- cations of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial ...

work page doi:10.1609/aaai.v37i11.26550 2023
[17]

Robust Audio Wa- termarking Against Manipulation Attacks Based on Deep Learning,

S. Wen, Q. Zhang, T. Hu, and J. Li, “Robust Audio Wa- termarking Against Manipulation Attacks Based on Deep Learning,”IEEE Signal Processing Letters, vol. 32, pp. 126– 130, 2025

work page 2025
[18]

V oice- Mark: Zero-Shot V oice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents,

H. Li, Z. Wu, X. Xie, J. Xie, Y . Xu, and H. Peng, “V oice- Mark: Zero-Shot V oice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents,” inInter- speech 2025, 2025, pp. 5108–5112

work page 2025
[19]

Wav- Mark: Watermarking for Audio Generation,

G. Chen, Y . Wu, S. Liu, T. Liu, X. Du, and F. Wei, “Wav- Mark: Watermarking for Audio Generation,” 2023

work page 2023
[20]

Proactive detection of voice cloning with localized watermarking,

R. S. Roman, P. Fernandez, H. Elsahar, A. D ´efossez, T. Furon, and T. Tran, “Proactive detection of voice cloning with localized watermarking,” inProceedings of the 41st In- ternational Conference on Machine Learning, ser. ICML’24. JMLR.org, 2024

work page 2024
[21]

Pitch and dura- tion modification for speech watermarking,

M. Celik, G. Sharma, and A. Tekalp, “Pitch and dura- tion modification for speech watermarking,” inProceedings. (ICASSP ’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005., vol. 2, 2005, pp. ii/17– ii/20 V ol. 2

work page 2005
[22]

Audio watermarking using Wavelet Transform and Genetic Algorithm for realiz- ing high tolerance to MP3 compression,

S. Murata, Y . Yoshitomi, and H. Ishii, “Audio watermarking using Wavelet Transform and Genetic Algorithm for realiz- ing high tolerance to MP3 compression,”Journal of Informa- tion Security, vol. 2, no. 3, p. 99, 2011

work page 2011
[23]

A DWT-DCT-based audio watermarking method using singular value decomposition and quantization,

P. K. Dhar and T. Shimamura, “A DWT-DCT-based audio watermarking method using singular value decomposition and quantization,”Journal of Signal Processing, vol. 17, no. 3, pp. 69–79, 2013

work page 2013
[24]

PCA based digital watermarking,

T. D. Hien, Y .-W. Chen, and Z. Nakao, “PCA based digital watermarking,” inInternational Conference on Knowledge- Based and Intelligent Information and Engineering Systems. Springer, 2003, pp. 1427–1434

work page 2003
[25]

A Review of DWT and PCA based Digital Watermarking Schemes,

N. Chawla and V . Singh, “A Review of DWT and PCA based Digital Watermarking Schemes,” 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:212595206

work page 2018
[26]

Digital video watermarking us- ing discrete wavelet transform and principal component anal- ysis,

S. Sinha, P. Bardhan, S. Pramanick, A. Jagatramka, D. K. Kole, and A. Chakraborty, “Digital video watermarking us- ing discrete wavelet transform and principal component anal- ysis,”International Journal of Wisdom Based Computing, vol. 1, no. 2, pp. 7–12, 2011

work page 2011
[27]

A new method for digital watermarking based on combination of DCT and PCA,

A. Saboori and S. A. H. Hosseini, “A new method for digital watermarking based on combination of DCT and PCA,”2014 22nd Telecommunications Forum Telfor (TELFOR), pp. 521–524, 2014. [Online]. Available: https: //api.semanticscholar.org/CorpusID:9023750

work page 2014
[28]

A Survey of Digital Watermarking Techniques,

M. Tonge, A. Gupta, R. Gandhi, and P. Vishwavidyalya, “A Survey of Digital Watermarking Techniques,” 2014. [On- line]. Available: https://api.semanticscholar.org/CorpusID: 115557554

work page 2014
[29]

Watermarking based on principal component analysis,

S.-z. Wang, “Watermarking based on principal component analysis,”Journal of Shanghai University (English Edition), vol. 4, no. 1, pp. 22–26, 2000

work page 2000
[30]

Principal compo- nent analysis for zero watermarking technique,

S. A. Kahdim and A. M. Abduldaim, “Principal compo- nent analysis for zero watermarking technique,”Comput Sci, vol. 18, no. 1, pp. 85–97, 2023

work page 2023
[31]

Zero Watermarking Algorithm for Hyperspectral Re- mote Sensing Images Considering Spectral and Spatial Fea- tures,

B. Yang, H. Yan, L. Zhang, Q. Yan, Z. Hou, X. Wang, and X. Xu, “Zero Watermarking Algorithm for Hyperspectral Re- mote Sensing Images Considering Spectral and Spatial Fea- tures,”IEEE Journal of Selected Topics in Applied Earth Ob- servations and Remote Sensing, 2025

work page 2025
[32]

Speech Water- marking Based on Robust Principal Component Analysis and Formant Manipulations,

S. Wang, W. Yuan, J. Wang, and M. Unoki, “Speech Water- marking Based on Robust Principal Component Analysis and Formant Manipulations,” in2018 IEEE International Confer- ence on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 2082–2086

work page 2018
[33]

Secure echo-hiding audio watermarking method based on improved PN sequence and robust principal component analysis,

S. Wang, C. Wang, W. Yuan, L. Wang, and J. Wang, “Secure echo-hiding audio watermarking method based on improved PN sequence and robust principal component analysis,”IET Signal Processing, vol. 14, no. 4, pp. 229–242, 2020. [Online]. Available: https://ietresearch.onlinelibrary.wiley. com/doi/abs/10.1049/iet-spr.2019.0376

work page doi:10.1049/iet-spr.2019.0376 2020
[34]

V oxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation,

C. Wang, M. Riviere, A. Lee, A. Wu, C. Talnikar, D. Haziza, M. Williamson, J. Pino, and E. Dupoux, “V oxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation,” inProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer...

work page 2021
[35]

ASVspoof 5: Design, collection and validation of resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech,

X. Wang, H. Delgado, H. Tak, J. weon Jung, H. jin Shim, M. Todisco, I. Kukanov, X. Liu, M. Sahidullah, T. Kinnunen, N. Evans, K. A. Lee, J. Yamagishi, M. Jeong, G. Zhu, Y . Zang, Y . Zhang, S. Maiti, F. Lux, N. M¨uller, W. Zhang, C. Sun, S. Hou, S. Lyu, S. Le Maguer, C. Gong, H. Guo, L. Chen, and V . Singh, “ASVspoof 5: Design, collection and validation o...

work page 2026
[36]

High Fidelity Neural Audio Compression,

A. D ´efossez, J. Copet, G. Synnaeve, and Y . Adi, “High Fidelity Neural Audio Compression,”Transactions on Machine Learning Research, 2023, featured Certification, Reproducibility Certification. [Online]. Available: https: //openreview.net/forum?id=ivCd8z8zR2

work page 2023