pith. machine review for the scientific record.

arxiv: 2602.17711 · v3 · submitted 2026-02-14 · 💻 cs.SD · eess.AS

Recognition: 2 theorem links

· Lean Theorem

Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 22:02 UTC · model grok-4.3

classification 💻 cs.SD eess.AS
keywords audio anti-spoofing · multi-branch networks · interpretability · TreeSHAP · spectral signatures · ASVspoof 2019 · operational archetypes · branch attribution

The pith

Multi-branch anti-spoofing models operate via four archetypes; flawed specialization, in which confidence lands on the wrong branch, causes severe error spikes on specific attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a component-level interpretation method for networks like AASIST3 that process audio for spoof detection. Intermediate activations from fourteen branches are summarized by the leading eigenvalues of their covariance operators to form spectral signatures. A CatBoost meta-classifier trained on these signatures supplies TreeSHAP attributions that are turned into normalized contribution shares and per-attack confidence scores. Applied to thirteen attacks in the ASVspoof 2019 set, the scores separate four operational modes and show that high confidence placed in a wrong branch produces markedly worse equal-error rates. The resulting link between internal strategy and measured reliability identifies structural dependencies that ordinary accuracy numbers miss.
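The signature-extraction step above can be sketched in a few lines of numpy. This is a minimal reading of the pipeline, not the authors' code: the covariance estimator, the per-branch feature dimension, and the choice of ten eigenvalues (matching the λ1–λ10 of Figure 1) are assumptions.

```python
import numpy as np

def spectral_signature(activations: np.ndarray, n_eig: int = 10) -> np.ndarray:
    """Compress one branch's activations (n_samples x n_features) into the
    n_eig leading eigenvalues of their feature-space covariance matrix."""
    cov = np.cov(activations, rowvar=False)   # symmetric PSD covariance estimate
    eigvals = np.linalg.eigvalsh(cov)         # ascending order for symmetric input
    return eigvals[::-1][:n_eig]              # keep the largest n_eig, descending

# Toy stand-ins for the fourteen branch activations on one batch of utterances
rng = np.random.default_rng(0)
branch_acts = [rng.normal(size=(200, 64)) for _ in range(14)]

# Concatenating per-branch signatures yields the meta-classifier's input vector
signature_vector = np.concatenate([spectral_signature(a) for a in branch_acts])
print(signature_vector.shape)  # (140,)
```

One eigenvalue vector per branch keeps the meta-classifier's input low-dimensional regardless of the branch's feature width.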

Core claim

By modeling branch activations with covariance operators whose leading eigenvalues serve as spectral signatures, training a CatBoost meta-classifier on those signatures, and extracting TreeSHAP attributions, the analysis quantifies each branch's contribution share and produces a scalar confidence score Cb; these quantities classify the network's behavior on each attack into one of four archetypes, directly tying the chosen archetype to the observed equal error rate.
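The share computation might look like the sketch below. Grouping eigenvalue features by branch and averaging absolute TreeSHAP values is an assumed normalization; the paper's exact formula (and the definition of Cb) is not given in this summary.

```python
import numpy as np

def branch_shares(shap_values: np.ndarray, n_branches: int = 14,
                  n_eig: int = 10) -> np.ndarray:
    """Collapse per-feature TreeSHAP values (n_samples x n_branches*n_eig)
    into one normalized contribution share per branch."""
    per_feature = np.abs(shap_values).mean(axis=0)       # mean |SHAP| per feature
    per_branch = per_feature.reshape(n_branches, n_eig).sum(axis=1)
    return per_branch / per_branch.sum()                 # shares sum to 1

rng = np.random.default_rng(1)
shap = rng.normal(size=(500, 140))          # toy TreeSHAP matrix for one attack
shares = branch_shares(shap)
dominant = int(np.argmax(shares))           # index of the dominant branch
```

The dominant branch's share is what the review's archetype analysis compares against the 20% threshold visible in Figure 6.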

What carries the argument

Leading eigenvalues of covariance operators on the fourteen branch activations, used as low-dimensional spectral signatures that a CatBoost meta-classifier attributes via TreeSHAP to obtain normalized contribution shares and confidence scores.

Load-bearing premise

The TreeSHAP attributions extracted from the CatBoost meta-classifier on the spectral signatures correctly quantify the branches' true operational contributions without artifacts introduced by the meta-model itself.
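One way to probe this premise, not taken from the paper, is to cross-check the attributions against a model-agnostic baseline such as permutation importance: if TreeSHAP and permutation rankings diverge badly, the attributions likely carry meta-model artifacts. A toy sketch with a hand-rolled stand-in for the CatBoost meta-classifier:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_branches, n_eig = 400, 14, 10
y = rng.integers(0, 2, size=n)                    # toy attack/bonafide labels
X = rng.normal(size=(n, n_branches * n_eig))
X[:, :n_eig] += 1.5 * y[:, None]                  # only branch 0 is informative

def meta_classifier(X: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CatBoost model: thresholds branch 0's mean."""
    return (X[:, :n_eig].mean(axis=1) > 0.75).astype(int)

def branch_permutation_drop(X: np.ndarray, y: np.ndarray, branch: int) -> float:
    """Accuracy lost when one branch's signature columns are shuffled."""
    base = (meta_classifier(X) == y).mean()
    Xp = X.copy()
    idx = rng.permutation(len(X))
    cols = slice(branch * n_eig, (branch + 1) * n_eig)
    Xp[:, cols] = X[idx][:, cols]
    return base - (meta_classifier(Xp) == y).mean()

drops = [branch_permutation_drop(X, y, b) for b in range(n_branches)]
# A faithful attribution method should rank branch 0 first here as well.
print(int(np.argmax(drops)))  # 0
```

Agreement between the two rankings would not prove causal branch usage inside AASIST3, but disagreement would flag the premise as broken.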

What would settle it

Ablate the branches that receive the highest contribution shares on attacks A17 and A18: if the model's equal error rate does not rise as the reported scores predict, the attributions do not reflect the actual decision strategy.
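The decision rule this test implies can be sketched as follows. The 20% share threshold echoes Figure 6; treating any EER rise as confirmation simplifies "rises as predicted", and the post-ablation EER for A17 below is hypothetical (only the 20.9% share and 14.26% baseline are reported).

```python
def ablation_verdict(share: float, eer_base: float, eer_ablated: float,
                     share_thresh: float = 0.20) -> str:
    """Compare a dominant branch's claimed importance with its ablation effect."""
    if share < share_thresh:
        return "branch not dominant; test inconclusive"
    if eer_ablated > eer_base:
        return "attribution supported"
    return "attribution not reflected in behavior"

# A17: Branch B1 at a 20.9% share, baseline EER 14.26% (22.0% after ablation is hypothetical)
print(ablation_verdict(0.209, 14.26, 22.0))   # attribution supported
print(ablation_verdict(0.209, 14.26, 14.0))   # attribution not reflected in behavior
```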

Figures

Figures reproduced from arXiv: 2602.17711 by Grach Mkrtchian, Ivan Viakhirev, Kirill Borodin, Mikhail Gorodnichev.

Figure 1. High-level visual summary of the three-phase analysis framework. In Phase 1, raw intermediate activations from the AASIST3 encoder are compressed into robust spectral signatures (λ1, …, λ10) that capture the principal variations of the feature space. These signatures serve as the input for Phase 2, where a gradient-boosted decision tree (CatBoost) learns to map spectral patterns to specifi…
Figure 2. Identification of the saturation point for the number of retained eigenvalues Neig.
Figure 3. Quality–cost trade-off between F1-Macro score and memory consumption across different eigenvalue counts.
Figure 4. Performance retention and memory savings as a function of the number of retained eigenvalues.
Figure 5. Results of the penalty function ablation study. The heatmap visualizes the dominant branch identified for each attack type (A07–A19) under the four different penalty regimes. The consistency of the color coding across each row confirms that the core strategic findings are largely invariant to the specific form of the penalty. For instance, Attacks A09, A14, and A10 consistently identify Branc…
Figure 6. Operational strategies of the AASIST3 model, mapping each attack into a 2D space defined by the EER and the contribution share of the dominant branch. This visualization provides the empirical rationale for the categorization thresholds: the 1% EER boundary (green dotted line) cleanly separates reliable performance from failure modes, while the 20% contribution threshold (vertical dashed l…
Figure 7. SHAP Distribution for Attack A07. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 8. Comparative Decision Plots for Attacks A07 (Left) and A08 (Right). Adjacent text (3.6.3, Attacks A11 and A12, Spectral vs. Mixed): For Attack A11, the model employs Effective Specialization driven by GAT-S (21.9%) and Branch B0 (20.4%), achieving a Confidence Score of 0.76. Notably, …
Figure 9. SHAP Distribution for Attack A08. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 10. SHAP Distribution for Attack A09. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 11. SHAP Distribution for Attack A10. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 12. Comparative Decision Plots for Attacks A09 (Left) and A10 (Right). Adjacent text (3.6.4, Attacks A13 and A14, Borderline vs. Distinct): Attack A13 represents a borderline case of Ineffective Consensus (…
Figure 13. SHAP Distribution for Attack A11. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 14. SHAP Distribution for Attack A12. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 15. Comparative Decision Plots for Attacks A11 (Left) and A12 (Right).
Figure 16. SHAP Distribution for Attack A13. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively. Adjacent text (3.6.5, Attacks A15 and A16, Weak vs. Robust Consensus): Attack A15 reveals Ineffective Consensus similar to A08, with diffuse SHAP distributions (…
Figure 17. SHAP Distribution for Attack A14. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 18. Comparative Decision Plots for Attacks A13 (Left) and A14 (Right). Adjacent text (3.6.6, Attacks A17 and A18, Vulnerability and Failure): These attacks expose the Flawed Specialization vulnerability. For Attack A17 (Figure 22), the model confidently relies on Branch B1 (20.9% share), yet the high EER (14.26%) indicates this confidence is misplaced. The failure is most severe for Attack A18, where the model delegates …
Figure 19. SHAP Distribution for Attack A15. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 20. SHAP Distribution for Attack A16. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively.
Figure 21. Comparative Decision Plots for Attacks A15 (Left) and A16 (Right).
Figure 22. SHAP Distribution for Attack A17. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively. Adjacent text (3.6.7, Attack A19, Global Spectral Detection): Attack A19 is resolved via Effective Specialization led by the global GAT-S module (19.5% share). The decision trajectories presented on …
Figure 23. Comparative Decision Plots for Attacks A17 (Left) and A18 (Right).
Figure 24. SHAP Distribution for Attack A18. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively. Adjacent text (3.7, Detailed Statistical Data): For completeness, this subsection provides the detailed numerical data used for the analysis.
Figure 25. Decision Plot for Attack A19.
Figure 26. SHAP Distribution for Attack A19. Red and blue distributions represent SHAP values for attack (positive, spoof-indicating) and bonafide (negative, genuine-indicating) samples, respectively. Adjacent text (3.8, Single-Branch Retention Ablation): To complement the descriptive SHAP-based strategy analysis, a small functional ablation was conducted to test whether the dominant branch identified by the attribution pipeline is …
read the original abstract

Multi-branch deep neural networks like AASIST3 achieve state-of-the-art comparable performance in audio anti-spoofing, yet their internal decision dynamics remain opaque compared to traditional input-level saliency methods. While existing interpretability efforts largely focus on visualizing input artifacts, the way individual architectural branches cooperate or compete under different spoofing attacks is not well characterized. This paper develops a framework for interpreting AASIST3 at the component level. Intermediate activations from fourteen branches and global attention modules are modeled with covariance operators whose leading eigenvalues form low-dimensional spectral signatures. These signatures train a CatBoost meta-classifier to generate TreeSHAP-based branch attributions, which we convert into normalized contribution shares and confidence scores (Cb) to quantify the model's operational strategy. By analyzing 13 spoofing attacks from the ASVspoof 2019 benchmark, we identify four operational archetypes-ranging from Effective Specialization (e.g., A09, Equal Error Rate (EER) 0.04%, C=1.56) to Ineffective Consensus (e.g., A08, EER 3.14%, C=0.33). Crucially, our analysis exposes a Flawed Specialization mode where the model places high confidence in an incorrect branch, leading to severe performance degradation for attacks A17 and A18 (EER 14.26% and 28.63%, respectively). These quantitative findings link internal architectural strategy directly to empirical reliability, highlighting specific structural dependencies that standard performance metrics overlook.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an interpretability framework for the AASIST3 multi-branch anti-spoofing model. It models intermediate activations from fourteen branches and global attention modules via covariance operators, extracts leading eigenvalues as low-dimensional spectral signatures, trains a CatBoost meta-classifier on these signatures, and derives TreeSHAP-based branch attributions that are converted into normalized contribution shares and per-attack confidence scores Cb. Analysis of 13 attacks from ASVspoof 2019 identifies four operational archetypes (Effective Specialization, Ineffective Consensus, and others) and highlights a Flawed Specialization mode for attacks A17 and A18, where high confidence is placed on an incorrect branch, correlating with elevated EER values (14.26% and 28.63%).

Significance. If the TreeSHAP attributions are shown to reflect causal branch usage inside AASIST3 rather than meta-classifier artifacts, the work would offer a concrete method for linking internal architectural strategies to empirical reliability metrics such as EER, which is valuable for diagnosing failure modes in multi-branch audio anti-spoofing systems and could inform targeted architectural refinements.

major comments (2)
  1. [Interpretability framework (abstract pipeline description)] The core claim that TreeSHAP attributions from the CatBoost meta-classifier quantify AASIST3's true operational strategy (including high confidence in an incorrect branch for A17/A18) rests on an untested equivalence between meta-model feature importance and internal branch computation. No validation, ablation, or causal check is supplied to rule out the possibility that attributions merely separate spectral signatures by attack label.
  2. [Archetype identification and A17/A18 analysis] The Flawed Specialization archetype for A17 and A18 (EER 14.26% and 28.63%) is load-bearing for the paper's central contribution, yet the abstract supplies no error analysis, robustness checks against post-hoc fitting, or comparison to direct branch-activation interventions that would substantiate the attribution of 'incorrect branch' usage.
minor comments (2)
  1. [Abstract] The abstract states that signatures 'train a CatBoost meta-classifier' but provides no details on training protocol, hyperparameter selection, or cross-validation, which would be needed to assess whether the reported Cb scores are stable.
  2. [Archetype definitions] The four archetypes are introduced with example EER and C values but without explicit quantitative thresholds or decision rules used to assign attacks to each archetype, reducing reproducibility.
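Minor comment 2's request for explicit decision rules can be made concrete from what Figure 6 already shows: a 1% EER boundary and a 20% dominant-share threshold partition attacks into four quadrants. The quadrant-to-archetype mapping below is an inference from the archetype names used in this review, not a rule quoted from the paper.

```python
def archetype(eer_pct: float, dominant_share: float,
              eer_thresh: float = 1.0, share_thresh: float = 0.20) -> str:
    """Quadrant rule inferred from Figure 6's two thresholds."""
    specialized = dominant_share >= share_thresh   # one branch dominates
    reliable = eer_pct < eer_thresh                # attack is well handled
    if specialized and reliable:
        return "Effective Specialization"
    if specialized:
        return "Flawed Specialization"
    if reliable:
        return "Robust Consensus"
    return "Ineffective Consensus"

print(archetype(14.26, 0.209))  # A17 (reported values) -> Flawed Specialization
print(archetype(3.14, 0.10))    # A08 EER with a hypothetical low share -> Ineffective Consensus
```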

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our interpretability framework. We address each major comment below, clarifying the methodological basis for our claims while noting where additional material will be incorporated in revision.

read point-by-point responses
  1. Referee: [Interpretability framework (abstract pipeline description)] The core claim that TreeSHAP attributions from the CatBoost meta-classifier quantify AASIST3's true operational strategy (including high confidence in an incorrect branch for A17/A18) rests on an untested equivalence between meta-model feature importance and internal branch computation. No validation, ablation, or causal check is supplied to rule out the possibility that attributions merely separate spectral signatures by attack label.

    Authors: The spectral signatures are obtained directly from the leading eigenvalues of covariance operators applied to the intermediate activations of each of the fourteen branches and global attention modules. These signatures therefore encode the actual computational output of the branches rather than external labels. The CatBoost meta-classifier is trained solely on these signatures to recover the attack identity, and TreeSHAP values measure the marginal contribution of each branch signature to that recovery. This construction ties the attributions to internal branch behavior. We agree that explicit ablation (e.g., zeroing individual branch signatures and measuring meta-classifier accuracy drop) and sensitivity checks would further rule out label-separation artifacts; such experiments will be added to the revised manuscript. revision: partial

  2. Referee: [Archetype identification and A17/A18 analysis] The Flawed Specialization archetype for A17 and A18 (EER 14.26% and 28.63%) is load-bearing for the paper's central contribution, yet the abstract supplies no error analysis, robustness checks against post-hoc fitting, or comparison to direct branch-activation interventions that would substantiate the attribution of 'incorrect branch' usage.

    Authors: Section 4.3 of the full manuscript already correlates the derived per-attack confidence scores Cb with observed EER across all 13 attacks, showing that the Flawed Specialization pattern for A17 and A18 coincides with the highest error rates. Robustness to the number of retained eigenvalues is examined via supplementary figures. Direct branch-activation interventions would require architectural modifications or retraining of AASIST3 and therefore lie outside the post-hoc scope of the present study; we will, however, add a brief discussion of this limitation and a suggested protocol for future causal verification. revision: partial

Circularity Check

0 steps flagged

No circularity: meta-classifier attributions provide independent interpretive layer

full rationale

The paper computes covariance-based spectral signatures directly from AASIST3 branch activations, trains an external CatBoost model on those signatures (to classify attacks or decisions), and applies TreeSHAP to obtain feature attributions that are then normalized into contribution shares and confidence scores. This chain does not reduce any claimed archetype or strategy to the input activations by construction, nor does it rely on self-citation, imported uniqueness theorems, or renaming of known results. The meta-classifier is a distinct model whose decision boundaries and SHAP values constitute an additional analytical step rather than a tautological re-expression of the original network. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the untested premise that covariance structure of branch activations encodes decision strategy; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Covariance operators on intermediate branch activations capture the model's operational decision dynamics
    Invoked when forming spectral signatures from leading eigenvalues to train the meta-classifier.

pith-pipeline@v0.9.0 · 5586 in / 1256 out tokens · 24777 ms · 2026-05-15T22:02:09.181445+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 3 internal anchors

  1. [1]

    ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection

    Yamagishi, J.; Wang, X.; Todisco, M.; Sahidullah, M.; Patino, J.; Nautsch, A.; Liu, X.; Lee, K.A.; Kinnunen, T.; Evans, N.; et al. ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection. In Proceedings of the 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, Online, 16 September 2021; pp. 47–...

  2. [3]

    ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

    Wang, X.; Yamagishi, J.; Todisco, M.; Delgado, H.; Nautsch, A.; Evans, N.; Sahidullah, M.; Vestman, V.; Kinnunen, T.; Lee, K.A.; et al. ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech. Comput. Speech Lang. 2020, 64, 101114. https://doi.org/10.1016/j.csl.2020.101114

  3. [4]

    Capsule-based and TCN-based Approaches for Spoofing Detection in Voice Biometry

    Borodin, K.; Kudryavtsev, V.; Mkrtchian, G.; Gorodnichev, M. Capsule-based and TCN-based Approaches for Spoofing Detection in Voice Biometry. Eng. Technol. Appl. Sci. Res. 2024, 14, 18409–18414. https://doi.org/10.48084/etasr.8906

  4. [5]

    AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge

    Borodin, K.; Kudryavtsev, V.; Korzh, D.; Efimenko, A.; Mkrtchian, G.; Gorodnichev, M.; Rogov, O.Y. AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge. arXiv 2024, arXiv:2408.17352

  5. [6]

    t-DCF: A Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

    Kinnunen, T.; Lee, K.A.; Delgado, H.; Evans, N.; Todisco, M.; Sahidullah, M.; Yamagishi, J.; Reynolds, D.A. t-DCF: A Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification. arXiv 2019, arXiv:1804.09618

  6. [7]

    Axiomatic attribution for deep networks

    Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML’17, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 3319–3328

  7. [8]

    AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks

    Jung, J.W.; Heo, H.S.; Tak, H.; Shim, H.J.; Chung, J.S.; Lee, B.J.; Yu, H.J.; Evans, N. AASIST: Audio Anti-Spoofing Using Integrated Spectro-Temporal Graph Attention Networks. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 6367–6371. https://doi.org/1...

  8. [9]

    The spectrum of covariance matrices of randomly connected recurrent neuronal networks with linear dynamics

    Hu, Y.; Sompolinsky, H. The spectrum of covariance matrices of randomly connected recurrent neuronal networks with linear dynamics. PLoS Comput. Biol. 2022, 18, e1010327. https://doi.org/10.1371/journal.pcbi.1010327

  9. [10]

    coVariance Neural Networks

    Sihag, S.; Mateos, G.; McMillan, C.; Ribeiro, A. coVariance Neural Networks. arXiv 2023, arXiv:2205.15856

  10. [11]

    Hallucination Detection in LLMs Using Spectral Features of Attention Maps

    Binkowski, J.; Janiak, D.; Sawczyn, A.; Gabrys, B.; Kajdanowicz, T. Hallucination Detection in LLMs Using Spectral Features of Attention Maps. arXiv 2025, arXiv:2502.17598

  11. [12]

    Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks

    Harzli, O.E.; Grau, B.C. Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks. 2025. In Proceedings of the NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L), Vancouver, BC, Canada, 14 December 2024

  12. [13]

    A Unified Approach to Interpreting Model Predictions

    Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874

  13. [14]

    Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations

    Ge, W.; Patino, J.; Todisco, M.; Evans, N. Explaining deep learning models for spoofing and deepfake detection with SHapley Additive exPlanations. arXiv 2024, arXiv:2110.03309

  14. [15]

    An explainable deepfake of speech detection method with spectrograms and waveforms

    Yu, N.; Chen, L.; Leng, T.; Chen, Z.; Yi, X. An explainable deepfake of speech detection method with spectrograms and waveforms. J. Inf. Secur. Appl. 2024, 81, 103720. https://doi.org/10.1016/j.jisa.2024.103720

  15. [16]

    A Survey on Speech Deepfake Detection

    Li, M.; Ahmadiadli, Y.; Zhang, X.P. A Survey on Speech Deepfake Detection. arXiv 2025, arXiv:2404.13914

  16. [17]

    A Probabilistic Re-Interpretation of Confidence Scores in Multi-Exit Models

    Pomponi, J.; Scardapane, S.; Uncini, A. A Probabilistic Re-Interpretation of Confidence Scores in Multi-Exit Models. Entropy 2021, 24, 1. https://doi.org/10.3390/e24010001

  17. [18]

    Measuring Ensemble Diversity and Its Effects on Model Robustness

    Heidemann, L.; Schwaiger, A.; Roscher, K. Measuring Ensemble Diversity and Its Effects on Model Robustness. In Proceedings of the 1st International Workshop on Artificial Intelligence Safety (SafeAI 2021) Co-Located with AAAI 2021, CEUR-WS, Virtually, 8 February 2021; Volume 2916, pp. 65–73

  18. [19]

    Explainable AI for Trees: From Local Explanations to Global Understanding

    Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. Explainable AI for Trees: From Local Explanations to Global Understanding. arXiv 2019, arXiv:1905.04610

  19. [20]

    Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic

    Hajjouz, A.; Avksentieva, E. Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic. Sci. Tech. J. Inf. Technol. Mech. Opt. 2025, 25, 114–127. https://doi.org/10.17586/2226-1494-2025-25-1-114-127

  20. [21]

    Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions

    Shen, J.; Pang, R.; Weiss, R.J.; Schuster, M.; Jaitly, N.; Yang, Z.; Chen, Z.; Zhang, Y.; Wang, Y.; Skerrv-Ryan, R.; et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 47...

  21. [22]

    Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis

    Siuzdak, H. Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024

  22. [23]

    Upsampling Artifacts in Neural Audio Synthesis

    Pons, J.; Pascual, S.; Cengarle, G.; Serrà, J. Upsampling Artifacts in Neural Audio Synthesis. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, 6–11 June 2021; pp. 3005–3009. https://doi.org/10.1109/ICASSP39728.2021.9414913

  23. [24]

    End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection

    Tak, H.; weon Jung, J.; Patino, J.; Kamble, M.; Todisco, M.; Evans, N. End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection. In Proceedings of the 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, Online, 16 September 2021; pp. 1–8. https://doi.or...

  24. [25]

    A multi-slice attention fusion and multi-view personalized fusion lightweight network for Alzheimer’s disease diagnosis

    Zhang, Q.; Long, Y.; Cai, H.; Yu, S.; Shi, Y.; Tan, X. A multi-slice attention fusion and multi-view personalized fusion lightweight network for Alzheimer’s disease diagnosis. BMC Med. Imaging 2024, 24, 258. https://doi.org/10.1186/s12880-024-01429-8

  25. [26]

    Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers

    Ben-Artzy, A.; Schwartz, R. Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers. arXiv 2024, arXiv:2409.03621

  26. [27]

    End-to-End anti-spoofing with RawNet2

    Tak, H.; Patino, J.; Todisco, M.; Nautsch, A.; Evans, N.; Larcher, A. End-to-End anti-spoofing with RawNet2. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, 6–11 June 2021; pp. 6369–6373. https://doi.org/10.1109/ICASSP39728.2021.9414234

  27. [28]

    A Conformer-Based Classifier for Variable-Length Utterance Processing in Anti-Spoofing

    Rosello, V.; Evans, N. A Conformer-Based Classifier for Variable-Length Utterance Processing in Anti-Spoofing. In Proceedings of the INTERSPEECH, Dublin, Ireland, 20–24 August 2023; pp. 3632–3636. https://doi.org/10.21437/Interspeech.2023-1627

  28. [29]

    ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale

    Wang, X.; Delgado, H.; Tak, H.; Jung, J.-W.; Shim, H.-J.; Todisco, M.; Kukanov, I.; Liu, X.; Sahidullah, M.; Kinnunen, T.H.; et al. ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale. In Proceedings of the Automatic Speaker Verification Spoofing Countermeasures Workshop (ASVspoof 2024), Kos, Greece, 31 August 2024; pp. 1–8. ...

  29. [30]

    Attention-based Mixture of Experts for Robust Speech Deepfake Detection

    D’Alterio, G.; Neghina, M.; Bestagini, P.; Tubaro, S. Attention-based Mixture of Experts for Robust Speech Deepfake Detection. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), Perth, WA, Australia, 1–4 December 2025

  30. [31]

    Multi-level SSL Feature Gating for Audio Deepfake Detection

    Tran, H.M.; Amsaleg, L.; Ducq, E. Multi-level SSL Feature Gating for Audio Deepfake Detection. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), Chicago, IL, USA, 30 June–3 July 2025; ACM: New York, NY, USA, 2025.