pith. machine review for the scientific record.

arxiv: 2604.02937 · v1 · submitted 2026-04-03 · 💻 cs.SD

Recognition: no theorem link

If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models


Pith reviewed 2026-05-13 18:50 UTC · model grok-4.3

classification 💻 cs.SD
keywords transferability analysis · audio classification · deepfake detection · minimal sufficient signals · model differences · information processing · music genre classification

The pith

Transferability analysis of minimal sufficient audio signals reveals information-theoretic differences between models that accuracy metrics miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces transferability analysis to compare how audio classification models process information. It identifies a minimal sufficient signal for a classification in one model and tests whether that same signal produces the same classification in other models. Across tasks like music genre classification, emotion recognition, and deepfake detection, transfer rates vary significantly, with music genre signals transferring successfully only about 26 percent of the time. In deepfake detection, certain models exhibit distinct transferability patterns, labeled 'flat-earther' models, highlighting differences not visible through standard accuracy or precision measures.

Core claim

Given a minimal sufficient signal for a classification on model f, transferability analysis determines whether other models accept this signal as having the same classification. Applied to three tasks, it shows varying transferability rates and identifies models with unique behaviors in deepfake detection that indicate underlying information processing differences.

What carries the argument

Transferability analysis, which checks if a minimal sufficient signal from one model is accepted by others for the same classification.
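
The check itself is mechanical. A minimal Python sketch with toy stand-in classifiers (all names here are hypothetical illustrations, not the paper's code or its signal-extraction procedure):

```python
# Hypothetical sketch of the transferability check: a sufficient signal
# labeled by a source model f "transfers" if another model assigns it
# the same label.

def transfers(signal, label, model):
    """True if `model` accepts `signal` as having `label`."""
    return model(signal) == label

def transfer_rate(signals_with_labels, other_models):
    """Fraction of (signal, label) pairs accepted across all other models."""
    total, hits = 0, 0
    for signal, label in signals_with_labels:
        for m in other_models:
            total += 1
            if transfers(signal, label, m):
                hits += 1
    return hits / total if total else 0.0

# Toy stand-in classifiers over integer "signals".
model_a = lambda s: s % 2   # source model f (labels by parity)
model_b = lambda s: s % 2   # agrees with f everywhere
model_c = lambda s: 0       # always predicts label 0

pairs = [(2, 0), (4, 0), (5, 1)]          # signals labeled by model_a
print(transfer_rate(pairs, [model_b]))    # → 1.0
print(transfer_rate(pairs, [model_c]))    # ≈ 0.67
```

In the paper's setting the signals are masked audio rather than integers, but the acceptance test has the same shape: label agreement on the receiving model.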

If this is right

  • Music genre sufficient signals transfer successfully in approximately 26% of cases.
  • Emotion recognition and deepfake detection show higher variance in transferability rates.
  • Some deepfake detection models, called flat-earther models, display different transferability behavior.
  • Transferability analysis uncovers information-theoretic differences between models not captured by accuracy and precision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Models with low transferability might rely on different acoustic features than those with high transferability.
  • Transferability could guide selection of models for applications where robustness to signal variations matters.
  • Further analysis might reveal specific audio features that cause non-transfer in certain models.

Load-bearing premise

A minimal sufficient signal for one model's classification can be reliably identified and its transfer to other models reflects meaningful differences in information processing.
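
The paper does not specify its extraction procedure. One hedged illustration of how a locally minimal sufficient signal could be approximated is greedy deletion, sketched here on a toy classifier (names and threshold model are hypothetical, not the authors' method):

```python
# Hypothetical greedy-deletion sketch: zero out samples one at a time,
# keeping a deletion only if the model's label is preserved. The result
# is sufficient (same label) and locally minimal under single deletions.

def minimal_sufficient(signal, label, model):
    sig = list(signal)
    for i in range(len(sig)):
        saved = sig[i]
        sig[i] = 0                 # tentatively silence this sample
        if model(sig) != label:
            sig[i] = saved         # restore if the label flips
    return sig

# Toy model: predicts 1 iff the sum of samples exceeds a threshold.
model = lambda s: int(sum(s) > 3)

sig = [2, 2, 2, 1]
print(model(sig))                               # → 1
print(minimal_sufficient(sig, 1, model))        # → [0, 2, 2, 0]
```

Whether such a locally minimal signal reflects the globally minimal one, and how stable it is across extraction runs, is exactly what the load-bearing premise above assumes.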

What would settle it

An experiment showing that all models accept the same minimal sufficient signals at similar rates would challenge the claim that transferability reveals unique information differences.

Figures

Figures reproduced from arXiv: 2604.02937 by David A. Kelly, Hana Chockler.

Figure 1
Figure 1: The partitioning of 'disgust' (a) into sufficient signal (b), sufficient and necessary (complete) signal (c) and …
Figure 2
Figure 2: Graphical representation of an audio depth-…
Figure 3
Figure 3: Power spectral density of real and fake data on ASVSpoof2019 (top two rows) and ITW (bottom two rows); panels SP1–SP3, y-axis in entropy (bits). The difference between SP1 and the other models is clear.
Figure 4
Figure 4: Spectral entropy across real and fake sufficiencies on ASVSpoof2019 (top) and ITW (bottom) and …
read the original abstract

In order to gain fresh insights about the information processing characteristics of different audio classification models, we propose transferability analysis. Given a minimal, sufficient signal for a classification on a model $f$, transferability analysis asks whether other models accept this minimal signal as having the same classification as it did on $f$. We define what it means for a sufficient signal to be transferable and perform a large study over $3$ different classification tasks: music genre, emotion recognition and deepfake detection. We find that transferability rates vary depending on the task, with sufficient signals for music genre being transferable $\approx26\%$ of the time. The other tasks reveal much higher variance in transferability and reveal that some models, in particular on deepfake detection, have different transferability behavior. We call these models `flat-earther' models. We investigate deepfake audio in more depth, and show that transferability analysis also allows to us to discover information theoretic differences between the models which are not captured by the more familiar metrics of accuracy and precision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes transferability analysis as a method to probe information-processing differences among audio classification models. Given a minimal sufficient signal for a classification decision on model f, the approach tests whether other models assign the same label to that signal. Experiments across music genre classification, emotion recognition, and deepfake detection report task-dependent transfer rates (approximately 26% for music genre) and identify a subset of deepfake models, termed 'flat-earther' models, whose transfer behavior deviates from the others. The authors argue that these transfer patterns expose information-theoretic distinctions not visible in accuracy or precision metrics.

Significance. If the empirical findings are reproducible, the work supplies a practical diagnostic for comparing audio models that goes beyond scalar performance numbers. The deepfake results in particular suggest that transferability can surface model-specific sensitivities to minimal cues, which may inform robustness evaluation and model selection in security-sensitive audio tasks.

major comments (3)
  1. [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.
  2. [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.
  3. [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.
minor comments (1)
  1. [Abstract] Abstract contains the grammatical error 'allows to us to discover'; correct to 'allows us to discover'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving clarity and reproducibility. We address each major comment below and will revise the manuscript to incorporate the requested details.

read point-by-point responses
  1. Referee: [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.

    Authors: We agree that the current manuscript lacks sufficient detail on the extraction of minimal sufficient signals. In the revised version, we will add a dedicated subsection describing the concrete optimization procedure (including the loss function and constraints used to minimize the signal while preserving the classification decision on model f), the specific algorithm employed, and verification steps such as ablation checks and sensitivity analysis to confirm minimality and sufficiency. This will enable readers to reproduce and assess the robustness of the reported transfer rates and model distinctions. revision: yes

  2. Referee: [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.

    Authors: We will strengthen this section by adding an explicit comparison between transferability patterns and standard metrics. Specifically, we will include a correlation analysis across models between transfer rates and accuracy/precision values, along with a controlled ablation that varies the signal-minimization parameters while tracking both metric families. This will demonstrate that the observed distinctions for flat-earther models are not artifacts of the minimization process. revision: yes

  3. Referee: [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.

    Authors: We acknowledge that the operational definition requires formalization. In the revision, we will state the definition explicitly, including the precise threshold applied to model output probabilities for label agreement, the agreement metric (e.g., exact label match or probabilistic divergence), and any statistical tests used to determine transferability. This formalization will be placed early in the methods section to support all subsequent quantitative claims. revision: yes
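
For concreteness, one plausible, purely hypothetical formalization of the operational test the rebuttal promises: a top-label match on the receiving model combined with a confidence threshold tau (all names assumed, not taken from the paper):

```python
# Hypothetical acceptance criterion: the receiving model's top label must
# match the source label AND clear a confidence threshold tau.

def accepts(probs, label, tau=0.5):
    """probs: class-probability mapping from the receiving model."""
    top = max(probs, key=probs.get)
    return top == label and probs[top] >= tau

print(accepts({"real": 0.8, "fake": 0.2}, "real"))         # → True
print(accepts({"real": 0.45, "fake": 0.55}, "real"))       # → False (wrong top label)
print(accepts({"real": 0.45, "fake": 0.55}, "fake", 0.6))  # → False (below tau)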

Circularity Check

0 steps flagged

Empirical study with no circular derivations

full rationale

The paper defines transferability analysis upfront as an empirical procedure: given a minimal sufficient signal for model f, check whether other models accept the same classification. Results are obtained from a large-scale study across three tasks (music genre, emotion recognition, deepfake detection), reporting observed transfer rates and identifying 'flat-earther' models in deepfake detection. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The information-theoretic differences are presented as direct observations from varying transferability behavior, not derived from prior work by the same authors or renamed known results. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that minimal sufficient signals exist and can be extracted for model classifications; no free parameters or invented entities with independent evidence are described.

axioms (1)
  • domain assumption Minimal sufficient signals can be identified for a given model's classification decision
    This underpins the definition of transferable signals in the proposed analysis.
invented entities (1)
  • flat-earther models no independent evidence
    purpose: To categorize models exhibiting atypical transferability behavior in deepfake detection
    New descriptive term introduced based on observed experimental patterns.

pith-pipeline@v0.9.0 · 5488 in / 1263 out tokens · 73342 ms · 2026-05-13T18:50:31.263175+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 1 internal anchor

  1. [1]

    An introduction to audio content analysis: Music Information Retrieval tasks and applications

    A. Lerch, An introduction to audio content analysis: Music Information Retrieval tasks and applications. John Wiley & Sons, 2022

  2. [2]

    A simple method to determine if a music information retrieval system is a "horse"

    B. L. Sturm, "A simple method to determine if a music information retrieval system is a 'horse'," IEEE Transactions on Multimedia, vol. 16, no. 6, pp. 1636–1644, 2014

  3. [4]

    Causal identification of sufficient, necessary and complete explanations in image classification,

    ——, “Causal identification of sufficient, necessary and complete explanations in image classification, ”arXiv preprint arXiv:2507.23497, 2025

  4. [5]

    I am big, you are little; i am right, you are wrong,

    D. A. Kelly, A. Chanchal, and N. Blake, “I am big, you are little; i am right, you are wrong, ” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 817–826

  5. [6]

    J. Y. Halpern,Actual Causality. The MIT Press, 2019

  6. [7]

    I guess that’s why they call it the blues: Causal analysis for audio classifiers,

    D. A. Kelly and H. Chockler, “I guess that’s why they call it the blues: Causal analysis for audio classifiers, ”arXiv preprint arXiv:2601.16675, 2026

  7. [8]

    Pearl,Causality: Models, Reasoning and Inference, 2nd ed

    J. Pearl,Causality: Models, Reasoning and Inference, 2nd ed. USA: Cambridge University Press, 2009

  8. [9]

    Abduction-based explanations for machine learning models,

    A. Ignatiev, N. Narodytska, and J. Marques-Silva, “Abduction-based explanations for machine learning models, ” inThe Thirty-Third AAAI Conference on Artificial Intelligence, AAAI. AAAI Press, 2019, pp. 1511–1519

  9. [10]

    Multiple different explanations for image classifiers,

    H. Chockler, D. A. Kelly, and D. Kroening, “Multiple different explanations for image classifiers, ” inECAI European Conference on Artificial Intelligence, 2025

  10. [11]

    Activation-deactivation: A general framework for robust post-hoc explainable ai,

    A. Chanchal, D. A. Kelly, and H. Chockler, “Activation-deactivation: A general framework for robust post-hoc explainable ai, ”arXiv preprint arXiv:2510.01038, 2025

  11. [12]

    Discrete-Time Signal Processing

    A. Oppenheim and R. Schafer, Discrete-Time Signal Processing. Pearson Deutschland, 2013. [Online]. Available: https://elibrary.pearson.de/book/99.150005/9781292038155

  12. [13]

    Causal explanations for image classifiers,

    H. Chockler, D. A. Kelly, D. Kroening, and Y. Sun, “Causal explanations for image classifiers, ”Journal of Artificial Intelligence Research, 2026

  13. [14]

    The ryerson audio-visual database of emotional speech and song (ravdess),

    S. R. Livingstone and F. A. Russo, “The ryerson audio-visual database of emotional speech and song (ravdess), ” Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1188976

  14. [15]

    GTZAN

    C. Fusion and W. Cukierski, "GTZAN," https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification, 2011, Kaggle

  15. [16]

    The gtzan dataset: Its contents, its faults, their effects on evaluation, and its future use,

    B. L. Sturm, “The gtzan dataset: Its contents, its faults, their effects on evaluation, and its future use, ”arXiv preprint arXiv:1306.1461, 2013

  16. [17]

    Asvspoof 2019: A large-scale public database of synthesized, converted and replayed speech,

    X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidul- lah, V. Vestman, T. Kinnunen, K. A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. W. g, S. L. Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. M. ka, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka...

  17. [18]

    Does audio deepfake detection generalize?

    N. M. Müller, P. Czempin, F. Dieckmann, A. Froghyar, and K. Böttinger, “Does audio deepfake detection generalize?”Interspeech, 2022

  18. [19]

    Transformers: State-of-the-art natural language processing,

    T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers: State-of-the-art natural language processing, ” inProceedings of the 2020 Conference on Empirical Meth...

  19. [20]

    Binary codes capable of correcting deletions, insertions, and reversals,

    V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals, ”Soviet physics. Doklady, vol. 10, pp. 707–710, 1965. [Online]. Available: https://api.semanticscholar.org/CorpusID:60827152

  20. [21]

    An algorithm for predicting the intelligibility of speech masked by modulated noise maskers,

    J. Jensen and C. H. Taal, “An algorithm for predicting the intelligibility of speech masked by modulated noise maskers, ”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 2009–2022, 2016

  21. [22]

    Ai got your tongue? analysing the sounds of audio deepfake generation methods,

    K. Schäfer, "Ai got your tongue? analysing the sounds of audio deepfake generation methods," in Proceedings of the 2025 International Conference on Multimedia Retrieval, ser. ICMR '25. New York, NY, USA: Association for Computing Machinery, 2025, p. 2023–2027

  22. [23]

    Does audio deepfake detection generalize?

    N. M. Müller, P. Czempin, F. Dieckmann, A. Froghyar, and K. Böttinger, “Does audio deepfake detection generalize?” inInterspeech, 2022. [Online]. Available: https://api.semanticscholar.org/CorpusID:247793039

  23. [24]

    On success and simplicity: A second look at transferable targeted attacks,

    Z. Zhao, Z. Liu, and M. Larson, “On success and simplicity: A second look at transferable targeted attacks, ” inAdvances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 6115–6128

  24. [25]

    Autonomous language-image generation loops converge to generic visual motifs,

    A. Hintze, F. Proschinger Åström, and J. Schossau, “Autonomous language-image generation loops converge to generic visual motifs, ”Patterns, vol. 7, no. 1, p. 101451, 2026

  25. [26]

    Audio deepfake verification,

    L. Wang, J. Ao, L. Gan, Y. Wang, X. Zhang, and Z. Wu, “Audio deepfake verification, ” 2025. [Online]. Available: https://arxiv.org/abs/2509.08476

  26. [27]

    Pitch imperfect: Detecting audio deepfakes through acoustic prosodic analysis,

    K. Warren, D. Olszewski, S. Layton, K. Butler, C. Gates, and P. Traynor, “Pitch imperfect: Detecting audio deepfakes through acoustic prosodic analysis, ” 2025. [Online]. Available: https://arxiv.org/abs/2502.14726

  27. [28]

    Why do adversarial attacks transfer? explaining transferability of evasion and poisoning attacks,

    A. Demontis, M. Melis, M. Pintor, M. Jagielski, B. Biggio, A. Oprea, C. Nita-Rotaru, and F. Roli, "Why do adversarial attacks transfer? explaining transferability of evasion and poisoning attacks," in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 321–. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/demontis

  29. [30]

    On success and simplicity: A second look at transferable targeted attacks,

    Z. Zhao, Z. Liu, and M. Larson, “On success and simplicity: A second look at transferable targeted attacks, ”Advances in Neural Information Processing Systems, vol. 34, pp. 6115–6128, 2021

  30. [31]

    Beyond boundaries: A comprehensive survey of transferable attacks on ai systems,

    G. Wang, C. Zhou, Y. Wang, B. Chen, H. Guo, and Q. Yan, “Beyond boundaries: A comprehensive survey of transferable attacks on ai systems, ”arXiv preprint arXiv:2311.11796, 2023

  31. [32]

    Audio adversarial examples: Targeted attacks on speech-to-text,

    N. Carlini and D. Wagner, “Audio adversarial examples: Targeted attacks on speech-to-text, ” in2018 IEEE Security and Privacy Workshops (SPW), 2018, pp. 1–7

  32. [33]

    Sirenattack: Generating adversarial audio for end-to-end acoustic systems,

    T. Du, S. Ji, J. Li, Q. Gu, T. Wang, and R. Beyah, "Sirenattack: Generating adversarial audio for end-to-end acoustic systems," in Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, 2020, pp. 357–369

  33. [34]

    Targeted adversarial examples for black box audio systems,

    R. Taori, A. Kamsetty, B. Chu, and N. Vemuri, “Targeted adversarial examples for black box audio systems, ” in2019 IEEE security and privacy workshops (SPW). IEEE, 2019, pp. 15–20

  34. [35]

    Transferable adversarial attacks on audio deepfake detection,

    M. U. Farooq, A. Khan, K. Uddin, and K. Malik, “Transferable adversarial attacks on audio deepfake detection, ” 01 2025

  35. [36]

    On the transferability of local model-agnostic explanations of machine learning models to unseen data,

    A. López and E. García-Cuesta, “On the transferability of local model-agnostic explanations of machine learning models to unseen data, ” pp. 1–10, 05 2024

  36. [37]

    On the reasons behind decisions,

    A. Darwiche and A. Hirth, “On the reasons behind decisions, ” inECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, August 29 - September 8, 2020 - Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), ser. Frontiers in Artificial Intelligence a...

  37. [38]

    Explaining Failures of Cyber-Physical Systems with Actual Causality

    K. Elimelech, T. Yaacov, D. Kelly, H. Chockler, and M. Vardi,Explaining Failures of Cyber-Physical Systems with Actual Causality. IEEE, Jun. 2026

  38. [39]

    Explainable ai for time series classification: A review, taxonomy and research directions,

    A. Theissler, F. Spinnato, U. Schlegel, and R. Guidotti, “Explainable ai for time series classification: A review, taxonomy and research directions, ”IEEE Access, vol. 10, 2022

  39. [40]

    Sig-lime: A signal-based enhancement of lime explanation technique,

    T. A. A. Abdullah, M. S. M. Zahid, A. F. Turki, W. Ali, A. A. Jiman, M. J. Abdulaal, N. M. Sobahi, and E. T. Attar, “Sig-lime: A signal-based enhancement of lime explanation technique, ”IEEE Access, vol. 12, pp. 52 641–52 658, 2024

  40. [41]

    Local interpretable model-agnostic explanations for music content analysis,

    S. Mishra, B. L. T. Sturm, and S. Dixon, “Local interpretable model-agnostic explanations for music content analysis, ” inProceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017, Suzhou, China, October 23-27, 2017, 2017, pp. 537–543

  41. [42]

    Audio explainable artificial intelligence: A review,

    A. Akman and B. W. Schuller, “Audio explainable artificial intelligence: A review, ” Intelligent Computing, vol. 2, p. 0074, 2024

  42. [43]

    Lemons: Listenable explanations for music recommender systems,

    A. B. Melchiorre, V. Haunschmid, M. Schedl, and G. Widmer, "Lemons: Listenable explanations for music recommender systems," in European Conference on Information Retrieval. Springer, 2021, pp. 531–536

  43. [44]

    audiolime: Listenable explanations using source separation,

    V. Haunschmid, E. Manilow, and G. Widmer, “audiolime: Listenable explanations using source separation, ”arXiv preprint arXiv:2008.00582, 2020

  44. [45]

    Musiclime: Explainable multimodal music understanding,

    T. Sotirou, V. Lyberatos, O. M. Mastromichalakis, and G. Stamou, “Musiclime: Explainable multimodal music understanding, ” 2025. [Online]. Available: https://arxiv.org/abs/2409.10496

  45. [46]

    Classification accuracy is not enough: On the evaluation of music genre recognition systems,

    B. L. Sturm, “Classification accuracy is not enough: On the evaluation of music genre recognition systems, ”Journal of Intelligent Information Systems, vol. 41, no. 3, pp. 371–406, 2013

  46. [47]

    A fourier explanation of ai-music artifacts,

    D. Afchar, G. Meseguer-Brocal, K. Akesbi, and R. Hennequin, "A fourier explanation of ai-music artifacts," in International Society for Music Information Retrieval Conference, 2025. [Online]. Available: https://api.semanticscholar.org/CorpusID:280000343

  47. [48]

    Out-of-the-box: Black-box Causal Attacks on Object Detectors

    M. Navaratnarajah, D. A. Kelly, and H. Chockler, “Out-of-the-box: Black-box causal attacks on object detectors, ”arXiv preprint arXiv:2512.03730, 2025

  48. [49]

    Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping,

    J. Dodge, G. Ilharco, R. Schwartz, A. Farhadi, H. Hajishirzi, and N. Smith, "Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping," 2020. [Online]. Available: https://arxiv.org/abs/2002.06305

  49. [50]

    The multiBERTs: BERT reproductions for robustness analysis,

    T. Sellam, S. Yadlowsky, I. Tenney, J. Wei, N. Saphra, A. D'Amour, T. Linzen, J. Bastings, I. R. Turc, J. Eisenstein, D. Das, and E. Pavlick, "The multiBERTs: BERT reproductions for robustness analysis," in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=K0E_F0gFDgA

  50. [51]

    Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision,

    D. Picard, “Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision, ”CoRR, vol. abs/2109.08203, 2021. [Online]. Available: https://arxiv.org/abs/2109.08203