If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models
Pith reviewed 2026-05-13 18:50 UTC · model grok-4.3
The pith
Transferability analysis of minimal sufficient audio signals reveals information-theoretic differences between models that accuracy metrics miss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a minimal sufficient signal for a classification on model f, transferability analysis determines whether other models accept this signal as having the same classification. Applied to three tasks, it reveals task-dependent transferability rates and identifies deepfake-detection models whose atypical behavior points to underlying information-processing differences.
What carries the argument
Transferability analysis: checking whether a minimal sufficient signal extracted on one model is accepted by other models as carrying the same classification.
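The check itself can be sketched in a few lines. The `transfers` helper, the toy energy-based model, and the 0.5 acceptance threshold below are illustrative assumptions, since the review does not specify the paper's operational test:

```python
import numpy as np

def transfers(signal, label, target_model, threshold=0.5):
    """Does `target_model` accept `signal` as class `label`?

    `signal` is assumed minimal and sufficient for `label` on some
    source model f; `target_model` maps a signal to class probabilities.
    """
    probs = target_model(signal)
    return bool(int(np.argmax(probs)) == label and probs[label] >= threshold)

def toy_model(signal):
    # Stand-in classifier: label 1 ("fake") iff mean energy is high.
    energy = float(np.mean(signal ** 2))
    p_fake = 1.0 / (1.0 + np.exp(-10.0 * (energy - 0.5)))
    return np.array([1.0 - p_fake, p_fake])

sig = np.full(100, 0.9)               # high-energy candidate signal
print(transfers(sig, 1, toy_model))   # True for this toy model
```

Running the same check with every other model as `target_model` yields the per-pair transfer outcomes that the paper aggregates into transfer rates.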
If this is right
- Music genre sufficient signals transfer successfully in approximately 26% of cases.
- Emotion recognition and deepfake detection show higher variance in transferability rates.
- Some deepfake detection models, dubbed 'flat-earther' models, display transferability behavior that deviates from the rest.
- Transferability analysis uncovers information-theoretic differences between models that are not captured by accuracy or precision.
Where Pith is reading between the lines
- Models with low transferability might rely on different acoustic features than those with high transferability.
- Transferability could guide selection of models for applications where robustness to signal variations matters.
- Further analysis might reveal specific audio features that cause non-transfer in certain models.
Load-bearing premise
A minimal sufficient signal for one model's classification can be reliably identified, and its transfer (or failure to transfer) to other models reflects meaningful differences in information processing.
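One plausible extraction scheme consistent with this premise is occlusion-style greedy masking: silence each window of the signal and keep the deletion only if the model's label survives. The window size, greedy order, and `spike_model` below are assumptions for illustration, not the paper's actual procedure:

```python
import numpy as np

def minimal_sufficient_signal(signal, model, label, win=10):
    """Greedily silence fixed-size windows while the model keeps `label`."""
    x = signal.copy()
    for start in range(0, len(x), win):
        saved = x[start:start + win].copy()
        x[start:start + win] = 0.0            # tentatively silence the window
        if int(np.argmax(model(x))) != label:
            x[start:start + win] = saved      # window was needed; restore it
    return x

def spike_model(x):
    # Stand-in classifier: label 1 iff a loud sample is present anywhere.
    present = float(np.max(np.abs(x)) > 0.5)
    return np.array([1.0 - present, present])

sig = np.full(50, 0.1)
sig[23] = 0.9                                 # the decisive spike
ms = minimal_sufficient_signal(sig, spike_model, 1)
print(np.count_nonzero(ms))                   # 10: only the spike's window survives
```

Whether such a greedy result is truly minimal (and not merely a local optimum) is exactly the verification gap the referee raises below.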
What would settle it
An experiment showing that all models accept the same minimal sufficient signals at similar rates would challenge the claim that transferability reveals unique information differences.
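Such an experiment amounts to filling in a source-by-target transfer matrix and comparing per-target acceptance rates. The energy-threshold "models" below are toy stand-ins under stated assumptions; the deliberately deviant fourth model accepts almost nothing, mimicking 'flat-earther' behavior, whereas roughly uniform rates across all targets would challenge the claim:

```python
import numpy as np

# Toy "models": each thresholds mean signal energy at a different level;
# the last threshold is deliberately deviant.
thresholds = [0.40, 0.45, 0.50, 0.80]
models = [lambda x, t=t: int(np.mean(x ** 2) > t) for t in thresholds]

# Stand-in minimal signals for label 1: just enough energy to clear each
# source model's own threshold.
signals = [np.full(100, np.sqrt(t + 0.01)) for t in thresholds]

# transfer[i, j] = 1 if target model j accepts source model i's signal.
transfer = np.array([[m(s) for m in models] for s in signals])
rates = transfer.mean(axis=0)   # per-target acceptance: 1.0, 0.75, 0.5, 0.25
print(rates)
```

With real models the comparison would additionally need a statistical test for whether the observed spread in rates exceeds what signal-extraction noise alone would produce.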
Original abstract
In order to gain fresh insights about the information processing characteristics of different audio classification models, we propose transferability analysis. Given a minimal, sufficient signal for a classification on a model $f$, transferability analysis asks whether other models accept this minimal signal as having the same classification as it did on $f$. We define what it means for a sufficient signal to be transferable and perform a large study over $3$ different classification tasks: music genre, emotion recognition and deepfake detection. We find that transferability rates vary depending on the task, with sufficient signals for music genre being transferable $\approx26\%$ of the time. The other tasks reveal much higher variance in transferability and reveal that some models, in particular on deepfake detection, have different transferability behavior. We call these models `flat-earther' models. We investigate deepfake audio in more depth, and show that transferability analysis also allows to us to discover information theoretic differences between the models which are not captured by the more familiar metrics of accuracy and precision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes transferability analysis as a method to probe information-processing differences among audio classification models. Given a minimal sufficient signal for a classification decision on model f, the approach tests whether other models assign the same label to that signal. Experiments across music genre classification, emotion recognition, and deepfake detection report task-dependent transfer rates (approximately 26% for music genre) and identify a subset of deepfake models, termed 'flat-earther' models, whose transfer behavior deviates from the others. The authors argue that these transfer patterns expose information-theoretic distinctions not visible in accuracy or precision metrics.
Significance. If the empirical findings are reproducible, the work supplies a practical diagnostic for comparing audio models that goes beyond scalar performance numbers. The deepfake results in particular suggest that transferability can surface model-specific sensitivities to minimal cues, which may inform robustness evaluation and model selection in security-sensitive audio tasks.
major comments (3)
- [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.
- [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.
- [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.
minor comments (1)
- [Abstract] Abstract contains the grammatical error 'allows to us to discover'; correct to 'allows us to discover'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for improving clarity and reproducibility. We address each major comment below and will revise the manuscript to incorporate the requested details.
Point-by-point responses
- Referee: [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.
Authors: We agree that the current manuscript lacks sufficient detail on the extraction of minimal sufficient signals. In the revised version, we will add a dedicated subsection describing the concrete optimization procedure (including the loss function and constraints used to minimize the signal while preserving the classification decision on model f), the specific algorithm employed, and verification steps such as ablation checks and sensitivity analysis to confirm minimality and sufficiency. This will enable readers to reproduce and assess the robustness of the reported transfer rates and model distinctions. revision: yes
- Referee: [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.
Authors: We will strengthen this section by adding an explicit comparison between transferability patterns and standard metrics. Specifically, we will include a correlation analysis across models between transfer rates and accuracy/precision values, along with a controlled ablation that varies the signal-minimization parameters while tracking both metric families. This will demonstrate that the observed distinctions for flat-earther models are not artifacts of the minimization process. revision: yes
- Referee: [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.
Authors: We acknowledge that the operational definition requires formalization. In the revision, we will state the definition explicitly, including the precise threshold applied to model output probabilities for label agreement, the agreement metric (e.g., exact label match or probabilistic divergence), and any statistical tests used to determine transferability. This formalization will be placed early in the methods section to support all subsequent quantitative claims. revision: yes
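The comparison against standard metrics promised in the second response could look like the following sketch. All per-model numbers here are hypothetical placeholders, not results from the paper; the point is that a model can top the accuracy ranking while accepting almost no transferred signals, which a simple Pearson correlation makes visible:

```python
import numpy as np

# Hypothetical per-model numbers (placeholders, not results from the paper).
# transfer_rate = fraction of other models' minimal signals a model accepts.
accuracy      = np.array([0.91, 0.93, 0.94, 0.92, 0.90])
transfer_rate = np.array([0.30, 0.28, 0.05, 0.31, 0.27])  # model 3: low acceptance

# Pearson correlation between the two metric families; a weak or negative
# value would indicate that transferability carries information that the
# accuracy ranking does not.
r = np.corrcoef(accuracy, transfer_rate)[0, 1]
print(round(float(r), 3))

# Under these placeholder numbers, the most accurate model is also the
# least accepting of transferred signals.
print(int(np.argmax(accuracy)) == int(np.argmin(transfer_rate)))
```

The controlled ablation the referee asks for would repeat this computation while varying the signal-minimization parameters, checking that the decoupling persists.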
Circularity Check
Empirical study with no circular derivations
Full rationale
The paper defines transferability analysis upfront as an empirical procedure: given a minimal sufficient signal for model f, check whether other models accept the same classification. Results are obtained from a large-scale study across three tasks (music genre, emotion recognition, deepfake detection), reporting observed transfer rates and identifying 'flat-earther' models in deepfake detection. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The information-theoretic differences are presented as direct observations from varying transferability behavior, not derived from prior work by the same authors or renamed known results. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: minimal sufficient signals can be identified for a given model's classification decision.
invented entities (1)
- 'flat-earther' models (no independent evidence)