ARIA: A Diagnostic Framework for Music Training Data Attribution

Ashkan Panahi; Changheon Han; Kıvanç Tatar

arxiv: 2605.16181 · v1 · pith:GS2EE6XUnew · submitted 2026-05-15 · 💻 cs.SD

Pith reviewed 2026-05-19 18:20 UTC · model grok-4.3

classification 💻 cs.SD

keywords: training data attribution · music generation · copyright analysis · reliability diagnostics · symbolic music · audio music · influence decomposition · score matrix analysis

The pith

ARIA decomposes music training data attribution into specific musical aspects and validates methods using reliability diagnostics that match ground truth rankings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing attribution methods for music generation models give only a single number for how much a training song influenced an output. ARIA instead breaks that influence down along distinct musical aspects such as melody or harmony for symbolic music and timbre or rhythm for audio. It then runs three reliability checks on the resulting score matrix: how similar the top attributed tracks are to each other, the structure revealed by singular value decomposition, and basic column statistics. When tested on a symbolic model whose true influences are known from retraining without certain songs, these checks rank four different attribution methods in exactly the same order as the ground-truth retraining does. The result gives per-aspect evidence that aligns with the idea-expression distinction used in copyright analysis.

Core claim

The paper claims that pairing aspect-decomposed attribution scores with reliability diagnostics computed from the segment-level score matrix produces a diagnostic framework that ranks attribution methods identically to ground-truth rankings obtained by counterfactual retraining on a symbolic-music model, while also exposing substantial differences in behavior across methods on an audio generation model and characterizing embedding baselines by the musical aspect each encoder emphasizes.

What carries the argument

ARIA framework that decomposes attribution scores along five musical aspects for symbolic music or three for audio and applies reliability diagnostics including within-group similarity of top-K tracks, singular value decomposition of the score matrix, and column statistics.
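
A minimal sketch of how the SVD diagnostic could be computed, assuming the score matrix is a NumPy array with one row per query and one column per training segment. The function names `svd_energy_ratios` and `rank1_residual` are hypothetical. Defining r1 and r2:5 as shares of squared singular-value energy is an assumption made for illustration, since this summary does not give the paper's exact normalization.

```python
import numpy as np


def svd_energy_ratios(S, tail=(1, 5)):
    """Return (r1, r_tail) for a score matrix S.

    r1 is the share of squared singular-value energy carried by the leading
    component; r_tail is the share carried by components tail[0]..tail[1]-1
    (0-indexed), i.e. the 2nd through 5th by default. Both definitions are
    assumptions for this sketch.
    """
    s = np.linalg.svd(np.asarray(S, dtype=float), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0, 0.0
    return float(energy[0] / total), float(energy[tail[0]:tail[1]].sum() / total)


def rank1_residual(S):
    """Subtract the leading rank-1 component from S."""
    S = np.asarray(S, dtype=float)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return S - s[0] * np.outer(U[:, 0], Vt[0])


# Toy matrix in which every query row is the same score profile.
collapsed = np.outer(np.ones(4), np.array([3.0, 1.0, 0.5, 0.2, 0.1, 0.05]))
r1, r25 = svd_energy_ratios(collapsed)  # r1 is 1.0 up to floating-point error
```

Under these assumptions, an r1 close to 1 says that a single direction explains almost the whole matrix, which is the pattern one would expect when every query retrieves nearly the same tracks.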

If this is right

Attribution reports can list influence separately for each musical aspect instead of a single scalar.
Reliability diagnostics can serve as an objective way to compare new attribution methods against existing ones.
Score matrices that return nearly identical tracks for every query can be flagged as failing to reflect query-specific influence.
Embedding-similarity baselines can be characterized by which musical aspect their encoder tends to surface.
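
One way to picture the "nearly identical tracks for every query" check is to measure how much the top-K sets overlap across queries. The sketch below uses mean pairwise Jaccard overlap, which is a simpler stand-in chosen here and not the paper's own statistic. The helper names and the toy score rows are hypothetical.

```python
from itertools import combinations


def top_k(scores, k):
    """Indices of the k highest-scoring training tracks for one query."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def mean_topk_jaccard(score_rows, k):
    """Average Jaccard overlap of top-k sets over all pairs of queries."""
    tops = [top_k(row, k) for row in score_rows]
    pairs = list(combinations(tops, 2))
    if not pairs:
        raise ValueError("need at least two queries")
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Placeholder rows: every query prefers tracks 0 and 1.
collapsed_rows = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.6, 0.2, 0.1], [0.5, 0.4, 0.0, 0.3]]
# Placeholder rows: each query prefers a different pair of tracks.
specific_rows = [[0.9, 0.8, 0, 0, 0, 0], [0, 0, 0.9, 0.8, 0, 0], [0, 0, 0, 0, 0.9, 0.8]]

print(mean_topk_jaccard(collapsed_rows, k=2))  # 1.0
print(mean_topk_jaccard(specific_rows, k=2))   # 0.0
```

An overlap near 1 would suggest the retrieved tracks do not depend on the query. Where to set a flagging threshold is left open here.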

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Developers could use the per-aspect breakdowns to audit training data for specific stylistic borrowings before release.
The same diagnostic structure might be adapted to other generative domains once domain-specific aspects are defined.
Courts or rights holders could request aspect-level attribution reports when assessing whether a generated work copies protectable expression.

Load-bearing premise

The chosen musical aspects and the three reliability diagnostics together capture the dimensions of influence that matter for both model behavior and copyright analysis.

What would settle it

On the symbolic-music model, if the reliability diagnostics ranked the four attribution methods in an order different from the order produced by counterfactual retraining without each song, the central claim would be falsified.
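
The falsification test above comes down to comparing two orderings of the same four methods. A minimal sketch, assuming each source assigns one scalar quality score per method: the method names and numbers below are placeholders, not values from the paper, and `kendall_tau` is a plain implementation without tie handling.

```python
from itertools import combinations


def ranking(scores):
    """Method names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def kendall_tau(a, b):
    """Kendall tau between two score dicts over the same methods (ties count as neither)."""
    names = list(a)
    concordant = discordant = 0
    for x, y in combinations(names, 2):
        sign = (a[x] - a[y]) * (b[x] - b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder scores for four hypothetical methods.
diagnostic = {"method_a": 0.9, "method_b": 0.7, "method_c": 0.4, "method_d": 0.1}
retraining = {"method_a": 0.8, "method_b": 0.5, "method_c": 0.3, "method_d": 0.2}

identical_order = ranking(diagnostic) == ranking(retraining)  # True for these placeholders
tau = kendall_tau(diagnostic, retraining)                     # 1.0 for these placeholders
```

With only four methods there are 24 possible orderings, so an exact match is informative but not overwhelming evidence on its own.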

Figures

Figures reproduced from arXiv: 2605.16181 by Ashkan Panahi, Changheon Han, Kıvanç Tatar.

**Figure 1.** Homogeneity z̄_c across K for GRAD-COS and LOGRA at coarse and fine, contrasting the largest and smallest r1 regimes among non-semantic attribution settings. Dashed curves use the original S_seg; solid curves use the rank-1 residual. COS coarse is the clearest case (r1=1.000, r2:5=0.000), producing z̄_c=+29.56 on timbral and +8.56 on harmonic, both the largest values across all 15 settings. Rank-1 removal fl…

**Figure 2.** Within-group homogeneity z̄_c across K on MusicTransformer + MAESTRO, one panel per jSymbolic channel with the four attribution methods overlaid. Dashed curves use the original S_seg; solid curves use the rank-1 residual.

**Figure 3.** Homogeneity z̄_c across K for the eight attribution stage-method settings not shown in …

**Figure 4.** Homogeneity z̄_c across K for the three embedding-based retrieval baselines. Curve conventions as in …

Original abstract

Training data attribution (TDA) for music generation must answer two questions that copyright analysis requires, namely which training songs influence a generated output and along which musical aspects the influence operates. Existing methods reduce influence to a single scalar, without revealing which musical aspects are dominant in that influence. We propose ARIA, a framework that decomposes attribution along musical aspects (five for symbolic music, three for audio) and pairs the decomposition with reliability diagnostics computed from the segment-level score matrix. It measures within-group similarity among the top-K attributed tracks against random reference groups drawn from the training pool, and diagnoses the score matrix through its singular value decomposition and column statistics. On a symbolic-music model where attribution ground truth is available through counterfactual retraining, the reliability diagnostics rank four attribution methods identically to that ground truth. On an audio music generation model, ARIA reveals attribution behaviors that vary substantially across TDA methods, flags score matrices whose retrieved tracks are nearly identical across queries rather than reflecting per-query attribution, and characterizes embedding-similarity retrieval baselines by the musical aspect each encoder surfaces. Together, ARIA produces per-aspect attribution evidence aligned with the musical aspects considered under the idea-expression distinction in copyright analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

ARIA adds aspect decomposition and matrix diagnostics to music TDA but its main validation rests on potentially noisy counterfactual retraining.

The main thing here is that ARIA decomposes training data attribution into musical aspects and layers on reliability checks from the score matrix. On the symbolic case the diagnostics match the ranking produced by counterfactual retraining, and on audio it surfaces clear differences across methods plus cases where attributions collapse to near-identical tracks regardless of query. That combination is the actual step beyond scalar-only TDA. It does a useful job of aligning the output with copyright-relevant distinctions by showing influence along specific aspects rather than a single number.

The diagnostics themselves are simple and inspectable: within-group similarity against random baselines, SVD structure, and column statistics. Those give a concrete way to flag degenerate matrices or non-specific retrieval. The audio experiments also characterize embedding baselines by which musical dimension each encoder picks up. That part is practical for anyone auditing music generators.

The soft spot is the ground-truth construction. Removing one track and retraining can easily be dominated by random seed effects, data ordering, and optimizer noise, so the measured output change does not cleanly isolate the removed track's influence. The abstract gives no variance numbers or stability runs across seeds, which leaves the matching ranking claim weaker than it first appears. Aspect selection also lacks detail on whether the five symbolic and three audio choices were fixed in advance or adjusted after seeing data.

This is aimed at MIR and ML people who need attribution tools for model auditing or copyright questions. A reader working on generative music who wants more than influence scores will get concrete methods and comparisons to think about. It has enough of a framework and initial evidence to deserve a serious referee, even with the robustness gaps.

I would send it for review but ask for stability checks on the retraining and clearer pre-specification of aspects and diagnostics.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces ARIA, a framework for training data attribution (TDA) in music generation models. It decomposes attribution scores along musical aspects (five for symbolic music, three for audio) and augments them with reliability diagnostics computed from the segment-level score matrix, specifically within-group similarity of top-K attributed tracks versus random references, singular value decomposition, and column statistics. On a symbolic-music model, these diagnostics produce the same ranking of four TDA methods as ground truth obtained via counterfactual retraining; on an audio model, ARIA is used to characterize differences across TDA methods and to flag cases where retrieved tracks are nearly identical across queries.

Significance. If the central validation holds, ARIA supplies a needed tool for aspect-specific attribution analysis in generative music, directly relevant to copyright questions under the idea-expression distinction. The explicit use of counterfactual retraining to obtain ground truth and the introduction of matrix-based diagnostics constitute concrete strengths that move beyond scalar TDA scores.

major comments (1)

Symbolic-music validation experiment (as described in the abstract): the claim that the reliability diagnostics rank the four attribution methods identically to ground truth rests on counterfactual retraining, yet the manuscript reports no variance across random seeds, no stability checks on convergence, and no quantification of how much output differences arise from optimizer noise versus track removal. Because neural training is stochastic, the measured ground-truth ranking itself may be confounded, weakening the assertion that the diagnostics correctly recover a reliable ordering.

minor comments (2)

The selection process for the five symbolic and three audio musical aspects, and the precise construction of the segment-level score matrix, are not described with sufficient detail to determine whether they were fixed before seeing results or how they map to dimensions of influence.
Clarify the exact definition of 'within-group similarity' (e.g., distance metric, choice of K, and sampling procedure for random reference groups) so that the diagnostic can be reproduced.
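
To make the reproducibility request concrete, here is one possible reading of the within-group similarity diagnostic. Everything specific in it is an assumption: cosine similarity as the metric, uniform sampling without replacement for the random reference groups, and a z-score as the standardization. The function names, `n_ref`, and `seed` are hypothetical.

```python
import math
import random
import statistics
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(features, group):
    """Average cosine similarity over all unordered pairs in `group` (needs >= 2 tracks)."""
    pairs = list(combinations(group, 2))
    return sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)


def homogeneity_z(features, top_k, n_ref=200, seed=0):
    """Z-score of the top-K group's homogeneity against random same-size groups.

    `features` holds one aspect-specific feature vector per training track;
    `top_k` lists the indices of the attributed tracks for one query.
    """
    rng = random.Random(seed)
    pool = range(len(features))
    observed = mean_pairwise_similarity(features, top_k)
    reference = [
        mean_pairwise_similarity(features, rng.sample(pool, len(top_k)))
        for _ in range(n_ref)
    ]
    spread = statistics.stdev(reference)
    return (observed - statistics.mean(reference)) / spread if spread > 0 else 0.0
```

Under this reading, a large positive z means the attributed tracks resemble each other on that aspect far more than random draws from the pool do. Each of the three assumptions above is a choice the manuscript would need to state for the diagnostic to be reproduced.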

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the manuscript. We address the single major comment below and agree that additional analysis of training stochasticity will strengthen the validation section.

Point-by-point responses

Referee: Symbolic-music validation experiment (as described in the abstract): the claim that the reliability diagnostics rank the four attribution methods identically to ground truth rests on counterfactual retraining, yet the manuscript reports no variance across random seeds, no stability checks on convergence, and no quantification of how much output differences arise from optimizer noise versus track removal. Because neural training is stochastic, the measured ground-truth ranking itself may be confounded, weakening the assertion that the diagnostics correctly recover a reliable ordering.

Authors: We agree that neural training stochasticity is a legitimate concern for the counterfactual-retraining ground truth and that the original manuscript does not report variance across seeds or quantify optimizer noise versus track-removal effects. In the experiments we performed, a single fixed seed was used for reproducibility, and the ranking of the four TDA methods remained identical to the diagnostics across the reported runs. To strengthen the claim, we will add a new subsection that repeats the counterfactual retraining for three independent random seeds, reports the resulting variance in the ground-truth ranking, and compares the magnitude of output changes attributable to seed variation versus track removal. This revision will make the validation more robust without altering the core findings.

Revision: yes
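
The proposed comparison of seed variation against track removal could be summarized with a simple effect-to-noise ratio. The sketch below assumes one scalar output metric per training run (for example a loss on the generated query) and treats the pooled across-seed standard deviation as the noise level. The function name and all numbers are placeholders, not results from the paper.

```python
import math
import statistics


def removal_effect_vs_seed_noise(full_runs, removed_runs):
    """Compare the shift caused by removing a track with across-seed spread.

    full_runs / removed_runs: one metric value per seed (at least two each)
    for models trained with and without the track.
    Returns (effect, noise, ratio).
    """
    effect = statistics.mean(removed_runs) - statistics.mean(full_runs)
    pooled_var = (statistics.variance(full_runs) + statistics.variance(removed_runs)) / 2
    noise = math.sqrt(pooled_var)
    ratio = abs(effect) / noise if noise > 0 else float("inf")
    return effect, noise, ratio


# Placeholder metric values for three seeds each.
full = [1.00, 1.02, 0.98]
removed = [1.30, 1.32, 1.28]
effect, noise, ratio = removal_effect_vs_seed_noise(full, removed)
```

A ratio well above 1 would indicate that the removal effect stands out from seed noise. With only three seeds the variance estimate is itself rough, so the ratio is better read as a sanity check than as a formal test.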

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes ARIA to decompose attribution scores along musical aspects and apply independent reliability diagnostics (within-group similarity to random groups, SVD of the score matrix, and column statistics). The key empirical claim is that these diagnostics produce the same ranking of four TDA methods as an external ground truth obtained by counterfactual retraining on a symbolic model. No equations, fitted parameters, or self-citations are described that would make the diagnostics or rankings equivalent to the inputs by construction. The validation relies on an independent retraining procedure rather than reducing to a definitional or fitted equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that musical influence can be decomposed into a small number of discrete aspects and that the proposed diagnostics are valid proxies for attribution quality; no free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: Musical influence operates along a small set of discrete, identifiable aspects that align with copyright-relevant distinctions.
The framework defines five aspects for symbolic music and three for audio without further justification in the abstract.

pith-pipeline@v0.9.0 · 5746 in / 1307 out tokens · 44993 ms · 2026-05-19T18:20:24.981741+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: reliability diagnostics … singular value decomposition … mean absolute inter-query correlation κ … mean concentration ratio p … within-group musical homogeneity … zc(q)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: five jSymbolic channels (melody, harmony, rhythm, dynamic, texture) … three audio channels (rhythm, harmony, timbre)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] Andrea Agostinelli, Timo I Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, et al. MusicLM: Generating music from text. arXiv preprint arXiv:2301.11325, 2023

[2] Ekin Akyurek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, and Kelvin Guu. Towards tracing knowledge in language models back to the training data. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2429–2446, Abu Dhabi, United Arab Emirates, D...

[3] Julia Barnett, Hugo Flores Garcia, and Bryan Pardo. Exploring musical roots: Applying audio embeddings to empower influence attribution for a generative music model. In Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR), 2024

[4] Rachel M. Bittner, Brian McFee, Justin Salamon, Peter Li, and Juan Pablo Bello. Deep salience representations for F0 estimation in polyphonic music. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pages 63–70, 2017

[5] Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, and Neil Zeghidour. AudioLM: A language modeling approach to audio generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2523–2533, 2023

[6] Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, and Chiyuan Zhang. Quantifying memorization across neural language models. In International Conference on Learning Representations (ICLR), 2023

[7] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650, 2021

[8] Nicolas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270, 2023

[9] Guillaume Charpiat, Nicolas Girard, Loris Felardos, and Yuliya Tarabalka. Input similarity from the neural network perspective. Advances in Neural Information Processing Systems, 32, 2019

[10] Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Baker Grosse, and Eric P. Xing. What is your data worth to GPT? LLM-scale data valuation with influence functions. In The Thirty-ninth Annual Conference on Neural Information Processing ...

[11] Woosung Choi, Junghyun Koo, Kin Wai Cheuk, Joan Serrà, Marco A Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, and Yuki Mitsufuji. Large-scale training data attribution for music generative models via unlearning. arXiv preprint arXiv:2506.18312, 2025

[12] Steven Davis and Paul Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, 1980

[13] Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. FMA: A dataset for music analysis. In Proceedings of the 18th International Society for Music Information Retrieval Conference, pages 316–323, 2017

[14] Junwei Deng, Xirui Jiang, Shiyuan Zhang, Shichang Zhang, Himabindu Lakkaraju, Ruijiang Gao, Chris Donahue, and Jiaqi W. Ma. Computational copyright: Towards a royalty model for music generative AI. arXiv preprint arXiv:2312.06646, 2023

[15] Junwei Deng, Ting-Wei Li, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, and Jiaqi Ma. dattri: A library for efficient data attribution. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 136763–136781....

[16] Tim W. Dornis and Sebastian Stober. Generative AI training and copyright law. Transactions of the International Society for Music Information Retrieval, 2025. arXiv:2502.15858

[17] Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, and Huaming Wang. CLAP: Learning audio concepts from natural language supervision. In ICASSP 2023 – IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1–5, 2023

[18] Philippe Esling, Naotake Masuda, and Axel Chemla-Romeu-Santos. Flowsynth: simplifying complex audio generation through explorable latent spaces with normalizing flows. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pages 5273–5275, 2021

[19] Gene H Golub and Charles F Van Loan. Matrix computations, 3rd edition. The Johns Hopkins University, Baltimore, 1996

[20] Christopher Harte, Mark Sandler, and Martin Gasser. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pages 21–26, 2006

[21] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. Enabling factorized piano music modeling and generation with the MAESTRO dataset. In International Conference on Learning Representations, 2019

[22] Dorien Herremans, Ching-Hua Chuan, and Elaine Chew. A functional taxonomy of music generation systems. ACM Computing Surveys (CSUR), 50(5):1–30, 2017

[23] Mojtaba Heydari, Frank Cwitkowitz, and Zhiyao Duan. BeatNet: CRNN and particle filtering for online joint beat downbeat and meter tracking. 2021

[24] Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, and Jiaqi W. Ma. GraSS: Scalable data attribution with gradient sparsification and sparse projection. arXiv preprint arXiv:2505.18976, 2025

[25] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M Dai, Matthew D Hoffman, Monica Dinculescu, and Douglas Eck. Music transformer: Generating music with long-term structure. In International Conference on Learning Representations, 2019

[26] Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, and Aleksander Madry. Datamodels: Predicting predictions from training data. In ICML, 2022

[27] Jinju Kim, Taehan Kim, Abdul Waheed, Jong Hwan Ko, and Rita Singh. No encore: Unlearning as opt-out in music generation. In NeurIPS 2025 Workshop on AI for Music, 2025

[28] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of ICML ’17, pages 1885–1894, 2017

[29] Jongpil Lee, Nicholas J Bryan, Justin Salamon, Zeyu Jin, and Juhan Nam. Disentangled multidimensional metric learning for music similarity. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6–10. IEEE, 2020

[30] Jongpil Lee, Nicholas J Bryan, Justin Salamon, Zeyu Jin, and Juhan Nam. Metric learning vs classification for disentangled music representation learning. In The 21st International Society for Music Information Retrieval Conference (ISMIR). International Society for Music Information Retrieval, 2020

[31] Joseph Lee Rodgers and W Alan Nicewander. Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1):59–66, 1988

[32] Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, et al. MERT: Acoustic music understanding model with large-scale self-supervised training. arXiv preprint arXiv:2306.00107, 2023

[33] Margit Livingston and Joseph Urbinato. Copyright infringement of music: Determining whether what sounds alike is alike. Vanderbilt Journal of Entertainment and Technology Law, 15(2):227–294, 2013

[34] Yin-Jyun Luo, Kat Agres, and Dorien Herremans. Learning disentangled representations of timbre and pitch for musical instrument sounds using Gaussian mixture variational autoencoders. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), 2019

[35] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979

[36] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, pages 18–25, 2015

[37] Cory McKay, Julie Cumming, and Ichiro Fujinaga. jSymbolic 2.2: Extracting features from symbolic music for use in musicological and MIR research. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 348–354, 2018

[38] Fabio Morreale, Wiebke Hutiri, Joan Serrà, Alice Xiang, and Yuki Mitsufuji. Attribution-by-design: Ensuring inference-time provenance in generative music systems. arXiv preprint arXiv:2510.08062, 2025

[39] Daniel Müllensiefen and Marc Pendzich. Court decisions on music plagiarism and the predictive value of similarity algorithms. Musicae Scientiae, 13(1_suppl):257–295, 2009

[40] Meinard Müller. Fundamentals of Music Processing. Springer, 2015

[41] Peter Nicolas. Harmonizing music theory and music law. Iowa Law Review, 108:1247–1313, 2023

[42] Terrence O’Brien. A folk musician became a target for AI fakes and a copyright troll. The Verge, April 2026

[43] Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. TRAK: Attributing model behavior at scale. In Proceedings of the 40th International Conference on Machine Learning, pages 27074–27113, 2023

[44] Yonghyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, and Yuki Mitsufuji. Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution. In International Conference on Learning Representations (ICLR), 2026

[45] Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Processing Systems, volume 33, pages 19920–19930, 2020

[46] Justin Salamon, Emilia Gómez, Daniel P. W. Ellis, and Gaël Richard. Melody extraction from polyphonic music signals: Approaches, applications, and challenges. IEEE Signal Processing Magazine, 31(2):118–134, 2014

[47] Christian Schörkhuber and Anssi Klapuri. Constant-Q transform toolbox for music processing. In 7th Sound and Music Computing Conference, Barcelona, Spain, pages 3–64. SMC, 2010

[48] Joan Serrà, R. Oğuz Araz, Dmitry Bogdanov, and Yuki Mitsufuji. Supervised contrastive learning from weakly-labeled audio segments for musical version matching. In International Conference on Machine Learning (ICML), 2025

[49] Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffusion art or digital forgery? Investigating data replication in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6048–6058, 2023

[50] Kıvanç Tatar, Daniel Bisig, and Philippe Pasquier. Latent timbre synthesis: Audio-based variational auto-encoders for music composition and sound design applications. Neural Computing and Applications, 33(1):67–84, 2021

[51] UMG Recordings v. Suno. Complaint, UMG Recordings, Inc. v. Suno, Inc., no. 1:24-cv-11611 (D. Mass. 2024), 2024

[52] UMG Recordings v. Udio. Complaint, UMG Recordings, Inc. v. Uncharted Labs, Inc., no. 1:24-cv-04777 (S.D.N.Y. 2024), 2024

[53] Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, and Richard Zhang. Evaluating data attribution for text-to-image models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7192–7203, 2023

[54] Sheng-Yu Wang, Aaron Hertzmann, Alexei A Efros, Jun-Yan Zhu, and Richard Zhang. Data attribution for text-to-image models by unlearning synthesized images. Advances in Neural Information Processing Systems, 37:4235–4266, 2024

[55] Yiming Wu. Self-supervised disentanglement of harmonic and rhythmic features in music audio signals. arXiv preprint arXiv:2309.02796, 2023

[56] Yu-Te Wu, Yin-Jyun Luo, Tsung-Ping Chen, I-Chieh Wei, Jui-Yang Hsu, Yi-Chin Chuang, and Li Su. Omnizart: A general toolbox for automatic music transcription. Journal of Open Source Software, 6(68):3391, 2021

A The Definitions and Formulations of Methods in ARIA

A.1 Attribution Method Formulations

Each attribution method assigns a real-valued score to e...