Pith Number

pith:GS2EE6XU

pith:2026:GS2EE6XU7XI5XMUVNMW4LZO5YU

not attested · not anchored · not stored · refs resolved

ARIA: A Diagnostic Framework for Music Training Data Attribution

Ashkan Panahi, Changheon Han, Kıvanç Tatar

ARIA decomposes music training data attribution into specific musical aspects and validates attribution methods with reliability diagnostics whose rankings match ground-truth rankings.

arxiv:2605.16181 v1 · 2026-05-15 · cs.SD


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{GS2EE6XU7XI5XMUVNMW4LZO5YU}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

On a symbolic-music model where attribution ground truth is available through counterfactual retraining, the reliability diagnostics rank four attribution methods identically to that ground truth.
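One way to make "rank identically" concrete is to compare the ordering of methods under a diagnostic with their ordering under ground truth. The sketch below is illustrative only: the method names and all scores are hypothetical placeholders rather than results from the paper, and Kendall's tau is an assumed choice of agreement measure that this page does not name.

```python
from itertools import combinations


def ranking(scores):
    """Return method names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def kendall_tau(a, b):
    """Kendall's tau between two score dicts over the same keys (ties ignored)."""
    concordant = discordant = 0
    for x, y in combinations(a, 2):
        sign = (a[x] - a[y]) * (b[x] - b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical placeholder scores for four unnamed attribution methods.
diagnostic = {"method_a": 0.91, "method_b": 0.74, "method_c": 0.52, "method_d": 0.30}
ground_truth = {"method_a": 0.66, "method_b": 0.41, "method_c": 0.35, "method_d": 0.12}

same_order = ranking(diagnostic) == ranking(ground_truth)
tau = kendall_tau(diagnostic, ground_truth)
print(same_order, tau)  # True 1.0
```

With only four methods there are six pairs, so identical rankings correspond to tau = 1.0 and a single swapped adjacent pair already lowers it to about 0.67.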

C2 · weakest assumption

The chosen musical aspects (five for symbolic music, three for audio) and the reliability diagnostics (within-group similarity, SVD, column statistics) correctly capture the dimensions of influence relevant to copyright analysis and model behavior.
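This page names the diagnostics but does not define them, so the following is a generic sketch of two of them computed on an attribution score matrix, not the paper's procedure. It assumes rows are generated samples, columns are training items, and `groups` is a hypothetical label per row; the SVD-based diagnostic is left out to keep the example standard-library only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def column_stats(matrix):
    """Return (mean, population variance) for each column of the matrix."""
    stats = []
    for col in zip(*matrix):
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        stats.append((mean, var))
    return stats


def within_group_similarity(matrix, groups):
    """Mean pairwise cosine similarity between rows that share a group label."""
    sims = [
        cosine(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
        if groups[i] == groups[j]
    ]
    return sum(sims) / len(sims) if sims else float("nan")


# Hypothetical toy scores: rows in the same group point at the same training items.
scores = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 3.0, 3.0],
]
groups = ["g1", "g1", "g2", "g2"]

stats = column_stats(scores)
similarity = within_group_similarity(scores, groups)
print(stats[0], similarity)
```

In this toy case the rows within each group are scalar multiples of each other, so the within-group similarity is close to 1; a column with near-zero variance would indicate a training item that every sample scores the same way.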

C3 · one-line summary

ARIA decomposes music training data attribution into musical aspects and supplies reliability diagnostics from similarity metrics and score matrix analysis, with validation on symbolic models using counterfactual retraining.

References

56 extracted · 56 resolved · 1 Pith anchor

[1] MusicLM: Generating Music From Text 2023 · arXiv:2301.11325

[2] Towards tracing knowledge in language models back to the training data 2022

[3] Exploring musical roots: Applying audio embeddings to empower influence attribution for a generative music model 2024

[4] Bittner, Brian McFee, Justin Salamon, Peter Li, and Juan Pablo Bello 2017

[5] AudioLM: A language modeling approach to audio generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2523–2533, 2023

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:01:56.512122Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

34b4427af4fdd1dbb2956b2dc5e5ddc53cb40d7aeb9c8dcec88d5dbe25152c0f

Aliases

arxiv: 2605.16181 · arxiv_version: 2605.16181v1 · doi: 10.48550/arxiv.2605.16181 · pith_short_12: GS2EE6XU7XI5 · pith_short_16: GS2EE6XU7XI5XMUV · pith_short_8: GS2EE6XU
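The identifiers on this page appear to be related in a simple way: the short aliases are prefixes of the 26-character identifier, and that identifier matches the unpadded Base32 (RFC 4648) encoding of the first 16 bytes of the canonical hash. The sketch below checks both observations for this record only; the derivation rule is inferred from the values shown here, not taken from a published specification.

```python
import base64

# Values copied from this page.
canonical_hash = "34b4427af4fdd1dbb2956b2dc5e5ddc53cb40d7aeb9c8dcec88d5dbe25152c0f"
full_id = "GS2EE6XU7XI5XMUVNMW4LZO5YU"

# Assumption inferred from this record: Base32 of the first 16 hash bytes, padding removed.
first_16_bytes = bytes.fromhex(canonical_hash)[:16]
derived = base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")

# Short aliases as plain prefixes of the full identifier.
aliases = {n: full_id[:n] for n in (8, 12, 16)}

print(derived == full_id)  # True
print(aliases)  # {8: 'GS2EE6XU', 12: 'GS2EE6XU7XI5', 16: 'GS2EE6XU7XI5XMUV'}
```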

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/GS2EE6XU7XI5XMUVNMW4LZO5YU \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 34b4427af4fdd1dbb2956b2dc5e5ddc53cb40d7aeb9c8dcec88d5dbe25152c0f

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7700e21a1bb61236b9592ea8607358f1f2bad5b415889dbddb2a2e9f765ee7c1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SD",
    "submitted_at": "2026-05-15T17:00:14Z",
    "title_canon_sha256": "a10312b468e591865a82c6fdb6fd1ead032aa69ea56807972b03792e35c78d8d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16181",
    "kind": "arxiv",
    "version": 1
  }
}
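The verification one-liner above canonicalizes the record by serializing it with sorted keys, compact separators, and unescaped non-ASCII characters, then hashing the UTF-8 bytes with SHA-256. The sketch below applies the same steps offline to the record as printed on this page. The function names are illustrative, and it assumes the JSON shown here is exactly the `canonical_record` object the resolver returns.

```python
import hashlib
import json

# Record copied from this page (assumed identical to the resolver's canonical_record).
record_text = """
{
  "metadata": {
    "abstract_canon_sha256": "7700e21a1bb61236b9592ea8607358f1f2bad5b415889dbddb2a2e9f765ee7c1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SD",
    "submitted_at": "2026-05-15T17:00:14Z",
    "title_canon_sha256": "a10312b468e591865a82c6fdb6fd1ead032aa69ea56807972b03792e35c78d8d"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.16181", "kind": "arxiv", "version": 1}
}
"""


def canonical_bytes(record):
    """Serialize with sorted keys and no optional whitespace, as the one-liner does."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(record_text)
digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input, which is what lets a mirror recompute it from any faithful copy. Compare the printed digest with the canonical hash listed on this page.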