pith:EIQJJ4XW
Modeling Music as a Time-Frequency Image: A 2D Tokenizer for Music Generation
BandTok turns music into a 2D time-frequency token grid from a single shared codebook, reducing sequential dependencies for autoregressive generation.
arxiv:2605.15831 v1 · 2026-05-15 · cs.SD · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EIQJJ4XWI4KAFI3M6JBEF5M44Z}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
BandTok yields a physically interpretable time-frequency token grid with a more independent token structure, making it better suited for autoregressive modeling than residual-codebook tokenizers.
The residual hierarchy in existing high-fidelity codecs imposes strong sequential dependencies that amplify error accumulation during autoregressive generation after sequence flattening; the single shared codebook in BandTok avoids this while preserving reconstruction quality.
BandTok tokenizes Mel-spectrograms into independent time-frequency band tokens drawn from a single codebook, and pairs this tokenization with 2D RoPE in an autoregressive model to improve music generation over residual multi-codebook tokenizers.
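To make the idea of a 2D token grid from one shared codebook concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the equal-width band split, the nearest-neighbour lookup, the codebook size, and the names quantize_to_grid and flatten_with_positions are all assumptions chosen for illustration.

```python
# Illustrative sketch only. The band layout, codebook size, and
# nearest-neighbour lookup are assumptions, not details from the paper.
import random


def quantize_to_grid(mel, n_bands, codebook):
    """Map a mel-spectrogram (list of frames, each a list of mel bins)
    to a 2D grid of token ids with shape [n_frames][n_bands]."""
    n_bins = len(mel[0])
    band_width = n_bins // n_bands  # assumes n_bins is divisible by n_bands
    grid = []
    for frame in mel:
        row = []
        for b in range(n_bands):
            patch = frame[b * band_width:(b + 1) * band_width]
            # Every band looks up the SAME codebook (no residual stages):
            # pick the entry with the smallest squared Euclidean distance.
            token = min(
                range(len(codebook)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(patch, codebook[k])),
            )
            row.append(token)
        grid.append(row)
    return grid


def flatten_with_positions(grid):
    """Flatten the grid frame by frame, keeping (time, band) indices that a
    2D positional scheme such as 2D RoPE could consume."""
    return [(t, b, tok) for t, row in enumerate(grid) for b, tok in enumerate(row)]


random.seed(0)
codebook = [[random.random() for _ in range(4)] for _ in range(8)]  # 8 entries, dim 4
mel = [[random.random() for _ in range(16)] for _ in range(5)]      # 5 frames, 16 bins
grid = quantize_to_grid(mel, n_bands=4, codebook=codebook)
seq = flatten_with_positions(grid)
```

Each entry of seq carries its own time and band index, so a model reading the flattened sequence can still tell where a token sits in the grid. In a residual multi-codebook scheme, by contrast, each later-stage token refines the output of the stage before it, which is the dependency the claims above describe.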
Receipt and verification
| First computed | 2026-05-20T00:01:20.748003Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
222094f2f6471402a36cf24242f59ce6789cb804c2f7c74151660059a3ecc68c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EIQJJ4XWI4KAFI3M6JBEF5M44Z \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 222094f2f6471402a36cf24242f59ce6789cb804c2f7c74151660059a3ecc68c
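The same canonicalization step can be run offline on a record you have already saved. The sketch below mirrors the serialization settings of the one-liner above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The function name canonical_sha256 and the sample record are hypothetical and used only for illustration.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize a JSON-compatible object deterministically and hash it.

    Uses the same settings as the shell one-liner: sorted keys, compact
    separators, non-ASCII characters left unescaped, UTF-8 encoding.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample record: key order in the input does not affect the hash.
sample_a = {"schema_version": "1.0", "source": {"kind": "arxiv", "version": 1}}
sample_b = {"source": {"version": 1, "kind": "arxiv"}, "schema_version": "1.0"}
same_hash = canonical_sha256(sample_a) == canonical_sha256(sample_b)
```

To check a real record, load the canonical record JSON with json.load and compare the returned hex digest with the canonical hash listed on this page.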
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "5a796fd442792dd585c6372960d188aea1fb81a5273d71638182b88600ea5c08",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.SD",
"submitted_at": "2026-05-15T10:35:49Z",
"title_canon_sha256": "ec5f07fdea99f6f7f8c5ec81374e48a37f6d08a0f354945307c1c131a428c9f1"
},
"schema_version": "1.0",
"source": {
"id": "2605.15831",
"kind": "arxiv",
"version": 1
}
}