Pith Number

pith:GMH4UAMF

pith:2026:GMH4UAMFDXWDWTBEAYXTDM7KOR
not attested · not anchored · not stored · refs pending

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Adhiraj Banerjee, Vipul Arora

PairAlign generates compact audio token sequences by training each view's output to be likely under the other's encoder while contrasting unrelated examples.

arxiv:2605.06582 v2 · 2026-05-07 · cs.LG · cs.CL · cs.SD

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GMH4UAMFDXWDWTBEAYXTDM7KOR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

On TIMIT retrieval, PairAlign preserves edit-distance search while reducing archive token count by 55%. In a continuous-sweep probe, it shows lower local overlap than a dense geometric tokenizer, but stronger length control and bounded edit trajectories under 100 ms shifts.
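To make "edit-distance search" concrete, here is a minimal sketch, assuming each archived utterance is stored as a list of integer token IDs. The function names, the toy archive, and the unit-cost Levenshtein distance are illustrative assumptions; the record does not state which edit-distance variant or index structure the paper uses.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Unit-cost Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def search(query: Sequence[int],
           archive: Dict[str, List[int]],
           k: int = 1) -> List[Tuple[int, str]]:
    """Return the k archive entries closest to the query, as (distance, key)."""
    scored = sorted((edit_distance(query, toks), key)
                    for key, toks in archive.items())
    return scored[:k]


# Hypothetical toy archive: keys and token IDs are made up for illustration.
archive = {"utt_a": [7, 7, 2, 9], "utt_b": [7, 2, 9], "utt_c": [1, 1, 1]}
print(search([7, 2, 9], archive, k=2))
```

Because the cost of each comparison grows with the product of the two sequence lengths, shorter archive sequences make this kind of search cheaper, which is why a token-count reduction matters for retrieval.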

C2 · weakest assumption

That optimizing cross-view sequence likelihood, contrasted against unrelated negative examples, produces token sequences whose edit-distance properties generalize to downstream tasks without direct supervision on those properties.
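One way to picture "cross-view likelihood with unrelated negatives as contrast" is a softmax-style contrastive loss over a matrix of cross-view scores. The sketch below is an assumption for illustration only, not the paper's stated objective: `scores[i][j]` is taken to be the log-likelihood of example i's tokens from one view under example j's encoding from the other view, so the diagonal holds matched pairs and the off-diagonal entries act as negatives.

```python
import math
from typing import List


def log_softmax_row(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of one row of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def contrastive_cross_view_loss(scores: List[List[float]]) -> float:
    """Mean negative log-probability of the matched (diagonal) pair.

    Hypothetical formulation: each row is normalized over all candidates,
    so raising a matched score or lowering an unrelated one reduces the loss.
    """
    n = len(scores)
    return -sum(log_softmax_row(scores[i])[i] for i in range(n)) / n


# Toy score matrices (made-up numbers): matched pairs score higher in `aligned`.
aligned = [[5.0, 0.0], [0.0, 5.0]]
uniform = [[0.0, 0.0], [0.0, 0.0]]
print(contrastive_cross_view_loss(aligned), contrastive_cross_view_loss(uniform))
```

Note that nothing in such a loss refers to edit distance directly, which is the gap the assumption above points at.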

C3 · one line summary

PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.

Receipt and verification
First computed: 2026-06-09T02:07:28.588832Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

330fca01851dec3b4c24062f31b3ea745f2547c7a9d3c564477ee7730ed16628

Aliases

arxiv: 2605.06582 · arxiv_version: 2605.06582v2 · doi: 10.48550/arxiv.2605.06582 · pith_short_12: GMH4UAMFDXWD · pith_short_16: GMH4UAMFDXWDWTBE · pith_short_8: GMH4UAMF
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GMH4UAMFDXWDWTBEAYXTDM7KOR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 330fca01851dec3b4c24062f31b3ea745f2547c7a9d3c564477ee7730ed16628
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7b681725430bf02f2e528276b1172415e2a4c1fc77a25c71711f1db6a0af57e6",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-07T17:11:22Z",
    "title_canon_sha256": "59807b35e32d4a73ebd341180dead6b1f03cafad47f37df62a6354572ca93743"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.06582",
    "kind": "arxiv",
    "version": 2
  }
}
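The shell pipeline above can also be reproduced offline. The following is a minimal sketch, assuming the JSON shown here is exactly the `canonical_record` the API returns; it applies the same serialization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) and compares the digest with the canonical hash listed on this page. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and compact separators."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7b681725430bf02f2e528276b1172415e2a4c1fc77a25c71711f1db6a0af57e6",
        "cross_cats_sorted": ["cs.CL", "cs.SD"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-07T17:11:22Z",
        "title_canon_sha256": "59807b35e32d4a73ebd341180dead6b1f03cafad47f37df62a6354572ca93743",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.06582", "kind": "arxiv", "version": 2},
}

# Canonical hash as listed on this page.
EXPECTED = "330fca01851dec3b4c24062f31b3ea745f2547c7a9d3c564477ee7730ed16628"

digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == EXPECTED)
```

Sorting keys before hashing makes the digest independent of the order in which a mirror happens to store the fields, so any copy of the record should serialize to the same bytes.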