pith:BYO6ZUZA
Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time
OP-Mix simulates candidate data mixtures by interpolating low-rank adapters trained on the current model, enabling efficient mixing across all phases of language model training.
arxiv:2605.15220 v1 · 2026-05-13 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BYO6ZUZAKDHZTZFUY5RFGTNQCS}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of the baselines. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively.
The method rests on the premise that interpolating between low-rank adapters trained directly on the current model accurately simulates how different data mixtures would affect the full model's learning dynamics, without requiring separate proxy models or fixed domain assumptions.
OP-Mix is an on-policy data mixing method that uses low-rank adapter interpolation to find near-optimal data mixtures throughout language model training with reduced compute.
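The adapter-interpolation idea can be illustrated with a minimal pure-Python sketch. It assumes, hypothetically, since the record does not spell out the mechanics, one LoRA adapter per data domain, each contributing a low-rank update B_i A_i to a frozen weight matrix W, and it simulates a candidate mixture w on the probability simplex as W + sum_i w_i * B_i A_i. The helper names (`simulate_mixture`, `simplex_grid`, `best_mixture`) and the `score` callable are illustrative only and are not OP-Mix's actual procedure.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product (a: n x k, b: k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    # Low-rank update Delta W = scale * B @ A (B: d_out x r, A: r x d_in).
    return [[scale * x for x in row] for row in matmul(B, A)]


def simulate_mixture(W: Matrix, adapters: Sequence[Tuple[Matrix, Matrix]],
                     weights: Sequence[float]) -> Matrix:
    # Hypothetical simulation of a data mixture: add the per-domain
    # adapter updates to W, each scaled by its mixture weight.
    if len(adapters) != len(weights):
        raise ValueError("need one weight per adapter")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie on the probability simplex")
    out = [row[:] for row in W]  # copy so the base weights are not mutated
    for (B, A), w in zip(adapters, weights):
        for i, row in enumerate(lora_delta(B, A)):
            for j, x in enumerate(row):
                out[i][j] += w * x
    return out


def simplex_grid(k: int, steps: int) -> List[Tuple[float, ...]]:
    # All k-dimensional weight vectors with entries in {0, 1/steps, ..., 1} summing to 1.
    return [tuple(c / steps for c in combo)
            for combo in product(range(steps + 1), repeat=k) if sum(combo) == steps]


def best_mixture(W: Matrix, adapters: Sequence[Tuple[Matrix, Matrix]], steps: int,
                 score: Callable[[Matrix], float]) -> Tuple[float, ...]:
    # `score` is a hypothetical stand-in for a validation loss (lower is better)
    # evaluated on the simulated weights; grid search picks the best candidate.
    candidates = simplex_grid(len(adapters), steps)
    return min(candidates, key=lambda w: score(simulate_mixture(W, adapters, w)))
```

For example, with W the 2x2 identity and two rank-1 adapters that each touch one diagonal entry, `simulate_mixture(W, adapters, (0.25, 0.75))` returns `[[1.25, 0.0], [0.0, 1.75]]`. Because every candidate reuses the same trained adapters, evaluating a new mixture costs only a weighted sum rather than a new training run; the grid search is a deliberately simple choice, and a real system could use a smarter optimizer over the simplex.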
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:46.924763Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
0e1decd32050cf99e4b4c762534db0148411cc5d3cdbb5e35a9dae5a7b84a2cb
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BYO6ZUZAKDHZTZFUY5RFGTNQCS \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0e1decd32050cf99e4b4c762534db0148411cc5d3cdbb5e35a9dae5a7b84a2cb
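The canonicalization step in the one-liner above can be read more easily as a small standalone function. This sketch mirrors the `json.dumps` options used in the command (sorted keys, compact `(",", ":")` separators, `ensure_ascii=False`, UTF-8 encoding) and assumes those options are the complete canonicalization rule for the pith-number/v1.0 schema; the function names `canonical_bytes`, `canonical_sha256` and `verify` are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(record: Any) -> bytes:
    # Sorted keys, no whitespace between tokens, non-ASCII kept as raw UTF-8.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def canonical_sha256(record: Any) -> str:
    # Hex SHA-256 digest of the canonical serialization.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record: Any, expected_hex: str) -> bool:
    # Compare against an expected digest, tolerating case and surrounding whitespace.
    return canonical_sha256(record) == expected_hex.strip().lower()
```

Because keys are sorted before hashing, two records that differ only in key order produce the same digest. Applied to the parsed `canonical_record` object returned by the endpoint, `verify(record, expected)` should agree with the `# expect:` line as long as the record has not changed.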
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "b02916e3588c09fe7233174d7c1f1ad36ab5296ff41f7fdcf464af241eab45ab",
"cross_cats_sorted": [
"cs.AI",
"cs.LG"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.CL",
"submitted_at": "2026-05-13T02:29:19Z",
"title_canon_sha256": "47ccfa359dc3ecc490055107686b0da1b7f1ea016203b4f3c159ab97bd172d75"
},
"schema_version": "1.0",
"source": {
"id": "2605.15220",
"kind": "arxiv",
"version": 1
}
}