pith:LRMVZIQ4
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
ROAD frames data mixing in offline-to-online reinforcement learning as a bi-level optimization problem and solves it with a multi-armed bandit that sets replay ratios automatically.
arxiv:2605.14497 v1 · 2026-05-14 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LRMVZIQ4PYFXAYYZLNGLEZQUGT}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our empirical results demonstrate that this approach consistently outperforms existing data replay methods across various datasets, eliminating the need for manual, context-specific adjustments while achieving superior stability and asymptotic performance.
The surrogate objective used inside the multi-armed bandit sufficiently approximates the true bi-level gradient so that the outer-level data-mixing decisions actually improve the final policy performance.
ROAD formulates data mixing as a bi-level optimization problem, solved via a multi-armed bandit, to adaptively balance offline priors and online updates in RL.
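To make the bandit-over-replay-ratios idea in the claims above concrete, here is a minimal sketch. It is not the paper's algorithm: the record does not specify the bandit variant, the arm set, or the surrogate objective, so UCB1, the five candidate offline fractions, and the `toy_reward` function are all assumptions chosen for illustration. In a real training loop the reward would be some outer-level signal, such as the change in evaluation return after an update on the mixed batch.

```python
import math
import random


class RatioBandit:
    """UCB1 bandit over candidate offline-data fractions (an assumed design)."""

    def __init__(self, ratios):
        self.ratios = list(ratios)
        self.counts = [0] * len(self.ratios)   # pulls per arm
        self.means = [0.0] * len(self.ratios)  # running mean reward per arm

    def select(self):
        # Pull every arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def mix_batch(offline, online, offline_frac, batch_size, rng):
    """Draw a batch with roughly `offline_frac` of its samples from offline data."""
    n_off = round(offline_frac * batch_size)
    return ([rng.choice(offline) for _ in range(n_off)]
            + [rng.choice(online) for _ in range(batch_size - n_off)])


def toy_reward(offline_frac, rng):
    # Hypothetical stand-in for the outer-level objective: it peaks at an
    # offline fraction of 0.25 and carries a little Gaussian noise.
    return 1.0 - abs(offline_frac - 0.25) + rng.gauss(0.0, 0.05)


def run_toy(rounds=1000, seed=0):
    rng = random.Random(seed)
    bandit = RatioBandit([0.0, 0.25, 0.5, 0.75, 1.0])
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, toy_reward(bandit.ratios[arm], rng))
    return bandit


bandit = run_toy()
most_pulled = max(range(len(bandit.counts)), key=bandit.counts.__getitem__)
print("most-pulled offline fraction:", bandit.ratios[most_pulled])
```

The selected fraction would then drive sampling, for example `mix_batch(offline_buffer, online_buffer, bandit.ratios[arm], 256, rng)`, where both buffers are hypothetical non-empty lists of transitions.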
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:06.361561Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
5c595ca21c7e0b7063195b4cb2661434fe63434295697c0a11e39585a72f9109
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LRMVZIQ4PYFXAYYZLNGLEZQUGT \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5c595ca21c7e0b7063195b4cb2661434fe63434295697c0a11e39585a72f9109
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "503a5d5ebe496c4c6f24c513eecce7c4434bffaebbd30fd8d393cbf674ec4041",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T07:35:58Z",
    "title_canon_sha256": "11b8a06a8ccb4408cfc2bc36fdd84d56c8f412eb4767a63ea1f8f4168572c3aa"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14497",
    "kind": "arxiv",
    "version": 1
  }
}
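The hashing step of the verification one-liner can be unpacked into a short standalone script. The sketch below reuses the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and applies them to the record as printed on this page. The function name `canonical_sha256` is illustrative, and the script reports match or mismatch rather than assuming the outcome, since the record served by the API is the authoritative input.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the verification command."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "503a5d5ebe496c4c6f24c513eecce7c4434bffaebbd30fd8d393cbf674ec4041",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T07:35:58Z",
        "title_canon_sha256": "11b8a06a8ccb4408cfc2bc36fdd84d56c8f412eb4767a63ea1f8f4168572c3aa",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14497", "kind": "arxiv", "version": 1},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED = "5c595ca21c7e0b7063195b4cb2661434fe63434295697c0a11e39585a72f9109"

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and fixing the separators makes the digest independent of how the JSON happens to be formatted or ordered on the wire, which is why the command re-serializes the record before hashing instead of hashing the raw response.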