pith:UJD6C5LK
SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
SOPE uses an actor-aligned OPE signal on held-out data to automatically stop offline phases in online RL with prior data.
arxiv:2605.05863 v2 · 2026-05-07 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UJD6C5LKM6FET57BR5S427227D}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Evaluated on 25 continuous control tasks from the Minari benchmark suite, SOPE improves baseline performance by up to 45.6% while reducing the required TFLOPs by up to 22x, thus balancing the tradeoff between sample and computational efficiency.
The method relies on the actor-aligned OPE signal, evaluated on a held-out validation split under the current policy's action distribution, accurately detecting the saturation point of out-of-distribution benefits, stopping neither too early (wasting prior knowledge) nor too late (causing overfitting).
SOPE uses an actor-aligned OPE signal on a held-out validation split to dynamically stop offline stabilization phases in online RL, improving performance up to 45.6% and cutting TFLOPs up to 22x on 25 Minari tasks.
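The stopping idea in these claims can be sketched roughly as follows. This is a hypothetical illustration, not the paper's algorithm. The record states only that an actor-aligned OPE signal on a held-out validation split is used to stop offline phases. The function names `actor_aligned_value` and `should_stop_offline`, the choice of mean Q(s, π(s)) as the signal, and the patience-based plateau test are all assumptions made for the example.

```python
from statistics import fmean
from typing import Callable, Sequence


def actor_aligned_value(
    q_fn: Callable[[float, float], float],
    policy_fn: Callable[[float], float],
    val_states: Sequence[float],
) -> float:
    """Assumed signal: mean critic value on held-out states, with actions
    drawn from the current policy instead of the logged dataset actions."""
    return fmean([q_fn(s, policy_fn(s)) for s in val_states])


def should_stop_offline(
    history: Sequence[float],
    patience: int = 3,
    min_delta: float = 1e-3,
) -> bool:
    """Assumed rule: stop the offline phase once the signal has not improved
    by at least min_delta over the last `patience` evaluations."""
    if len(history) <= patience:
        return False  # not enough evaluations to judge a plateau
    best_before = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best < best_before + min_delta


# Toy usage with stand-in functions (hypothetical critic and policy).
signal = actor_aligned_value(lambda s, a: s + a, lambda s: 2 * s, [1.0, 2.0])
plateaued = should_stop_offline([1.0, 2.0, 3.0, 3.0, 3.0, 3.0])
```

A plateau test like this trades off the two failure modes named above: a smaller `patience` stops earlier and risks discarding prior knowledge, while a larger one keeps training longer and risks overfitting to the prior data.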
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-21T01:05:20.299992Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
a247e1756a678a49f7e18f65cd7f5af8e8ff466fe8f75a884968938cd2b826da
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UJD6C5LKM6FET57BR5S427227D \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a247e1756a678a49f7e18f65cd7f5af8e8ff466fe8f75a884968938cd2b826da
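The same check can be run offline on a saved copy of the canonical record. The sketch below mirrors the canonicalization in the command above (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes, SHA-256). The function name `canonical_hash` is hypothetical, and the sketch assumes the one-liner above reflects exactly how the builder serializes the record.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Hash a record the way the verification one-liner does:
    canonical JSON (sorted keys, no whitespace) encoded as UTF-8, then SHA-256."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Key order in the input does not affect the digest.
digest_a = canonical_hash({"b": 1, "a": 2})
digest_b = canonical_hash({"a": 2, "b": 1})
```

To verify this record, load the canonical record JSON with `json.loads` and compare the result of `canonical_hash` against the canonical hash listed on this page.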
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c9929fbe9928ffdf20365251fc61cf2891f678711bc7705ee68bedcdf8189391",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-07T08:32:09Z",
    "title_canon_sha256": "385fe671c4fa7b2102bd48a27a22b37a07480575b6974d319ac81587ec48f3c2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.05863",
    "kind": "arxiv",
    "version": 2
  }
}