Pith Number

pith:UJD6C5LK

pith:2026:UJD6C5LKM6FET57BR5S427227D
not attested · not anchored · not stored · refs pending

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data

Alessandro Sestini, Andrew D. Bagdanov, Carlo Romeo, Girolamo Macaluso

SOPE uses an actor-aligned OPE signal on held-out data to automatically stop offline phases in online RL with prior data.

arxiv:2605.05863 v2 · 2026-05-07 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UJD6C5LKM6FET57BR5S427227D}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
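The order-independence property described above can be illustrated with a minimal sketch. The event shape `(event_id, timestamp, field, value)`, the last-writer-wins rule, and the function name `merge_state` are all hypothetical and chosen only for illustration; the actual Pith event schema and merge algorithm are not specified here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event_id, timestamp, field, value).
Event = Tuple[str, str, str, str]


def merge_state(events: Iterable[Event]) -> Dict[str, str]:
    """Fold events into a state that does not depend on arrival order."""
    # Drop duplicates; assumes an event_id always identifies the same content.
    unique = {event[0]: event for event in events}
    state: Dict[str, str] = {}
    # Sort by (timestamp, event_id) so every mirror applies the same order.
    for _id, _ts, field, value in sorted(unique.values(), key=lambda e: (e[1], e[0])):
        state[field] = value  # last writer wins (an assumed rule)
    return state


events = [
    ("e1", "2026-05-21T01:05:20Z", "author_claim", "open"),
    ("e2", "2026-05-22T09:00:00Z", "citations", "open"),
    ("e3", "2026-05-23T12:30:00Z", "author_claim", "claimed"),
]

# A mirror that receives the events reversed and duplicated reaches the same state.
assert merge_state(events) == merge_state(list(reversed(events)) + events)
```

Sorting on a total order before folding is what makes the result reproducible: two mirrors holding the same set of events cannot disagree, whatever order the events arrived in.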

Claims

C1 strongest claim

Evaluated on 25 continuous control tasks from the Minari benchmark suite, SOPE improves baseline performance by up to 45.6% while reducing the required TFLOPs by up to 22x, thus balancing the tradeoff between sample and computational efficiency.

C2 weakest assumption

That the actor-aligned OPE signal evaluated on a held-out validation split under the current policy's action distribution accurately detects the saturation point of out-of-distribution benefits without either stopping too early (wasting prior knowledge) or too late (causing overfitting).

C3 one line summary

SOPE uses an actor-aligned OPE signal on a held-out validation split to dynamically stop offline stabilization phases in online RL, improving baseline performance by up to 45.6% and cutting the required TFLOPs by up to 22x on 25 Minari continuous control tasks.
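The stopping idea in these claims, ending an offline phase once a held-out validation signal stops improving, can be sketched with a generic patience rule. This is not the SOPE criterion itself, which this record does not spell out; the function `stop_step` and its `patience` and `min_delta` parameters are assumptions for illustration, and the signal values are made up.

```python
from typing import Iterable, Optional


def stop_step(signal: Iterable[float], patience: int = 3,
              min_delta: float = 0.0) -> Optional[int]:
    """Return the first step at which the signal has failed to improve
    for `patience` consecutive evaluations, or None if that never happens."""
    best = float("-inf")
    stale = 0
    for step, value in enumerate(signal):
        if value > best + min_delta:
            best = value  # new best validation estimate
            stale = 0
        else:
            stale += 1  # no improvement at this evaluation
            if stale >= patience:
                return step
    return None


# Illustrative values for a held-out OPE estimate over offline updates.
ope_estimates = [0.10, 0.20, 0.30, 0.30, 0.29, 0.28]
print(stop_step(ope_estimates, patience=3))  # 5
```

A small `patience` risks stopping before the prior data has been fully exploited, while a large one spends extra compute and invites overfitting, which is the tradeoff named in the weakest assumption.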

Receipt and verification
First computed: 2026-05-21T01:05:20.299992Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

a247e1756a678a49f7e18f65cd7f5af8e8ff466fe8f75a884968938cd2b826da

Aliases

arxiv: 2605.05863 · arxiv_version: 2605.05863v2 · doi: 10.48550/arxiv.2605.05863 · pith_short_12: UJD6C5LKM6FE · pith_short_16: UJD6C5LKM6FET57B · pith_short_8: UJD6C5LK
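For this record, the Pith Number and its short aliases line up with the canonical hash: the 26-character number matches the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is its first N characters. The sketch below checks that relationship. It is inferred from this one record rather than taken from a published specification, so treat it as an observation, not a definition.

```python
import base64

canonical_hash = "a247e1756a678a49f7e18f65cd7f5af8e8ff466fe8f75a884968938cd2b826da"

# Base32-encode the raw 32-byte digest (standard RFC 4648 alphabet).
encoded = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")

# Assumption inferred from this record: the Pith Number is a 26-character prefix.
pith_number = encoded[:26]

# Short aliases appear to be plain prefixes of the full number.
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)  # UJD6C5LKM6FET57BR5S427227D
print(aliases["pith_short_12"])  # UJD6C5LKM6FE
```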
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UJD6C5LKM6FET57BR5S427227D \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a247e1756a678a49f7e18f65cd7f5af8e8ff466fe8f75a884968938cd2b826da
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c9929fbe9928ffdf20365251fc61cf2891f678711bc7705ee68bedcdf8189391",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-07T08:32:09Z",
    "title_canon_sha256": "385fe671c4fa7b2102bd48a27a22b37a07480575b6974d319ac81587ec48f3c2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.05863",
    "kind": "arxiv",
    "version": 2
  }
}
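The same canonicalization used in the curl recipe (sorted keys, compact separators, UTF-8 without ASCII escaping, then SHA-256) can be written as a small offline helper. This is a sketch that mirrors that one-liner; the function name `canonical_sha256` is an assumption, and the printed digest should be compared with the canonical hash listed in this record rather than taken on trust.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it with sorted keys and no whitespace."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c9929fbe9928ffdf20365251fc61cf2891f678711bc7705ee68bedcdf8189391",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-07T08:32:09Z",
        "title_canon_sha256": "385fe671c4fa7b2102bd48a27a22b37a07480575b6974d319ac81587ec48f3c2",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.05863", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed for this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store or emit the fields.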