pith:2YNZJQXK
Reward-Conditioned Reinforcement Learning
Conditioning RL agents on reward parameters during single-objective training enables zero-shot adaptation to new rewards via replay data alone.
arxiv:2603.05066 v3 · 2026-03-05 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2YNZJQXKERHE44UGTNN3W6KL3R}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
RCRL improves sample efficiency under the nominal reward parameterization, enables efficient adaptation to new parameterizations, and supports zero-shot behavioral adjustment at deployment.
The approach assumes that recomputing counterfactual rewards from replay data collected under the nominal policy produces unbiased training signals for other reward parameterizations.
RCRL conditions RL policies on reward parameters and uses shared replay data to train for multiple objectives under a single nominal reward, improving efficiency and adaptability.
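The claims above describe recomputing rewards for replayed transitions under other reward parameters and feeding those parameters to the policy. The following is a minimal sketch of that relabeling step, not the paper's implementation: the reward family in `reward`, the uniform sampling of parameters, and the names `relabel_batch`, `w_low`, and `w_high` are all assumptions made for illustration, since this record does not specify them.

```python
import numpy as np


def reward(obs, action, next_obs, w):
    """Hypothetical parameterized reward (an assumption, not from the paper):
    w[0] weights forward progress, w[1] weights a quadratic action cost."""
    progress = next_obs[..., 0] - obs[..., 0]
    effort = np.sum(action ** 2, axis=-1)
    return w[..., 0] * progress - w[..., 1] * effort


def relabel_batch(batch, rng, w_low, w_high):
    """Relabel replayed transitions with freshly sampled reward parameters.

    Each transition gets its own parameter vector w, the reward is recomputed
    under w, and w is appended to the observations so that a policy or critic
    can be conditioned on it.
    """
    n = batch["obs"].shape[0]
    # Assumed sampling scheme: independent uniform draws per transition.
    w = rng.uniform(w_low, w_high, size=(n, len(w_low)))
    return {
        "obs": np.concatenate([batch["obs"], w], axis=-1),
        "action": batch["action"],
        "reward": reward(batch["obs"], batch["action"], batch["next_obs"], w),
        "next_obs": np.concatenate([batch["next_obs"], w], axis=-1),
    }


# Toy usage with random data standing in for a replay buffer sample.
rng = np.random.default_rng(0)
batch = {
    "obs": rng.normal(size=(4, 3)),
    "action": rng.normal(size=(4, 2)),
    "next_obs": rng.normal(size=(4, 3)),
}
relabeled = relabel_batch(batch, rng, np.array([0.5, 0.0]), np.array([1.5, 0.1]))
print(relabeled["obs"].shape, relabeled["reward"].shape)
```

In this sketch the transitions themselves are untouched; only the reward and the conditioning input change, which is what lets one replay buffer serve many parameterizations. Whether the resulting training signal is unbiased is exactly the assumption stated in the claims.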
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T01:06:08.893422Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d61b94c2ea244e4e72869b5bbb794bdc6a3c7c3d2353b502e82050d6f4510a50
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2YNZJQXKERHE44UGTNN3W6KL3R \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d61b94c2ea244e4e72869b5bbb794bdc6a3c7c3d2353b502e82050d6f4510a50
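The hashing stage of the pipeline above can also be written as a small standalone function, which is easier to reuse and test offline. This sketch mirrors the same `json.dumps` settings used in the one-liner (sorted keys, compact separators, non-ASCII characters preserved); the function name `canonical_sha256` and the toy record are hypothetical and not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object using the same canonical form as the
    one-liner above: sorted keys, no whitespace, UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Toy record (hypothetical) showing that key order does not affect the hash.
a = canonical_sha256({"b": 1, "a": 2})
b = canonical_sha256({"a": 2, "b": 1})
print(a == b)
```

To check a real record, pass the parsed `canonical_record` object from the API response to the function and compare the result with the canonical hash listed on this page.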
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c5b4a4bb45cbd5cc2bcec4f144608d501ff9ef6fb5eb15de46ef0744c9be4430",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-05T11:29:17Z",
    "title_canon_sha256": "70cb4e591b6b3f80a2eb6f4194501f2e5ff9d1017e8ad8e834565b90df3e761d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.05066",
    "kind": "arxiv",
    "version": 3
  }
}