pith:4RANQRIG
Eureka: Human-Level Reward Design via Coding Large Language Models
Large language models can design reward functions for robot tasks that outperform those created by human experts.
arxiv:2310.12931 v2 · 2023-10-19 · cs.RO · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4RANQRIG7CIP6ZY4O2KYPT54NN}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%.
The result assumes that LLM-generated reward code will continue to produce stable, non-degenerate policies when transferred to new tasks or real hardware, rather than exploiting simulator quirks that do not appear in the reported 29 environments.
Eureka uses LLMs for evolutionary optimization of reward code, outperforming human experts on 83% of 29 RL tasks with a 52% average normalized improvement, and enables gradient-free RLHF.
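The "evolutionary optimization" named in the claim above can be illustrated with a toy loop. This is a minimal sketch under loud assumptions, not Eureka's implementation: the candidate is a pair of numeric weights standing in for a reward program, `propose` is a hypothetical stand-in for the LLM call, and `fitness` is a hypothetical stand-in for RL training followed by task-score evaluation. No gradients of the fitness are used; the loop only samples, scores, and keeps the best candidate.

```python
import random
from typing import Tuple

# Hypothetical stand-in for a reward program: two shaping weights.
Candidate = Tuple[float, float]


def propose(parent: Candidate, rng: random.Random) -> Candidate:
    """Stand-in for the LLM call: return a perturbed copy of the parent."""
    return (parent[0] + rng.gauss(0, 0.5), parent[1] + rng.gauss(0, 0.5))


def fitness(candidate: Candidate) -> float:
    """Stand-in for RL training plus task scoring (higher is better)."""
    return -((candidate[0] - 1.0) ** 2 + (candidate[1] + 2.0) ** 2)


def evolve(seed: Candidate, iterations: int = 5, samples: int = 8,
           rng_seed: int = 0) -> Tuple[Candidate, float]:
    """Greedy evolutionary search: sample a batch, keep the best so far."""
    rng = random.Random(rng_seed)
    best, best_score = seed, fitness(seed)
    for _ in range(iterations):
        # Sample several children from the current best candidate.
        batch = [propose(best, rng) for _ in range(samples)]
        top = max(batch, key=fitness)
        # Replace the incumbent only if the best child improves on it.
        if fitness(top) > best_score:
            best, best_score = top, fitness(top)
    return best, best_score


if __name__ == "__main__":
    start = (0.0, 0.0)
    best, score = evolve(start)
    print("start fitness:", fitness(start))
    print("best candidate:", best, "fitness:", score)
```

Because the incumbent is replaced only on improvement, the returned score can never fall below the seed's score; how much it improves depends on the random draws.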
Receipt and verification
| First computed | 2026-05-17T23:39:21.855779Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4RANQRIG7CIP6ZY4O2KYPT54NN \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af
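The one-liner above serializes the record with sorted keys and compact separators, then takes the SHA-256 of the UTF-8 bytes. The same step can be written as a small function, which may be easier to read and reuse. This is a sketch that mirrors the `json.dumps` arguments in the command; the function name `canonical_hash` and the sample dictionaries are illustrative and not part of any Pith API.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of a record serialized in canonical form.

    Canonical form here means: keys sorted, no whitespace between
    tokens, non-ASCII characters kept as-is, UTF-8 encoded.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Two illustrative records with the same content in different key order.
    a = {"source": {"id": "2310.12931", "kind": "arxiv", "version": 2},
         "schema_version": "1.0"}
    b = {"schema_version": "1.0",
         "source": {"version": 2, "kind": "arxiv", "id": "2310.12931"}}
    # Sorting keys makes the digest independent of insertion order.
    print(canonical_hash(a) == canonical_hash(b))
```

To check a real record, pass the parsed `canonical_record` object returned by the service to `canonical_hash` and compare the result with the published canonical hash.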
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ad3784548ded4a14b5fafc6dbc128558fb41c2cdec7eae24949d70c5961dbb6c",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-19T17:31:01Z",
    "title_canon_sha256": "cb5b85979677dc841ed11123093ae61df2a02fa38dcba4fe26ffe246c855c92e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.12931",
    "kind": "arxiv",
    "version": 2
  }
}