Pith Number

pith:4RANQRIG

pith:2023:4RANQRIG7CIP6ZY4O2KYPT54NN
not attested · not anchored · not stored · refs resolved

Eureka: Human-Level Reward Design via Coding Large Language Models

Anima Anandkumar, De-An Huang, Dinesh Jayaraman, Guanzhi Wang, Linxi Fan, Osbert Bastani, William Liang, Yecheng Jason Ma, Yuke Zhu

Large language models can design reward functions for robot tasks that outperform those created by human experts.

arxiv:2310.12931 v2 · 2023-10-19 · cs.RO · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4RANQRIG7CIP6ZY4O2KYPT54NN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
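
The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic" can mean here: events are folded into the record in a total order that does not depend on arrival order. The event fields `at`, `field` and `value`, the content-hash tie-breaker, and the last-writer-wins rule are all assumptions made for illustration, and signature checking is left out.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def event_id(event: Dict[str, Any]) -> str:
    """Content hash of an event, used as a stable tie-breaker (assumption)."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(record: Dict[str, Any], events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold events into the record in an order independent of arrival order."""
    unique = {event_id(e): e for e in events}  # exact duplicates collapse
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = dict(record)
    for _, event in ordered:
        state[event["field"]] = event["value"]  # hypothetical last-writer-wins rule
    return state


# Hypothetical record and events, not taken from a real bundle.
record = {"id": "2310.12931", "citations": "open"}
events = [
    {"at": "2026-05-18T00:00:00Z", "field": "citations", "value": "resolved"},
    {"at": "2026-05-19T00:00:00Z", "field": "replications", "value": "open"},
]

state_a = merge(record, events)
# A mirror that received the events reversed, with one duplicate, ends up in the same state.
state_b = merge(record, list(reversed(events)) + events[:1])
```

Sorting on a key derived only from event content is what lets two mirrors agree without coordinating.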

Claims

C1 strongest claim

Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%.

C2 weakest assumption

That LLM-generated reward code will continue to produce stable, non-degenerate policies when transferred to new tasks or real hardware, rather than exploiting simulator quirks that do not appear in the reported 29 environments.

C3 one line summary

Eureka uses LLMs for evolutionary optimization of reward code, outperforming human experts on 83% of 29 RL tasks with a 52% average normalized improvement, and enables gradient-free RLHF.
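
As a rough illustration of what "evolutionary optimization" means in general, the sketch below runs a generic sample, evaluate, keep-the-best loop. It is not Eureka's implementation: `toy_propose` stands in for an LLM sampling new reward code from the current best candidate, `toy_score` stands in for training a policy and measuring task performance, and candidates are plain floats rather than code. All names and defaults are hypothetical.

```python
import random
from typing import Callable, Tuple


def evolve(
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    seed_candidate: float,
    generations: int = 5,
    samples: int = 4,
    seed: int = 0,
) -> Tuple[float, float]:
    """Generic loop: sample a batch from the current best, keep any improvement."""
    rng = random.Random(seed)
    best, best_score = seed_candidate, score(seed_candidate)
    for _ in range(generations):
        # Every candidate in a generation is proposed from the same parent.
        batch = [propose(best, rng) for _ in range(samples)]
        top = max(batch, key=score)
        top_score = score(top)
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score


def toy_propose(parent: float, rng: random.Random) -> float:
    """Stand-in for an LLM proposing a mutated candidate (assumption)."""
    return parent + rng.uniform(-1.0, 1.0)


def toy_score(candidate: float) -> float:
    """Stand-in for policy training plus evaluation; peaks at 3.0 (assumption)."""
    return -((candidate - 3.0) ** 2)


best, best_score = evolve(toy_propose, toy_score, seed_candidate=0.0)
```

Because a new candidate replaces the parent only when it scores strictly higher, the best score never decreases across generations.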

References

14 extracted · 14 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

37 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:21.855779Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af

Aliases

arxiv: 2310.12931 · arxiv_version: 2310.12931v2 · doi: 10.48550/arxiv.2310.12931 · pith_short_12: 4RANQRIG7CIP · pith_short_16: 4RANQRIG7CIP6ZY4 · pith_short_8: 4RANQRIG
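
The values on this page are consistent with the Pith Number being the first 26 characters of the base32 encoding of the canonical hash, and with the short aliases being prefixes of that number. This is an observation from the values shown here, not a documented rule; the function name and the default `length` in the sketch below are assumptions.

```python
import base64

CANONICAL_HASH = "e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases as plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)  # prints 4RANQRIG7CIP6ZY4O2KYPT54NN
print(aliases["pith_short_12"])  # prints 4RANQRIG7CIP
```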
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4RANQRIG7CIP6ZY4O2KYPT54NN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ad3784548ded4a14b5fafc6dbc128558fb41c2cdec7eae24949d70c5961dbb6c",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-19T17:31:01Z",
    "title_canon_sha256": "cb5b85979677dc841ed11123093ae61df2a02fa38dcba4fe26ffe246c855c92e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.12931",
    "kind": "arxiv",
    "version": 2
  }
}
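
The hashing step of the shell one-liner above can also be written as a small Python function, which is easier to reuse in a script. This sketch applies the same serialization settings as the one-liner (`sort_keys=True`, compact separators, `ensure_ascii=False`) to the record shown on this page; the function names are assumptions, and the printed digest is meant to be compared by hand against the value under "Canonical hash".

```python
import hashlib
import json
from typing import Any


def canonical_bytes(record: Any) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode()


def canonical_sha256(record: Any) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# The canonical record as listed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "ad3784548ded4a14b5fafc6dbc128558fb41c2cdec7eae24949d70c5961dbb6c",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2023-10-19T17:31:01Z",
        "title_canon_sha256": "cb5b85979677dc841ed11123093ae61df2a02fa38dcba4fe26ffe246c855c92e",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.12931", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare with the "Canonical hash" value on this page
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the fetched JSON, which is why the record can be re-serialized before hashing.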