Pith Number

pith:4RANQRIG

pith:2023:4RANQRIG7CIP6ZY4O2KYPT54NN

Status: not attested · not anchored · not stored · refs resolved

Eureka: Human-Level Reward Design via Coding Large Language Models

Anima Anandkumar, De-An Huang, Dinesh Jayaraman, Guanzhi Wang, Linxi Fan, Osbert Bastani, William Liang, Yecheng Jason Ma, Yuke Zhu

Large language models can design reward functions for robot tasks that outperform those created by human experts.

arxiv:2310.12931 v2 · 2023-10-19 · cs.RO · cs.AI · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{4RANQRIG7CIP6ZY4O2KYPT54NN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
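The page does not describe the merge algorithm itself, so the following is only a minimal sketch of what "deterministic" means here: any mirror that holds the same set of events ends up with the same state, regardless of the order in which the events arrived. The event fields (`at`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical, and signature checking is left out.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event, used as a stable tie-breaker (hypothetical scheme)."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def merge(canonical_record: dict, events: list) -> dict:
    """Fold events into a state in an order that depends only on their content."""
    # Deduplicate by content hash, then sort by (timestamp, hash) so that
    # arrival order never influences the result.
    unique = {event_id(e): e for e in events}
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {"record": canonical_record, "fields": {}}
    for _, event in ordered:
        # Assumed rule: the latest event for a field wins.
        state["fields"][event["field"]] = event["value"]
    return state


# Made-up events, for illustration only.
record = {"source": {"id": "2310.12931", "kind": "arxiv", "version": 2}}
events = [
    {"at": "2030-01-01T00:00:00Z", "field": "example_flag", "value": "a"},
    {"at": "2030-01-02T00:00:00Z", "field": "example_flag", "value": "b"},
]

# Two mirrors that received the events in different orders agree on the state.
print(merge(record, events) == merge(record, list(reversed(events))))
```

Sorting on a key derived purely from event content is one common way to get this property; the real bundle format may use a different ordering rule.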

Claims

C1 · strongest claim

Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%.

C2 · weakest assumption

That LLM-generated reward code will continue to produce stable, non-degenerate policies when transferred to new tasks or real hardware, rather than exploiting simulator quirks that do not appear in the reported 29 environments.

C3 · one-line summary

Eureka uses an LLM to evolve reward code, outperforming human experts on 83% of tasks across 29 RL environments with a 52% average normalized improvement, and it enables a gradient-free approach to RLHF.

References

14 extracted · 14 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

37 papers in Pith

Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations

Automatic Generation of High-Performance RL Environments

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

Scalable Option Learning in High-Throughput Environments

Receipt and verification

First computed	2026-05-17T23:39:21.855779Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af

Aliases

arxiv: 2310.12931 · arxiv_version: 2310.12931v2 · doi: 10.48550/arxiv.2310.12931 · pith_short_12: 4RANQRIG7CIP · pith_short_16: 4RANQRIG7CIP6ZY4 · pith_short_8: 4RANQRIG
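The short aliases listed above are plain prefixes of the full 26-character Pith Number. For this record, the Pith Number also coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The sketch below checks both observations; treating base32 truncation as the general derivation rule is an assumption, since the page does not state it, and the helper names are illustrative.

```python
import base64

CANONICAL_HASH = "e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af"
PITH_NUMBER = "4RANQRIG7CIP6ZY4O2KYPT54NN"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep the first `length` chars."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Short alias = leading characters of the full identifier."""
    return pith_number[:length]


derived = base32_prefix(CANONICAL_HASH)
print("derived from hash:", derived)
print("matches Pith Number:", derived == PITH_NUMBER)

for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
```

Because every alias is a prefix, a longer alias always disambiguates at least as well as a shorter one.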

Agent API


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/4RANQRIG7CIP6ZY4O2KYPT54NN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ad3784548ded4a14b5fafc6dbc128558fb41c2cdec7eae24949d70c5961dbb6c",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-19T17:31:01Z",
    "title_canon_sha256": "cb5b85979677dc841ed11123093ae61df2a02fa38dcba4fe26ffe246c855c92e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.12931",
    "kind": "arxiv",
    "version": 2
  }
}
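The same canonicalization used in the shell command above (sorted keys, compact separators, no ASCII escaping, then SHA-256) can be applied offline to the record as displayed. This is a minimal sketch: it assumes the JSON shown on the page is exactly what the resolver returns under `canonical_record`, which the page does not guarantee, so the comparison result is printed rather than asserted.

```python
import hashlib
import json

# Canonical record as displayed on the page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ad3784548ded4a14b5fafc6dbc128558fb41c2cdec7eae24949d70c5961dbb6c",
    "cross_cats_sorted": ["cs.AI", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2023-10-19T17:31:01Z",
    "title_canon_sha256": "cb5b85979677dc841ed11123093ae61df2a02fa38dcba4fe26ffe246c855c92e"
  },
  "schema_version": "1.0",
  "source": {"id": "2310.12931", "kind": "arxiv", "version": 2}
}
"""

PUBLISHED_HASH = "e440d84506f890ff671c769587cfbc6b63cefa2aa6492989b34337cb0c3515af"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the shell one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("computed: ", digest)
print("published:", PUBLISHED_HASH)
print("match:", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators removes the two sources of variation that would otherwise make the digest depend on how the JSON was formatted.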