Pith Number

pith:23IZB7K2

pith:2023:23IZB7K2H2STHS5OGU3SX45Y43

not attested · not anchored · not stored · refs pending

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Anand Siththaranjan, Anca Dragan, Andi Peng, Charbel-Raphaël Segerie, Claudia Shi, David Krueger, David Lindner, Dmitrii Krasheninnikov, Dorsa Sadigh, Dylan Hadfield-Menell, Erdem Bıyık, Eric J. Michaud, Jacob Pfau, Javier Rando, Jérémy Scheurer, Lauro Langosco, Max Nadeau, Mehul Damani, Micah Carroll, Pedro Freire, Peter Hase, Phillip Christoffersen, Rachel Freedman, Samuel Marks, Stephen Casper, Stewart Slocum, Thomas Krendl Gilbert, Tomasz Korbak, Tony Wang, Usman Anwar, Xander Davies, Xin Chen

RLHF, the dominant method for aligning large language models with human goals, carries fundamental limitations that incremental fixes cannot fully resolve.

arxiv:2307.15217 v2 · 2023-07-27 · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{23IZB7K2H2STHS5OGU3SX45Y43}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

RLHF has emerged as the central method for finetuning state-of-the-art large language models, but it has fundamental limitations, which underscores the importance of a multi-faceted approach to developing safer AI systems.

C2 weakest assumption

That the identified open problems represent fundamental limitations of RLHF rather than challenges that can be resolved through incremental improvements or better implementation.

C3 one line summary

RLHF has significant open problems and fundamental limitations that require a multi-faceted approach for safer AI development.

Formal links

2 machine-checked theorem links

Cited by

45 papers in Pith

Active teacher selection for reward learning

AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

Echo: Learning from Experience Data via User-Driven Refinement

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

Receipt and verification

First computed	2026-05-17T23:39:21.418541Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

d6d190fd5a3ea533cbae35372bf3b8e6ddf630437eb272f01b14c5a437f010e0

Aliases

arxiv: 2307.15217 · arxiv_version: 2307.15217v2 · doi: 10.48550/arxiv.2307.15217 · pith_short_12: 23IZB7K2H2ST · pith_short_16: 23IZB7K2H2STHS5O · pith_short_8: 23IZB7K2
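
The page does not say how the identifier is derived, so the following is an observation from the values shown here rather than a documented rule: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and each `pith_short_N` alias is its first N characters. A minimal Python sketch under that assumption (the function names are illustrative and not part of any Pith API):

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "d6d190fd5a3ea533cbae35372bf3b8e6ddf630437eb272f01b14c5a437f010e0"
PITH_NUMBER = "23IZB7K2H2STHS5OGU3SX45Y43"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32-encode the raw digest and keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Short aliases on this page are plain prefixes of the full identifier."""
    return pith_number[:n]


derived = pith_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": short_alias(derived, n) for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(aliases)
```

If the assumption holds for other records, a short alias can be checked against a canonical hash without contacting the resolver. Other records may follow a different rule, so treat a mismatch as a prompt to consult the schema rather than as proof of tampering.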

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/23IZB7K2H2STHS5OGU3SX45Y43 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d6d190fd5a3ea533cbae35372bf3b8e6ddf630437eb272f01b14c5a437f010e0

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "b2d5a1ffd2f126b3687464bdd54878590cc00c30071a32d643bc8ba98db2121f",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2023-07-27T22:29:25Z",
    "title_canon_sha256": "f64b8e348d873883aaed9189c75a52a859f0648b3f54a9627b55a100e94a8e16"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2307.15217",
    "kind": "arxiv",
    "version": 2
  }
}
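
The one-line check above can also be run offline against the record as printed here. This is a sketch that mirrors the same canonicalization (sorted keys, compact separators, UTF-8 bytes, SHA-256). `canonical_sha256` is an illustrative name, and whether the digest equals the published canonical hash depends on the printed record being exactly what the resolver serves.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "b2d5a1ffd2f126b3687464bdd54878590cc00c30071a32d643bc8ba98db2121f",
    "cross_cats_sorted": ["cs.CL", "cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2023-07-27T22:29:25Z",
    "title_canon_sha256": "f64b8e348d873883aaed9189c75a52a859f0648b3f54a9627b55a100e94a8e16"
  },
  "schema_version": "1.0",
  "source": {"id": "2307.15217", "kind": "arxiv", "version": 2}
}
"""

# Published canonical hash, copied from this page.
EXPECTED = "d6d190fd5a3ea533cbae35372bf3b8e6ddf630437eb272f01b14c5a437f010e0"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON text that was parsed. Only the values and the order of list elements matter.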