Pith Number

pith:AD4OU3S5

pith:2023:AD4OU3S57S5Y2GK4FCGKHPM5GD
not attested · not anchored · not stored · refs resolved

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Abhinav Rastogi, Colton Bishop, Ethan Hall, Harrison Lee, Hassan Mansoor, Johan Ferret, Kellie Lu, Samrat Phatale, Sushant Prakash, Thomas Mesnard, Victor Carbune

Reinforcement learning from AI feedback (RLAIF) matches the performance of reinforcement learning from human feedback (RLHF) for aligning large language models.

arxiv:2309.00267 v3 · 2023-09-01 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{AD4OU3S57S5Y2GK4FCGKHPM5GD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
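
The page does not describe the merge algorithm itself. The sketch below only illustrates why a deterministic merge lets independent mirrors agree: every mirror replays the same events in one total order. The Event shape, the (timestamp, event_id) ordering, the last-write-wins rule and the field name author_claim are all hypothetical assumptions, not Pith's actual schema, and signature checks are left out.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical signed-event shape; the real Pith event schema is not shown on this page."""
    timestamp: str  # fixed-width ISO-8601 UTC, so string order equals time order
    event_id: str   # unique id, used to break timestamp ties
    field: str      # top-level record field this event sets (hypothetical)
    value: Any


def merge(canonical: dict, events: list[Event]) -> dict:
    """Replay events in one total order so every mirror derives the same state.

    Signature checking is omitted; a real mirror would drop unverified events first.
    """
    state = dict(canonical)  # shallow copy; the canonical record itself is never mutated
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value  # later events overwrite earlier ones
    return state


canonical = {"title": "RLAIF vs. RLHF", "author_claim": None}
events = [
    Event("2026-05-18T10:00:00Z", "b", "author_claim", "claimed"),
    Event("2026-05-18T09:00:00Z", "a", "author_claim", "pending"),
]
# Two mirrors that received the events in different orders still agree.
assert merge(canonical, events) == merge(canonical, events[::-1])
print(merge(canonical, events)["author_claim"])  # claimed
```

Sorting before folding is what removes any dependence on arrival order; any total order that all mirrors share would work equally well.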

Claims

C1 · strongest claim

Across the tasks of summarization, helpful dialogue generation, and harmless dialogue generation, we show that RLAIF achieves comparable performance to RLHF. ... we introduce direct-RLAIF (d-RLAIF) ... which achieves superior performance to canonical RLAIF.

C2 · weakest assumption

That the preferences generated by an off-the-shelf LLM are high-quality enough to serve as a substitute for human preferences in training the reward model.

C3 · one-line summary

RLAIF matches RLHF on summarization and dialogue tasks, and a direct-RLAIF variant, which uses LLM rewards directly during RL training, outperforms canonical RLAIF.
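
The claims state what the LLM labels replace but not how they become a training signal. A minimal sketch, assuming the labeler LLM exposes log-probabilities for its answer tokens: for canonical RLAIF, a softmax over the log-probabilities of answering "1" or "2" yields a soft preference label, and a reward model is fit to it with a Bradley-Terry cross-entropy loss. For the direct variant, the expected value of a 1 to 10 score is rescaled to [-1, 1] and used as the RL reward. The function names, the token choices and the 1 to 10 scale are illustrative assumptions, not the authors' code.

```python
import math


def soft_preference(logp_1: float, logp_2: float) -> tuple[float, float]:
    """Softmax over the labeler's log-probabilities of answering "1" or "2"."""
    m = max(logp_1, logp_2)  # subtract the max for numerical stability
    e1, e2 = math.exp(logp_1 - m), math.exp(logp_2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def reward_model_loss(r1: float, r2: float, p1: float) -> float:
    """Cross-entropy between the soft label p1 and P(1 preferred) = sigmoid(r1 - r2)."""
    d = r1 - r2
    return -(p1 * log_sigmoid(d) + (1.0 - p1) * log_sigmoid(-d))


def direct_reward(score_logprobs: dict[int, float]) -> float:
    """Map log-probabilities over scores 1..10 to a reward in [-1, 1] (direct-RLAIF style)."""
    m = max(score_logprobs.values())
    weights = {s: math.exp(lp - m) for s, lp in score_logprobs.items()}
    expected = sum(s * w for s, w in weights.items()) / sum(weights.values())
    return (expected - 5.5) / 4.5  # linear map: 1 -> -1, 10 -> 1


# Hypothetical labeler outputs for one pair of responses and one single response.
p1, p2 = soft_preference(-0.2, -1.8)
print(p1, p2, reward_model_loss(r1=1.3, r2=0.4, p1=p1))
print(direct_reward({s: -abs(s - 8) for s in range(1, 11)}))
```

Using the soft label rather than a hard 0/1 choice keeps the labeler's uncertainty in the reward model's training target.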

References

98 extracted · 98 resolved · 15 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

35 papers in Pith

Receipt and verification

First computed: 2026-05-17T23:38:50.098142Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

00f8ea6e5dfcbb8d195c288ca3bd9d30ccec365083e1091ebe19ac2b0a61252f

Aliases

arxiv: 2309.00267 · arxiv_version: 2309.00267v3 · doi: 10.48550/arxiv.2309.00267 · pith_short_12: AD4OU3S57S5Y · pith_short_16: AD4OU3S57S5Y2GK4 · pith_short_8: AD4OU3S5
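
The page does not say how the Pith Number is derived. Checked by hand against this record, its 26 characters coincide with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and the pith_short_* aliases are prefixes of it. The sketch below reproduces that observation with the standard library; treat the derivation as an inference from this one record, not as the documented scheme. The function name pith_number is hypothetical.

```python
import base64

CANONICAL_HASH = "00f8ea6e5dfcbb8d195c288ca3bd9d30ccec365083e1091ebe19ac2b0a61252f"


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Inferred derivation: base32 (RFC 4648) of the SHA-256 digest, truncated."""
    digest = bytes.fromhex(canonical_hash_hex)  # 32 raw bytes
    return base64.b32encode(digest).decode("ascii")[:length]


print(pith_number(CANONICAL_HASH))  # AD4OU3S57S5Y2GK4FCGKHPM5GD
for n in (8, 12, 16):
    # The short aliases appear to be plain prefixes of the full number.
    print(f"pith_short_{n}:", pith_number(CANONICAL_HASH, n))
```

Twenty-six base32 characters carry 130 bits, so under this reading the full number commits to roughly the first half of the digest.
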
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AD4OU3S57S5Y2GK4FCGKHPM5GD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00f8ea6e5dfcbb8d195c288ca3bd9d30ccec365083e1091ebe19ac2b0a61252f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e06fbc5dabdd57615fbf708dac24235e084d3e237bd3abd328f2bc19edfd90ee",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-09-01T05:53:33Z",
    "title_canon_sha256": "30a71bef573f4df6e33a747f2c8824790bde2c0d53c2ef6779f1df973bc3eb36"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.00267",
    "kind": "arxiv",
    "version": 3
  }
}
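
For checking without network access, the canonicalization used by the one-liner above (sorted keys, compact separators, ensure_ascii=False, UTF-8) can be applied to the JSON shown here. This is a sketch that assumes the JSON on this page is the complete canonical record; the page does not confirm that, so a digest that differs from the canonical hash would point to missing fields rather than to a bad record. The function name canonical_hash is hypothetical.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with the same settings as the curl one-liner."""
    blob = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as raw UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "e06fbc5dabdd57615fbf708dac24235e084d3e237bd3abd328f2bc19edfd90ee",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-09-01T05:53:33Z",
        "title_canon_sha256": "30a71bef573f4df6e33a747f2c8824790bde2c0d53c2ef6779f1df973bc3eb36",
    },
    "schema_version": "1.0",
    "source": {"id": "2309.00267", "kind": "arxiv", "version": 3},
}

print(canonical_hash(record))
```

Because sort_keys is applied recursively, any key ordering of the same record yields the same digest.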