pith. machine review for the scientific record.
Pith Number

pith:7LPQM4CB

pith:2024:7LPQM4CBKI6IJLFXS4SPBULVWL
not attested · not anchored · not stored · refs resolved

RLHF Workflow: From Reward Modeling to Online RLHF

Bo Pang, Caiming Xiong, Doyen Sahoo, HanZe Dong, Han Zhao, Haoxiang Wang, Nan Jiang, Tong Zhang, Wei Xiong, Yingbo Zhou

Online iterative RLHF using proxy preference models from open-source datasets reaches state-of-the-art results on LLM chatbot benchmarks.

arxiv:2405.07863 v3 · 2024-05-13 · cs.LG · cs.AI · cs.CL · stat.ML

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We have shown that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets.

C2 · weakest assumption

The proxy preference model built from open-source datasets approximates real human feedback closely enough that online RLHF updates remain beneficial rather than harmful.

C3 · one-line summary

The paper supplies a complete open-source recipe for online iterative RLHF that uses proxy preference models and reaches competitive performance on AlpacaEval-2, Arena-Hard, and MT-Bench.

References

23 extracted · 23 resolved · 0 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Receipt and verification

First computed: 2026-05-17T23:38:13.576059Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

fadf067041523c84acb79724f0d175b2cf4b4855d6bbde6f3cae47a8b2a89787

Aliases

arxiv: 2405.07863 · arxiv_version: 2405.07863v3 · doi: 10.48550/arxiv.2405.07863 · pith_short_12: 7LPQM4CBKI6I · pith_short_16: 7LPQM4CBKI6IJLFX · pith_short_8: 7LPQM4CB
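
The identifiers on this page appear to be related. Comparing the values shown here, the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each pith_short_N alias is the first N characters of the Pith Number. The Python sketch below checks that relationship. It is inferred from the values on this page only, not from a published Pith specification, and the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record page.
CANONICAL_HASH = "fadf067041523c84acb79724f0d175b2cf4b4855d6bbde6f3cae47a8b2a89787"
PITH_NUMBER = "7LPQM4CBKI6IJLFXS4SPBULVWL"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode (RFC 4648) a hex digest and keep the first `length` characters.

    Hypothetical helper: the default length of 26 is inferred from this page,
    not taken from a Pith specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Derive the identifier from the hash and compare it with the one on the page.
derived = base32_prefix(CANONICAL_HASH)

# The short aliases are plain prefixes of the full identifier.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

For the values on this page the comparison holds, so a short alias can be checked against a full Pith Number with a simple prefix test. Whether the same construction applies to other records is an assumption.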

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7LPQM4CBKI6IJLFXS4SPBULVWL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fadf067041523c84acb79724f0d175b2cf4b4855d6bbde6f3cae47a8b2a89787

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "592586c7b486cb0c7461dffbf62ec428773c3a092eac9be768e4c02bd653d800",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-05-13T15:50:39Z",
    "title_canon_sha256": "9f4678f3878ae83510a56d06bd2ee031ccdaa6a9660221f34c78267ca1f6e80d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2405.07863",
    "kind": "arxiv",
    "version": 3
  }
}
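
The inline Python in the curl pipeline above can also be written as a small standalone function, which is easier to reuse when checking a downloaded record offline. This is a minimal sketch: the function name `canonical_sha256` is hypothetical, and the record literal assumes that the JSON displayed on this page is exactly the `.canonical_record` object the API returns. The serialization settings (sorted keys, compact separators, no ASCII escaping, UTF-8) are copied from the one-liner.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the compact, key-sorted UTF-8 JSON form of `record`."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from this page (assumed identical to the API response).
record = {
    "metadata": {
        "abstract_canon_sha256": "592586c7b486cb0c7461dffbf62ec428773c3a092eac9be768e4c02bd653d800",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "stat.ML"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-05-13T15:50:39Z",
        "title_canon_sha256": "9f4678f3878ae83510a56d06bd2ee031ccdaa6a9660221f34c78267ca1f6e80d",
    },
    "schema_version": "1.0",
    "source": {"id": "2405.07863", "kind": "arxiv", "version": 3},
}

# Canonical hash as published on this page.
EXPECTED = "fadf067041523c84acb79724f0d175b2cf4b4855d6bbde6f3cae47a8b2a89787"

digest = canonical_sha256(record)
print("computed:", digest)
print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields. A mismatch would indicate either a modified record or a difference between the displayed JSON and the API payload.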