pith. the verified trust layer for science.
Pith Number

pith:YYXR2WAL

pith:2024:YYXR2WAL6MK5EWBYF2QDMLUFPS
not attested · not anchored · not stored · refs resolved

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Bin Chen, Hao Chen, Haoran Wang, Haotian Xu, Jason Klein Liu, Jian Hu, Songlin Jiang, Weikai Fang, Wei Shen, Weixun Wang, Xianyu, Xibin Wu, Yiming Liu, Yu Cao, Zilin Zhu

OpenRLHF is a streamlined open-source RLHF framework that reports training speedups of 1.22x to 1.68x over state-of-the-art frameworks while requiring significantly fewer lines of code.

arxiv:2405.11143 v6 · 2024-05-20 · cs.AI · cs.CL · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Experimental results show that OpenRLHF achieves superior training efficiency, with speedups ranging from 1.22x to 1.68x across different model sizes, compared to state-of-the-art frameworks. Additionally, it requires significantly fewer lines of code for implementation.

C2 · weakest assumption

The reported speedups and code reductions are measured under fair, comparable conditions against the true state-of-the-art baselines, and the ease-of-use metric (lines of code) accurately reflects real-world implementation effort for typical users.

C3 · one-line summary

OpenRLHF is a new open-source RLHF framework reporting 1.22x to 1.68x speedups and fewer lines of code than prior systems.

References

30 extracted · 30 resolved · 10 Pith anchors

[1] Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30 2017
[2] Learning to summarize with human feedback 2020
[3] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948
[4] Exploring data scaling trends and effects in reinforcement learning from human feedback 2025
[5] GPT-4 Technical Report 2023 · arXiv:2303.08774

Cited by

35 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:53.680048Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb

Aliases

arxiv: 2405.11143 · arxiv_version: 2405.11143v6 · doi: 10.48550/arxiv.2405.11143 · pith_short_12: YYXR2WAL6MK5 · pith_short_16: YYXR2WAL6MK5EWBY · pith_short_8: YYXR2WAL
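The identifier and its short aliases appear to be derived from the canonical hash: Base32-encoding the SHA-256 digest (standard RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number shown on this page, and each short alias is a prefix of it. This relationship is inferred from the values above, not taken from a published specification, and the function names below are hypothetical. A minimal sketch under those assumptions:

```python
import base64

# Canonical hash as displayed on this page.
CANONICAL_HASH = "c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest and keep the leading characters.

    Assumption: the 26-character length is read off this page's
    identifier, not from a documented rule.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_id: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)  # YYXR2WAL6MK5EWBYF2QDMLUFPS
print(short_aliases(pith_id))
```

If the scheme holds for other records, a client could check that an identifier and a canonical hash belong together without any network request.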
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YYXR2WAL6MK5EWBYF2QDMLUFPS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "97f3ea108185fc25741dae86d92f03fa0ec171db8c855fab78d2fd4659774e7a",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-05-20T01:04:40Z",
    "title_canon_sha256": "cac5c1897e3139b2e2616615e0260b990c1809c83d38220b3a69c988b2521d15"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2405.11143",
    "kind": "arxiv",
    "version": 6
  }
}
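The verification one-liner can also be run offline against the record as displayed. The sketch below applies the same canonicalization the command uses (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and hashes the result with SHA-256. The function names are hypothetical, and the comparison is only meaningful if the record reproduced here is byte-for-byte what the server returns in `canonical_record`.

```python
import hashlib
import json

# Expected canonical hash as displayed on this page.
EXPECTED_HASH = "c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb"

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "97f3ea108185fc25741dae86d92f03fa0ec171db8c855fab78d2fd4659774e7a",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2024-05-20T01:04:40Z",
        "title_canon_sha256": "cac5c1897e3139b2e2616615e0260b990c1809c83d38220b3a69c988b2521d15",
    },
    "schema_version": "1.0",
    "source": {"id": "2405.11143", "kind": "arxiv", "version": 6},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, as in the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches expected hash:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or transmits the record's fields.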