Pith Number

pith:YYXR2WAL

pith:2024:YYXR2WAL6MK5EWBYF2QDMLUFPS

not attested · not anchored · not stored · refs resolved

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Bin Chen, Hao Chen, Haoran Wang, Haotian Xu, Jason Klein Liu, Jian Hu, Songlin Jiang, Weikai Fang, Wei Shen, Weixun Wang, Xianyu, Xibin Wu, Yiming Liu, Yu Cao, Zilin Zhu

OpenRLHF delivers a streamlined open-source framework for RLHF that trains models 1.22x to 1.68x faster while requiring far fewer lines of code.

arxiv:2405.11143 v6 · 2024-05-20 · cs.AI · cs.CL · cs.LG


Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experimental results show that OpenRLHF achieves superior training efficiency, with speedups ranging from 1.22x to 1.68x across different model sizes, compared to state-of-the-art frameworks. Additionally, it requires significantly fewer lines of code for implementation.

C2 weakest assumption

The reported speedups and code reductions are measured under fair, comparable conditions against the true state-of-the-art baselines, and the ease-of-use metric (lines of code) accurately reflects real-world implementation effort for typical users.

C3 one-line summary

OpenRLHF is a new open-source RLHF framework reporting 1.22x to 1.68x speedups and fewer lines of code than prior systems.

References

30 extracted · 30 resolved · 10 Pith anchors

[1] Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 2017

[2] Learning to summarize with human feedback 2020

[3] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948

[4] Exploring data scaling trends and effects in reinforcement learning from human feedback 2025

[5] GPT-4 Technical Report 2023 · arXiv:2303.08774

Cited by

35 papers in Pith

RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs

A Survey of Reinforcement Learning for Large Reasoning Models

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments

Receipt and verification

First computed	2026-05-17T23:38:53.680048Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb

Aliases

arxiv: 2405.11143 · arxiv_version: 2405.11143v6 · doi: 10.48550/arxiv.2405.11143 · pith_short_12: YYXR2WAL6MK5 · pith_short_16: YYXR2WAL6MK5EWBY · pith_short_8: YYXR2WAL
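Judging from the values on this page, the full identifier looks like the canonical hash re-encoded in RFC 4648 base32 and cut to 26 characters, with the short aliases being prefixes of it. That is an inference from this one record, not a documented rule; the helper name `pith_id_from_hash` and the 26-character length are assumptions made for illustration. A minimal sketch:

```python
import base64

# Canonical hash and full identifier copied from this record.
CANONICAL_HASH = "c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb"
FULL_ID = "YYXR2WAL6MK5EWBYF2QDMLUFPS"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the digest and keep a prefix.

    The default length of 26 is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_id_from_hash(CANONICAL_HASH)

# Short aliases are taken here as plain prefixes of the derived identifier.
aliases = {f"pith_short_{n}": derived[:n] for n in (8, 12, 16)}

print("derived id:", derived)
print("matches page:", derived == FULL_ID)
print(aliases)
```

If the inference holds for other records, a client could check any alias against the canonical hash without a network call; treat that as a convenience check, not a substitute for the signed record.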

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YYXR2WAL6MK5EWBYF2QDMLUFPS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "97f3ea108185fc25741dae86d92f03fa0ec171db8c855fab78d2fd4659774e7a",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-05-20T01:04:40Z",
    "title_canon_sha256": "cac5c1897e3139b2e2616615e0260b990c1809c83d38220b3a69c988b2521d15"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2405.11143",
    "kind": "arxiv",
    "version": 6
  }
}
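
The same check as the shell command above can be run offline against the record as printed here. This is a sketch that mirrors the one-liner's serialization (sorted keys, compact separators, UTF-8, no ASCII escaping); the function name `canonical_sha256` is an assumption, and whether the digest matches depends on this page's copy of the record being byte-for-byte faithful to the served one.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "97f3ea108185fc25741dae86d92f03fa0ec171db8c855fab78d2fd4659774e7a",
    "cross_cats_sorted": ["cs.CL", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-05-20T01:04:40Z",
    "title_canon_sha256": "cac5c1897e3139b2e2616615e0260b990c1809c83d38220b3a69c988b2521d15"
  },
  "schema_version": "1.0",
  "source": {"id": "2405.11143", "kind": "arxiv", "version": 6}
}
"""

# Hash published on this page under "Canonical hash".
PUBLISHED_HASH = "c62f1d580bf315d258382ea0362e857c8d43d67375d111aa7ff0b1884177f5cb"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

print("computed: ", digest)
print("published:", PUBLISHED_HASH)
print("match:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, the whitespace and key order of the input do not affect the digest; only the values and key names do.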