ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation

Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

cs.AI · 2025-04-07 · unverdicted · novelty 6.0

VAPO achieves 60.4 on AIME 2024 with Qwen 32B, outperforming prior methods by over 10 points through targeted fixes for value bias, sequence length variation, and sparse rewards.

citing papers explorer

Showing 1 of 1 citing paper.

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

cs.AI · 2025-04-07 · unverdicted · none · ref 13
