HybridFlow: A flexible and efficient RLHF framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu · 2025 · arXiv 9031.36960

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

cs.DC · 2026-06-10 · unverdicted · novelty 7.0

ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.

CATPO: Critique-Augmented Tree Policy Optimization

cs.CL · 2026-06-06 · unverdicted · novelty 6.0

CATPO introduces an informativeness score F(T) and critique-guided healing for failed trees to improve efficiency and performance in tree-based RLVR, reaching 37.5% macro accuracy on math benchmarks.

Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Feather uses reinforcement learning and a Chunked Hash Tree to balance batch size against prefix homogeneity in LLM inference, delivering 2-10x higher throughput than existing schedulers.

Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL

cs.LG · 2026-02-03 · unverdicted · novelty 6.0

PULSE exploits BF16-invisible sparsity in weight updates to enable over 100x lower communication in distributed RL post-training via compute-visible sparsification.

citing papers explorer

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training · cs.DC · 2026-06-10 · unverdicted · none · ref 52
CATPO: Critique-Augmented Tree Policy Optimization · cs.CL · 2026-06-06 · unverdicted · none · ref 14
Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference · cs.LG · 2026-05-07 · unverdicted · none · ref 41
Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL · cs.LG · 2026-02-03 · unverdicted · none · ref 13
