hub

Streamrl: Scalable, heterogeneous, and elastic rl for llms with disaggregated stream generation

Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, Bingyang Wu, Nuo Chen, Yukun Chen, Yu Zhou, Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang · 2025 · arXiv 2504.15930

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

cs.LG · 2026-05-09 · conditional · novelty 8.0

ReLibra uses pre-known token-to-expert routing from RL rollouts to perform inter-batch expert reordering and intra-batch replication, delivering up to 1.6x higher throughput than Megatron-LM and 1.2x over oracle-equipped EPLB while staying within 6-10% of an ideal balanced baseline.

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

AstraFlow decouples RL components into autonomous dataflow services to natively support multi-policy agentic LLM training, elastic scaling, and cross-region execution with 2.7x speedup on math, code, search, and AgentBench workloads.

BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

BubbleSpec exploits long-tail bubbles in synchronous RL by using faster ranks' idle time to pre-generate rollout drafts for speculative decoding, reducing steps by 50% and raising throughput up to 1.8x while preserving exact synchrony.

HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments

cs.DC · 2025-12-13 · unverdicted · novelty 7.0

HetRL delivers up to 9.17x higher throughput for LLM RL training on heterogeneous GPUs by using hybrid and ILP-based schedulers to solve a joint optimization problem over computation and data dependencies.

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning

cs.LG · 2026-05-17 · conditional · novelty 6.0

Mu-GRPO enables substantially more off-policy GRPO training for LLMs via relaxed clipping and negative-advantage veto in large staged batches, matching standard GRPO performance at ~2x training speed.

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing identified as favorable on the stability-support trade-off.

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

cs.DC · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

DORA's multi-version streaming rollout enables 2-3x higher throughput in asynchronous RL for LLMs while preserving convergence by maintaining policy consistency, data integrity, and bounded staleness.

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training

cs.LG · 2026-04-26 · unverdicted · novelty 6.0

JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

cs.DC · 2026-04-10 · unverdicted · novelty 6.0

TensorHub uses Reference-Oriented Storage to enable scalable weight transfer in LLM RL training by referencing replicated GPU weights, achieving up to 19x reduction in cross-datacenter stall time.

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

cs.DC · 2025-11-18 · unverdicted · novelty 6.0

Seer improves synchronous LLM RL rollout throughput by up to 2.04x and reduces long-tail latency by 72-94% via divided rollout, context-aware scheduling, and adaptive grouped speculative decoding based on prompt similarity observations.

RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs

cs.DC · 2025-10-22 · unverdicted · novelty 6.0

RLBoost harvests preemptible GPUs for RL rollout via a hybrid architecture with adaptive offload, pull-based transfer, and token-level migration, delivering 1.51x-1.97x throughput and 28-49% better cost efficiency than on-demand-only setups.

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

cs.LG · 2025-05-30 · conditional · novelty 6.0

AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.

citing papers explorer

Showing 14 of 14 citing papers.

ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning cs.LG · 2026-05-09 · conditional · none · ref 68
ReLibra uses pre-known token-to-expert routing from RL rollouts to perform inter-batch expert reordering and intra-batch replication, delivering up to 1.6x higher throughput than Megatron-LM and 1.2x over oracle-equipped EPLB while staying within 6-10% of an ideal balanced baseline.
AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs cs.LG · 2026-05-15 · unverdicted · none · ref 50
AstraFlow decouples RL components into autonomous dataflow services to natively support multi-policy agentic LLM training, elastic scaling, and cross-region execution with 2.7x speedup on math, code, search, and AgentBench workloads.
BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning cs.LG · 2026-05-09 · unverdicted · none · ref 26
BubbleSpec exploits long-tail bubbles in synchronous RL by using faster ranks' idle time to pre-generate rollout drafts for speculative decoding, reducing steps by 50% and raising throughput up to 1.8x while preserving exact synchrony.
HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments cs.DC · 2025-12-13 · unverdicted · none · ref 49
HetRL delivers up to 9.17x higher throughput for LLM RL training on heterogeneous GPUs by using hybrid and ILP-based schedulers to solve a joint optimization problem over computation and data dependencies.
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning cs.LG · 2026-05-17 · conditional · none · ref 38
Mu-GRPO enables substantially more off-policy GRPO training for LLMs via relaxed clipping and negative-advantage veto in large staged batches, matching standard GRPO performance at ~2x training speed.
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 38
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.
Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models cs.LG · 2026-05-08 · unverdicted · none · ref 121
Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing identified as favorable on the stability-support trade-off.
ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL cs.DC · 2026-05-07 · unverdicted · none · ref 96 · 2 links
ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training cs.LG · 2026-04-29 · unverdicted · none · ref 23
DORA's multi-version streaming rollout enables 2-3x higher throughput in asynchronous RL for LLMs while preserving convergence by maintaining policy consistency, data integrity, and bounded staleness.
JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training cs.LG · 2026-04-26 · unverdicted · none · ref 68
JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.
TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training cs.DC · 2026-04-10 · unverdicted · none · ref 46
TensorHub uses Reference-Oriented Storage to enable scalable weight transfer in LLM RL training by referencing replicated GPU weights, achieving up to 19x reduction in cross-datacenter stall time.
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning cs.DC · 2025-11-18 · unverdicted · none · ref 54
Seer improves synchronous LLM RL rollout throughput by up to 2.04x and reduces long-tail latency by 72-94% via divided rollout, context-aware scheduling, and adaptive grouped speculative decoding based on prompt similarity observations.
RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs cs.DC · 2025-10-22 · unverdicted · none · ref 52
RLBoost harvests preemptible GPUs for RL rollout via a hybrid architecture with adaptive offload, pull-based transfer, and token-level migration, delivering 1.51x-1.97x throughput and 28-49% better cost efficiency than on-demand-only setups.
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning cs.LG · 2025-05-30 · conditional · none · ref 70
AReaL decouples generation and training in LLM reinforcement learning to achieve up to 2.77x speedup with matched or better performance on math and code benchmarks.

Streamrl: Scalable, heterogeneous, and elastic rl for llms with disaggregated stream generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer