pith. machine review for the scientific record.

arxiv: 2511.14617 · v3 · submitted 2025-11-18 · 💻 cs.DC · cs.LG

Recognition: unknown

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

keywords: learning, long-tail, rollout, seer, latency, synchronous, context, end-to-end

Reinforcement Learning (RL) has emerged as a critical technique for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which dominates end-to-end iteration time, suffers from substantial long-tail latency and poor resource utilization due to inherent workload imbalance. We present Seer, a novel context learning RL system that addresses these challenges through a key observation: requests sharing the same prompt exhibit strong similarities in output lengths and response patterns. Leveraging this insight, Seer introduces three coordinated techniques: (1) divided rollout for dynamic load balancing, (2) context-aware scheduling to mitigate long-tail request delays, and (3) adaptive grouped speculative decoding to accelerate generation. These mechanisms work in concert to markedly reduce long-tail latency and improve resource efficiency during rollout. Evaluations on production-grade RL workloads demonstrate that Seer achieves up to 2.04$\times$ end-to-end rollout throughput improvement compared to state-of-the-art synchronous RL systems, while reducing long-tail latency by 72-94%.
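The key observation in the abstract (requests sharing a prompt tend to have similar output lengths) can be illustrated with a toy scheduler. The sketch below is not Seer's algorithm, which the abstract does not specify. It assumes a hypothetical `Request` record, uses the longest finished sibling as the length estimate, and applies a longest-estimated-first policy; all names and both policy choices are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical rollout request; field names are illustrative."""
    request_id: str
    prompt_id: str


def estimate_lengths(pending, finished_lengths, default_estimate):
    """Estimate output length per pending request from same-prompt siblings.

    finished_lengths is an iterable of (prompt_id, output_length) pairs.
    Assumption: the longest finished sibling serves as the estimate.
    """
    by_prompt = defaultdict(list)
    for prompt_id, length in finished_lengths:
        by_prompt[prompt_id].append(length)

    estimates = {}
    for req in pending:
        observed = by_prompt.get(req.prompt_id)
        # No finished sibling yet: fall back to the default estimate.
        estimates[req.request_id] = max(observed) if observed else default_estimate
    return estimates


def schedule_longest_first(pending, finished_lengths, default_estimate=float("inf")):
    """Order pending requests so the longest-estimated ones start first.

    Assumption: prompts with no finished sibling are treated as potentially
    long (infinite estimate), so they are not left to become stragglers.
    """
    estimates = estimate_lengths(pending, finished_lengths, default_estimate)
    return sorted(pending, key=lambda r: estimates[r.request_id], reverse=True)


pending = [Request("a1", "A"), Request("b1", "B"), Request("c1", "C")]
finished = [("A", 100), ("B", 4000), ("B", 3500)]
order = [r.request_id for r in schedule_longest_first(pending, finished)]
print(order)
```

Here prompt C has no finished sibling, so its request is scheduled first, followed by the request from the long-running group B and then the short group A. Starting likely-long requests early is one common way to shrink a long tail; whether Seer uses this exact estimator or ordering is not stated in the abstract.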

This paper has not been read by Pith yet.


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

    cs.LG 2026-05 conditional novelty 8.0

    ReLibra uses pre-known token-to-expert routing from RL rollouts to perform inter-batch expert reordering and intra-batch replication, delivering up to 1.6x higher throughput than Megatron-LM and 1.2x over oracle-equip...

  2. Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

  3. ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL

    cs.DC 2026-05 unverdicted novelty 6.0

    ROSE delivers 1.2-3.3x higher end-to-end throughput for agentic RL by safely co-using underutilized serving GPUs for rollouts while meeting serving SLOs.

  4. JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training

    cs.LG 2026-04 unverdicted novelty 6.0

    JigsawRL achieves up to 1.85x higher throughput in LLM RL pipelines via pipeline multiplexing, sub-stage graphs, and look-ahead scheduling compared to prior systems.