Freshness-Aware PER augments prioritized experience replay with exponential age decay based on effective sample size to enable successful reuse of trajectories in LLM and VLM reinforcement learning, outperforming on-policy baselines on agentic tasks.
Part ii: Roll flash–accelerating rlvr and agentic training with asynchrony
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 2polarities
background 2representative citing papers
NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
ROSE is a system for cooperative elasticity that co-locates serving and rollout models on shared GPUs, delivering 1.3-3.3x higher end-to-end throughput than fixed-resource baselines while preserving serving SLOs.
Seer improves synchronous LLM RL rollout throughput by up to 2.04x and reduces long-tail latency by 72-94% via divided rollout, context-aware scheduling, and adaptive grouped speculative decoding based on prompt similarity observations.
DARTS accelerates LLM RL training up to 1.77x by distribution-aware trajectory sampling and adaptive redundancy allocation that shapes rollouts toward conciseness without performance loss.
citing papers explorer
-
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
-
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
DARTS accelerates LLM RL training up to 1.77x by distribution-aware trajectory sampling and adaptive redundancy allocation that shapes rollouts toward conciseness without performance loss.