ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning

Chris Tong; Eric Yang; Jiaqi Huang; Jie Xiao; Jingwei Song; Lynn Ai; Meng Chen; Qingnan Ren; Rymon Yu; Shuo Lu

arxiv: 2602.02192 · v5 · pith:FQB5Z6HRnew · submitted 2026-02-02 · 💻 cs.LG · cs.DC

Jingwei Song, Meng Chen, Jie Xiao, Qingnan Ren, Jiaqi Huang, Yangshen Deng, Chris Tong, Wanyi Chen, Suli Wang, Zhisheng Chen, Ziqian Bi, Shuo Lu, Yiqun Duan, Xu Wang, Rymon Yu, Lynn Ai, Eric Yang, Tianyu Shi

keywords: dissemination, echo-2, rollout, learning, distributed, post-training, centralized, cost

Abstract

Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized learning. Distributing rollout execution offers opportunities to leverage more cost-efficient inference resources, but introduces challenges in wide-area coordination and policy dissemination. We present ECHO-2, a distributed RL framework for post-training with remote inference workers and non-negligible dissemination latency. ECHO-2 combines centralized learning with distributed rollouts and treats bounded policy staleness as a user-controlled parameter, enabling rollout generation, dissemination, and training to overlap. We introduce an overlap-based capacity model that relates training time, dissemination latency, and rollout throughput, yielding a practical provisioning rule for sustaining learner utilization. To mitigate dissemination bottlenecks and lower cost, ECHO-2 employs peer-assisted pipelined broadcast and cost-aware activation of heterogeneous workers. Experiments on GRPO post-training of LLMs ranging from 4B to 32B parameters under real wide-area bandwidth regimes show that ECHO-2 significantly improves cost efficiency while preserving RL reward comparable to strong baselines.
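The abstract does not state the capacity model itself, so the following is only a back-of-the-envelope sketch of the kind of steady-state reasoning such a provisioning rule might rest on. It is not the authors' formula. The function names, parameters, and the two simplifying assumptions (rollout workers are identical, and the learner consumes exactly one batch per training step) are hypothetical and introduced here for illustration.

```python
import math


def min_workers(batch_size: int, train_step_s: float,
                per_worker_rollouts_per_s: float) -> int:
    """Smallest worker count that keeps the learner fed.

    Assumption (not from the paper): the learner needs `batch_size`
    rollouts every `train_step_s` seconds, and every worker produces
    rollouts at the same constant rate.
    """
    required_rate = batch_size / train_step_s  # rollouts per second
    return math.ceil(required_rate / per_worker_rollouts_per_s)


def min_staleness(dissemination_s: float, train_step_s: float) -> int:
    """Learner steps that elapse while one policy version is in flight.

    Assumption (not from the paper): if dissemination is to overlap
    with training, workers must be allowed to act on a policy at least
    this many versions old.
    """
    return math.ceil(dissemination_s / train_step_s)


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    print(min_workers(batch_size=512, train_step_s=60.0,
                      per_worker_rollouts_per_s=0.5))
    print(min_staleness(dissemination_s=90.0, train_step_s=60.0))
```

Under these assumptions, longer dissemination latency raises the staleness that must be tolerated, while slower workers raise the number that must be activated. This is the trade-off the abstract describes in qualitative terms.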

Forward citations

Cited by 1 Pith paper

FAST: A Synergistic Framework of Attention and State-space Models for Spatiotemporal Traffic Prediction
cs.LG 2026-04 unverdicted novelty 4.0

FAST uses a Temporal-Spatial-Temporal structure with attention and Mamba modules plus learnable embeddings to achieve better accuracy on traffic prediction tasks than previous models.