DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training

Dian Yang; Jiaming Xu; Jiarui Hu; Jinlong Hou; Liming Liu; Mingjun Zhang; Ping Zhang; Siyuan Feng; Tianyi Zhou; Tongyu Wang

REVIEW

Not yet reviewed by Pith; the record is open.

This paper has not been read by Pith yet. Machine review is queued; the pith claim, tier, and objections will appear here once it completes.

arxiv 2507.13833 v4 pith:4RVOZWSS submitted 2025-07-18 cs.DC

Zhixin Wang, Jiaming Xu, Tianyi Zhou, Mingjun Zhang, Liming Liu, Jiarui Hu, Dian Yang, Tongyu Wang, Ping Zhang, Jinlong Hou, Siyuan Feng, Yuan Qi, Yuan Cheng

classification cs.DC

keywords data, distflow, control, distributed, architecture, communication, efficient, execution

Effectively scaling Reinforcement Learning (RL) is crucial for enhancing the reasoning and alignment of Large Language Models. The massive data and complex execution flows inherent in these tasks require a distributed architecture capable of efficient scaling. However, to simplify programming and dependency management, mainstream frameworks often rely on a centralized architecture in which a single node dispatches both control and data. This coupling creates significant communication bottlenecks that severely limit system scalability and efficiency. We present DISTFLOW, a novel, fully distributed RL framework that adopts a multi-controller paradigm. By decoupling data transmission from control dispatch, DISTFLOW establishes a parallelism-aware, decentralized Data Coordinator that leverages local caching, load balancing, and asynchronous double buffering to minimize communication overhead and mitigate straggler effects. For control logic, it introduces a task scheduler built on a Directed Acyclic Graph (DAG) that facilitates fine-grained, independent execution. Experimental results demonstrate that DISTFLOW achieves near-linear scalability up to 512 GPUs and delivers up to 2.63x higher throughput than state-of-the-art (SOTA) frameworks. The source code is available at: https://github.com/sii-research/siiRL.
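
The DAG-based scheduling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not DistFlow's scheduler or the siiRL API: the stage names (`rollout`, `reward`, `ref_logprob`, `advantage`, `actor_update`) and the helper `run_dag` are hypothetical, chosen only to show how a dependency graph lets each stage run as soon as its inputs are ready, using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task once all of its predecessors have finished.

    tasks: maps a stage name to a callable taking a dict of predecessor results.
    deps:  maps a stage name to the set of stage names it depends on.
    Returns (results, order), where order is the sequence in which stages ran.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, order = {}, []
    while sorter.is_active():
        # Every stage returned here has all dependencies satisfied, so a
        # real system could hand these to independent workers.
        for name in sorter.get_ready():
            inputs = {d: results[d] for d in deps.get(name, ())}
            results[name] = tasks[name](inputs)
            order.append(name)
            sorter.done(name)
    return results, order


# Hypothetical RL post-training stages (assumed for illustration only).
deps = {
    "rollout": set(),
    "reward": {"rollout"},
    "ref_logprob": {"rollout"},
    "advantage": {"reward", "ref_logprob"},
    "actor_update": {"advantage"},
}

# Each placeholder stage simply reports which inputs it received.
tasks = {name: (lambda inputs: sorted(inputs)) for name in deps}

results, order = run_dag(tasks, deps)
print(order)
```

In this sketch, `reward` and `ref_logprob` become ready at the same time once `rollout` finishes, which is the kind of independent, fine-grained execution a DAG representation exposes.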
