RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

Bo Han; Dahai Yu; Jiangchao Yao; Jiaqi Fan; Ka Ho Li; Michael Kwok-Po Ng; Xiao Feng; Zhanke Zhou

arxiv: 2603.18859 · v2 · pith:NI66TQHPnew · submitted 2026-03-19 · 💻 cs.AI · cs.CL · cs.LG

Xiao Feng, Bo Han, Zhanke Zhou, Jiaqi Fan, Jiangchao Yao, Ka Ho Li, Dahai Yu, Michael Kwok-Po Ng

keywords: rewardflow, agentic, reasoning, reward, rewards, state, across, graphs

Reinforcement learning (RL) shows promise for enhancing LLM agentic reasoning, yet sparse terminal rewards hinder fine-grained optimization. Process reward modeling offers an alternative but incurs high computational costs, reward hacking risks, and annotation bottlenecks. We introduce RewardFlow, a lightweight method for estimating state-level rewards in agentic reasoning. By constructing state graphs that capture the intrinsic topological structure of trajectories, RewardFlow performs topology-aware propagation to estimate each state's contribution to success, yielding principled, annotation-free dense rewards. Used for RL optimization, RewardFlow substantially outperforms prior baselines across four agentic benchmarks: +6.2% average success rate on text-based tasks, +29.7% on visual reasoning over the strongest baseline across three model scales, and +10% accuracy on DeepResearch, with superior robustness and training efficiency. The implementation of RewardFlow is publicly available at https://github.com/tmlr-group/RewardFlow.
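The abstract does not spell out the propagation rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that states are hashable, that identical states from different trajectories merge into one graph node, and that a state's reward decays geometrically with its shortest hop distance to a successful terminal state. The names `build_state_graph`, `propagate_rewards`, and the `decay` parameter are hypothetical and do not come from the RewardFlow repository.

```python
from collections import defaultdict, deque


def build_state_graph(trajectories):
    """Merge trajectories into one directed graph, stored as reverse edges.

    Each trajectory is a (states, success) pair. Identical states are
    treated as the same node (an assumption made for this sketch).
    """
    predecessors = defaultdict(set)
    for states, _success in trajectories:
        for src, dst in zip(states, states[1:]):
            predecessors[dst].add(src)
    return predecessors


def propagate_rewards(trajectories, decay=0.9):
    """Assign each state decay ** d, where d is its shortest hop distance
    to the final state of a successful trajectory; unreachable states get 0.
    """
    predecessors = build_state_graph(trajectories)
    rewards = {}
    queue = deque()

    # Seed the search with the terminal states of successful trajectories.
    for states, success in trajectories:
        if success and states and states[-1] not in rewards:
            rewards[states[-1]] = 1.0
            queue.append(states[-1])

    # Backward breadth-first search: the first visit is the shortest path.
    while queue:
        node = queue.popleft()
        for prev in predecessors.get(node, ()):
            if prev not in rewards:
                rewards[prev] = rewards[node] * decay
                queue.append(prev)

    # States that cannot reach any successful terminal state get zero.
    for states, _success in trajectories:
        for state in states:
            rewards.setdefault(state, 0.0)
    return rewards


# Toy example: one successful and one failed rollout that share states.
trajectories = [
    (["s0", "s1", "s2", "goal"], True),
    (["s0", "s3", "s1", "s4"], False),
]
rewards = propagate_rewards(trajectories, decay=0.9)
```

In this toy example, `s3` appears only in the failed rollout but still receives a nonzero reward because it leads into `s1`, which lies on a successful path, while the dead-end `s4` receives zero. That is the kind of cross-trajectory credit a purely terminal reward cannot express.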
