Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Jianing Yu; Jiasi Chen; Lixiong Qin; Sheng Gao; Sheng Yang; Weiran Xu; Yingjie Feng; Yuchen Liu

arXiv: 2605.29697 · v1 · pith:U6RNDZT5 · submitted 2026-05-28 · 💻 cs.AI

Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen, Jianing Yu, Sheng Gao, Sheng Yang, Weiran Xu

keywords: graph, step-level, reward, search, advantages, agentic, answer, GDCR

abstract

In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as search within a latent task graph, where effective steps should make graph progress toward the answer node. Based on this prior, we propose Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly retrieved and newly cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
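The abstract names two mechanisms, a distance-based step reward (GDCR) and a step-plus-outcome advantage (SAPO), without giving formulas. The Python below is a minimal sketch under loudly stated assumptions, not the authors' implementation: the ER graph is treated as unweighted and undirected, each newly seen entity scores `1 / (1 + d)` where `d` is its BFS hop distance to the answer node, cited entities are down-weighted by a hypothetical `cite_weight`, and SAPO is approximated as a GRPO-style group-normalized outcome advantage plus `beta` times a group-normalized step advantage. All function names (`build_graph`, `bfs_distances`, `gdcr_step_rewards`, `sapo_advantages`) and both weights are illustrative.

```python
from collections import deque
from statistics import mean, pstdev

# Hypothetical sketch: the abstract does not give GDCR's scoring formula or
# SAPO's combination rule; 1 / (1 + distance), cite_weight and beta are assumptions.


def build_graph(edges):
    """Undirected adjacency sets from (head, tail) entity-relation pairs."""
    graph = {}
    for head, tail in edges:
        graph.setdefault(head, set()).add(tail)
        graph.setdefault(tail, set()).add(head)
    return graph


def bfs_distances(graph, answer):
    """Hop distance from the answer node to every reachable entity."""
    dist = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def entity_score(entities, dist):
    """Closer entities score higher; entities unreachable from the answer score 0."""
    return sum(1.0 / (1 + dist[e]) for e in entities if e in dist)


def gdcr_step_rewards(steps, dist, cite_weight=0.5):
    """steps: list of (retrieved_entities, cited_entities), one pair per step.
    Only entities not seen in earlier steps of the same trajectory are credited."""
    seen_retrieved, seen_cited = set(), set()
    rewards = []
    for retrieved, cited in steps:
        new_retrieved = set(retrieved) - seen_retrieved
        new_cited = set(cited) - seen_cited
        seen_retrieved |= new_retrieved
        seen_cited |= new_cited
        rewards.append(entity_score(new_retrieved, dist)
                       + cite_weight * entity_score(new_cited, dist))
    return rewards


def normalize(values, eps=1e-6):
    """Group normalization (GRPO-style): zero mean, unit population std."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / (sd + eps) for v in values]


def sapo_advantages(outcome_rewards, step_rewards, beta=0.5):
    """outcome_rewards: one scalar per trajectory in a sampled group.
    step_rewards: per-trajectory lists of GDCR step rewards.
    Returns per-step advantages = outcome advantage + beta * step advantage."""
    outcome_adv = normalize(outcome_rewards)
    # Step rewards are normalized jointly over every step in the group, then
    # consumed in order so each step gets its own normalized value.
    flat_step_adv = iter(normalize([r for traj in step_rewards for r in traj]))
    return [[outcome_adv[i] + beta * next(flat_step_adv) for _ in traj]
            for i, traj in enumerate(step_rewards)]


if __name__ == "__main__":
    # Toy ER graph: Q - B - C - Ans, plus a distractor X attached to Q.
    graph = build_graph([("Q", "B"), ("B", "C"), ("C", "Ans"), ("Q", "X")])
    dist = bfs_distances(graph, "Ans")
    good = [(["Q", "B"], []), (["C"], ["B"]), (["Ans"], ["C", "Ans"])]
    bad = [(["Q", "X"], []), (["X"], ["X"])]
    steps = [gdcr_step_rewards(t, dist) for t in (good, bad)]
    print(steps)
    print(sapo_advantages([1.0, 0.0], steps))
```

In this toy run, the trajectory that walks toward `Ans` earns larger step rewards than the one that drifts to the distractor `X`, and within a trajectory a step that surfaces closer entities earns more than one that only repeats earlier ones. BFS is used because hop count is the simplest graph distance on an unweighted ER graph; since the abstract describes the graph as training-time only, distances can be precomputed once per question and reused across all sampled rollouts.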
