arXiv: 2604.08178 · 2 revisions
Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling