LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization

Zhang, Qi, Yang, Shouqing, Gao, Lirong, Chen, Hao, Hu, Xiaomeng, Chen, Jinglei · 2025 · DOI 10.18653/v1/2025.emnlp-main.257

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

ARBOR introduces a reusable rubric buffer that consolidates contrastive trajectory drafts into cross-query rubrics for online process rewards, outperforming GRPO and DAPO on multi-hop QA benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents · cs.CL · 2026-06-02 · unverdicted · none · ref 27
