arXiv preprint arXiv:2602.03619 , year=

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation , author= · arXiv 2602.03619

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.

TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

TRACE improves math reasoning by distilling only on annotator-marked critical spans with forward KL on correct key spans, optional reverse KL on errors, and GRPO elsewhere, gaining 2.76 points over GRPO while preserving OOD performance.

citing papers explorer

Showing 2 of 2 citing papers.

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents cs.AI · 2026-05-22 · unverdicted · none · ref 17
Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.
TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment cs.AI · 2026-05-11 · unverdicted · none · ref 10
TRACE improves math reasoning by distilling only on annotator-marked critical spans with forward KL on correct key spans, optional reverse KL on errors, and GRPO elsewhere, gaining 2.76 points over GRPO while preserving OOD performance.

arXiv preprint arXiv:2602.03619 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer