“CORRECT” if the step is logically and mathematically correct • “INCORRECT”

Are there any errors in the step? Respond with EXACTLY one of:
• “CORRECT” if the step is logically and mathematically correct
• “INCORRECT” if the step contains any error

C Training Hyperparameters

Table 4: Training Hyperparameters · 2048

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Step-wise Rubric Rewards for LLM Reasoning

cs.LG · 2026-05-17 · conditional · novelty 6.0 · ref 29

SRaR attributes rubric items to specific steps via an LLM judge, normalizes per-step scores across rollouts, and combines them with outcome rewards via a decoupled advantage estimator, yielding 3.57-point accuracy gains on Qwen3-8B across math benchmarks.
