Let’s verify step by step

2023

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

REC-RL: Referring expression counting via Gaussian and range-based reward optimization

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

REC-RL applies Group Relative Policy Optimization with combined range and Gaussian accuracy rewards plus a format reward to improve referring expression counting.

LLM Reasoning with Process Rewards for Outcome-Guided Steps

cs.LG · 2026-02-08 · unverdicted · novelty 5.0

PROGRS uses outcome-conditioned centering on PRM scores to safely integrate process rewards into GRPO for improved Pass@1 on math benchmarks.

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization

cs.AI · 2025-08-13 · unverdicted · novelty 5.0

LCPO reduces average LRM output length by over 50% across benchmarks via targeted preference optimization while preserving reasoning performance.

citing papers explorer

Showing 3 of 3 citing papers.

REC-RL: Referring expression counting via Gaussian and range-based reward optimization
cs.CV · 2026-05-15 · unverdicted · none · ref 16

LLM Reasoning with Process Rewards for Outcome-Guided Steps
cs.LG · 2026-02-08 · unverdicted · none · ref 1

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
cs.AI · 2025-08-13 · unverdicted · none · ref 40
