GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning

Zhang, Yao; Wu, Yu; Zhang, Haowei; Li, Weiguo; Chen, Haokun; Wu, Jingpei · 2025 · arXiv 2510.14942

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment

cs.LG · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

DGPO is a critic-free RL framework that uses bounded Hellinger distance and entropy-gated advantage redistribution to enable fine-grained token-level credit assignment in long CoT generations for LLM alignment, reporting SOTA results on AIME benchmarks.

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

cs.LG · 2026-06-04 · unverdicted · novelty 5.0

RREDCoT approximates segment-level reward redistribution for CoT traces by querying the model itself, offering a lower-cost alternative to Monte Carlo credit assignment in reasoning-model RL.

citing papers explorer

DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment

cs.LG · 2026-05-05 · unverdicted · none · ref 35 · 2 links

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

cs.LG · 2026-06-04 · unverdicted · none · ref 117
