A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.

years: 2026 (9), 2025 (1)

roles: background (4)

polarities: background (4)
