1 Pith paper cites this work. Polarity classification is still indexing.

Fields: cs.LG (1); Years: 2026 (1); Verdicts: UNVERDICTED (1)

Representative citing papers:
The Weakest Link Tells It All: Outcome-Supervised Process Reward Modeling via Learnable Credit Assignment
LCA frames outcome-supervised process reward model (PRM) training as multiple-instance learning (MIL), introduces SWS pooling for dependent steps, proves Bayes consistency under mild assumptions, and reports consistent gains over prior outcome-supervised baselines.