arXiv preprint arXiv:2402.13210
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
citing papers explorer
- The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning
PRISM is a contrastive, policy-aware training framework for process reward models that reduces false positives by 22% on PRMBench and boosts downstream accuracy up to 33% in Best-of-N selection by learning reliable relative comparisons instead of pointwise labels.
- BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning
BaRA adds Bayesian adaptive rank allocation to LoRA fine-tuning by activating sparse instance-specific latent factors, with a generalization bound depending on learned joint effective rank rather than fixed maximum rank.