arXiv preprint arXiv:2402.13210

Bayesian reward models for LLM alignment · 2024 · arXiv 2402.13210

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

PRISM is a contrastive, policy-aware training framework for process reward models that reduces false positives by 22% on PRMBench and boosts downstream accuracy up to 33% in Best-of-N selection by learning reliable relative comparisons instead of pointwise labels.

BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning

cs.LG · 2026-06-28 · unverdicted · novelty 5.0

BaRA adds Bayesian adaptive rank allocation to LoRA fine-tuning by activating sparse instance-specific latent factors, with a generalization bound depending on learned joint effective rank rather than fixed maximum rank.

citing papers explorer


The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning · cs.LG · 2026-06-08 · unverdicted · none · ref 20
BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning · cs.LG · 2026-06-28 · unverdicted · none · ref 12
