arXiv preprint arXiv:2503.22480

URLhttps://arxiv · arXiv 2503.22480

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

A Unifying Lens on Reward Uncertainty in RLHF

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

A distributional reward model p(r|x,y) yields the closed-form effective reward \tilde{r}(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}] (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Variance-aware Reward Modeling with Anchor Guidance stat.ML · 2026-05-12 · unverdicted · none · ref 50
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
A Unifying Lens on Reward Uncertainty in RLHF cs.LG · 2026-06-08 · unverdicted · none · ref 16
A distributional reward model p(r|x,y) yields the closed-form effective reward \tilde{r}(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}] (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.
