arXiv preprint arXiv:2503.22480

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Variance-aware Reward Modeling with Anchor Guidance

stat.ML · 2026-05-12 · unverdicted · novelty 7.0

Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

A Unifying Lens on Reward Uncertainty in RLHF

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

A distributional reward model p(r|x,y) yields the closed-form effective reward $\tilde r(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}]$ (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.

citing papers explorer

  • Variance-aware Reward Modeling with Anchor Guidance stat.ML · 2026-05-12 · unverdicted · none · ref 50

    Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.

  • A Unifying Lens on Reward Uncertainty in RLHF cs.LG · 2026-06-08 · unverdicted · none · ref 16

    A distributional reward model p(r|x,y) yields the closed-form effective reward $\tilde r(x,y) = \beta \log \mathbb{E}_p[e^{r/\beta}]$ (pessimistic branch) that unifies prior RLHF aggregation heuristics under Bayesian or KL-DRO views.
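
As a rough illustration of the effective reward quoted in the summary above (a temperature times the log of the expected exponentiated reward), the sketch below estimates it from reward samples. The function names, the symbol `beta` for the temperature, and the reading that a negative `beta` gives the pessimistic branch are assumptions made for this example; the listing does not state them, and this is not the cited paper's implementation.

```python
import numpy as np


def effective_reward(samples, beta):
    """Monte Carlo estimate of beta * log E_p[exp(r / beta)].

    `samples` are draws of r from a reward distribution p(r | x, y).
    `beta` is a nonzero temperature (hypothetical name). Assumption:
    beta < 0 pulls the estimate below the mean (pessimistic),
    beta > 0 pushes it above the mean (optimistic).
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    z = np.asarray(samples, dtype=float) / beta
    m = z.max()  # shift by the max for numerical stability (log-sum-exp trick)
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return beta * log_mean_exp


def gaussian_effective_reward(mu, sigma, beta):
    """Closed form of the same quantity when r ~ N(mu, sigma^2)."""
    return mu + sigma**2 / (2.0 * beta)


if __name__ == "__main__":
    rewards = [0.0, 2.0]  # two equally likely reward samples, mean 1.0
    print(effective_reward(rewards, beta=-1.0))  # below the mean
    print(effective_reward(rewards, beta=1.0))   # above the mean
```

For a Gaussian reward the quantity reduces to the mean plus a variance term scaled by `1 / (2 * beta)`, so under the sign assumption above a negative temperature penalizes high-variance rewards.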