PrLM: Learning explicit reasoning for personalized RAG via contrastive reward optimization

Kepu Zhang, Teng Shi, Weijie Yu, Jun Xu · 2020 · arXiv 2602.12116

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

PEBS applies Morris-James-Stein empirical-Bayes shrinkage to per-rater affine calibrators in RLHF, cutting within-user held-out RMSE by 8.58% on PRISM and 9.66% on PluriHarms versus pooled baselines.

Preference-Aware Rubric Learning for Personalized Evaluation

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

PARL formulates personalized LLM evaluation as a learning problem that induces preference-aware rubrics from raw user histories via discriminative RL and self-validation.

citing papers explorer

Showing 2 of 2 citing papers.

PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration · cs.LG · 2026-06-25 · unverdicted · none · ref 21
Preference-Aware Rubric Learning for Personalized Evaluation · cs.CL · 2026-05-29 · unverdicted · none · ref 32