Skill-RM unifies heterogeneous reward criteria by modeling reward computation as dynamic execution of a reusable Reward-Evaluation Skill within an agent framework.
Everyone deserves a reward: Learning customized human preferences. arXiv preprint arXiv:2309.03126, 2023
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
PAFO applies Pareto fairness optimization and group-specialized distillation to produce a single personalized reward model that improves accuracy for both majority and minority preference groups without requiring group labels at inference.
citing papers explorer
- PAFO: Pareto Fairness Optimization for Personalized Reward Modeling