SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.
IEEE Transactions on Pattern Analysis and Machine Intelligence
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
RGBT combines GMM-derived instance reliability weights with a Bayes-label transition matrix to achieve consistent, low-variance estimation from noisy implicit feedback while using all samples.
citing papers explorer
- Optimal Transport for LLM Reward Modeling from Noisy Preference
  SelectiveRM applies optimal transport with a joint consistency discrepancy and partial mass relaxation to produce reward models that optimize a tighter upper bound on clean risk while autonomously dropping noisy preference samples.
- Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework
  RGBT combines GMM-derived instance reliability weights with a Bayes-label transition matrix to achieve consistent, low-variance estimation from noisy implicit feedback while using all samples.