RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Fei-Yue Wang; Gang Xiong; Jie Cheng; Qinghai Miao; Xingyuan Dai; Yisheng Lv

arXiv: 2402.17257 · v4 · pith:MWMMPJ4Dnew · submitted 2024-02-27 · 💻 cs.LG · cs.AI · cs.RO

Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

classification 💻 cs.LG cs.AI cs.RO

keywords pbrl, reward, rime, learning, preferences, robust, method, noisy

Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.
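The abstract does not spell out the selection rule, so the following is only a generic sketch of loss-based sample selection for preference labels, not RIME's actual discriminator. It assumes a Bradley-Terry preference model over summed segment rewards and a fixed loss threshold; the names `bt_prob`, `filter_preferences`, and `threshold` are hypothetical and chosen for illustration.

```python
import math

def bt_prob(r0_sum, r1_sum):
    """Bradley-Terry probability that segment 1 is preferred over segment 0,
    given the summed predicted rewards of each segment."""
    return 1.0 / (1.0 + math.exp(r0_sum - r1_sum))

def cross_entropy(p1, label):
    """Cross-entropy between the predicted preference p1 and a binary label
    (label = 1 means segment 1 was marked as preferred)."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p1 + eps) + (1 - label) * math.log(1.0 - p1 + eps))

def filter_preferences(pairs, threshold):
    """Split labelled pairs into trusted and suspected-noisy sets.

    pairs: list of (r0_sum, r1_sum, label) tuples.
    threshold: assumed fixed loss cutoff (a dynamic rule would update it
    during training; that update is not modelled here).
    """
    kept, rejected = [], []
    for r0, r1, label in pairs:
        loss = cross_entropy(bt_prob(r0, r1), label)
        (kept if loss < threshold else rejected).append((r0, r1, label))
    return kept, rejected

# Toy data: the third label contradicts the reward estimates strongly.
pairs = [(1.0, 3.0, 1), (2.0, 0.5, 0), (0.0, 4.0, 0)]
kept, rejected = filter_preferences(pairs, threshold=1.0)
```

In this toy run the first two pairs agree with the current reward estimates and are kept, while the third has a large loss and is set aside as suspected noise. Only the kept pairs would then be used to update the reward model.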

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient Preference Poisoning Attack on Offline RLHF
cs.LG 2026-05 unverdicted novelty 8.0

Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.

Efficient Preference Poisoning Attack on Offline RLHF
cs.LG 2026-05 unverdicted novelty 7.0

Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.