pith. sign in

arxiv: 2402.17257 · v4 · pith:MWMMPJ4Dnew · submitted 2024-02-27 · 💻 cs.LG · cs.AI· cs.RO

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

classification 💻 cs.LG cs.AIcs.RO
keywords pbrlrewardrimelearningpreferencesrobustmethodnoisy
0
0 comments X
read the original abstract

Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Efficient Preference Poisoning Attack on Offline RLHF

    cs.LG 2026-05 unverdicted novelty 8.0

    Label-flip attacks on log-linear DPO reduce to binary sparse approximation problems that can be solved efficiently by lattice-based and binary matching pursuit methods with recovery guarantees.

  2. Efficient Preference Poisoning Attack on Offline RLHF

    cs.LG 2026-05 unverdicted novelty 7.0

    Preference poisoning against log-linear DPO reduces to a binary sparse approximation problem solved by lattice-reduction (BAL-A) and matching-pursuit (BMP-A) algorithms that carry recovery guarantees.