Pebble: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training

2021

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning

cs.RO · 2026-05-01 · unverdicted · novelty 6.0

PrefMoE learns multiple reward experts with adaptive soft routing and a load-balancing regularizer to capture diverse latent preferences under noisy supervision, improving robustness over single-model baselines on D4RL locomotion and MetaWorld manipulation tasks.

citing papers explorer

Showing 1 of 1 citing paper.

PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning

cs.RO · 2026-05-01 · unverdicted · none · ref 12
