PrefMoE learns multiple reward experts with adaptive soft routing and a load-balancing regularizer to capture diverse latent preferences under noisy supervision, improving robustness over single-model baselines on D4RL locomotion and MetaWorld manipulation tasks.
Pebble: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.RO (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning