Pebble: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields

cs.RO 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning

cs.RO · 2026-05-01 · unverdicted · novelty 6.0

PrefMoE learns multiple reward experts with adaptive soft routing and a load-balancing regularizer to capture diverse latent preferences under noisy supervision, improving robustness over single-model baselines on D4RL locomotion and MetaWorld manipulation tasks.

citing papers explorer

Showing 1 of 1 citing paper.

  • PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning cs.RO · 2026-05-01 · unverdicted · none · ref 12
