arXiv preprint arXiv:2409.08687
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
citing papers explorer
- SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation
  SARM2 presents RM, a multi-task stage-aware reward model achieving 80% lower value-estimation MSE, which when used in SPIRAL boosts manipulation task success from ~50% to near-perfect on several benchmarks.
- Domain Adaptation with Adaptive Imagination for Visual Reinforcement Learning under Limited Target Data
  AIDA augments scarce target data for sim-to-real visual RL by adaptively truncating unreliable imagined rollouts via a distribution-shift-aware discriminator and applying self-consistency loss on reliable state reconstructions.