Reward as an Agent combined with DynDiff-GRPO enables diversified exploration in embodied RL world models while mitigating reward hacking via robust verification, yielding accuracy gains on open-source models.
Unified personalized reward model for vision generation. arXiv preprint arXiv:2602.02380, 2026.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Reward as An Agent for Embodied World Models