Unified personalized reward model for vision generation. arXiv preprint arXiv:2602.02380, 2026

Yibin Wang, Yuhang Zang, Feng Han, Jiazi Bu, Yujie Zhou, Cheng Jin, Jiaqi Wang · 2026 · arXiv 2602.02380

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Reward as An Agent for Embodied World Models

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

Reward as an Agent combined with DynDiff-GRPO enables diversified exploration in embodied RL world models while mitigating reward hacking via robust verification, yielding accuracy gains on open-source models.

citing papers explorer

Reward as An Agent for Embodied World Models · cs.AI · 2026-06-18 · unverdicted · none · ref 8
