
Unified personalized reward model for vision generation. arXiv preprint arXiv:2602.02380, 2026

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.AI 1

years

2026 1

verdicts

UNVERDICTED 1


representative citing papers

Reward as An Agent for Embodied World Models

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

Reward as an Agent combined with DynDiff-GRPO enables diversified exploration in embodied RL world models while mitigating reward hacking via robust verification, yielding accuracy gains on open-source models.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • Reward as An Agent for Embodied World Models cs.AI · 2026-06-18 · unverdicted · none · ref 8
