OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 3

citation-polarity summary: still indexing

fields: cs.CV 6, cs.AI 1

years: 2026 5, 2025 2

verdicts: unverdicted 7

roles: background 3

polarities: background 3

representative citing papers

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

cs.CV · 2026-02-11 · unverdicted · novelty 7.0

DiNa-LRM introduces a diffusion-native latent reward model using a noise-calibrated Thurstone likelihood on noisy states, matching VLM performance at lower compute in image alignment and preference optimization.
