Generalist reward models: Found inside large language models. arXiv preprint arXiv:2506.23235

Generalist Reward Models: Found Inside Large Language Models · arXiv 2506.23235

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

IPVRM learns prefix values to produce reliable step rewards from sequence outcomes using TD learning, enabling distribution-level RL that improves reasoning when paired with calibrated rewards.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

citing papers explorer

Showing 2 of 2 citing papers.

Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization · cs.CL · 2026-04-14 · unverdicted · none · ref 7
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence · cs.AI · 2025-07-28 · accept · none · ref 252
