Review history
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
- 2026-05-12 UNVERDICTED
- 2026-05-11 UNVERDICTED