Review history

arXiv: 2605.07579 · 2 revisions

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

2026-05-12 UNVERDICTED LOW v0.9.0 novelty 7.0

133505 ms 5576 in 1433 out 2026-05-12T03:56:50.835037+00:00

2026-05-11 UNVERDICTED LOW v0.9.0 novelty 7.0

46186 ms 5576 in 1446 out 2026-05-11T02:05:00.544734+00:00