pith. machine review for the scientific record.

Review history

arxiv: 2605.07579 · 2 revisions

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

  1. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 7.0
    133505 ms 5576 in 1433 out 2026-05-12T03:56:50.835037+00:00
  2. 2026-05-11 UNVERDICTED LOW v0.9.0 novelty 7.0
    46186 ms 5576 in 1446 out 2026-05-11T02:05:00.544734+00:00