Review history

arxiv: 2605.30719 · 2 revisions

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

  1. 2026-06-29 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    23084 ms 5720 in 1054 out 2026-06-29T05:40:56.436395+00:00
  2. 2026-06-28 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    32558 ms 5720 in 1187 out 2026-06-28T23:37:01.280712+00:00