pith.

Review history

arxiv: 2605.06139 · 2 revisions

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 6.0
    61144 ms 5774 in 1291 out 2026-05-21T08:58:13.137733+00:00
  2. 2026-05-08 UNVERDICTED LOW v0.9.0 novelty 6.0
    31211 ms 5543 in 1481 out 2026-05-08T13:51:51.705037+00:00