Review history

arXiv: 2601.22478 · 2 revisions

Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

  1. 2026-05-21 CONDITIONAL MODERATE v0.9.0 novelty 6.0
    50925 ms 5887 in 1356 out 2026-05-21T14:06:35.845358+00:00
  2. 2026-05-16 UNVERDICTED LOW v0.9.0 novelty 6.0
    29575 ms 5656 in 1397 out 2026-05-16T09:38:57.936853+00:00