pith. machine review for the scientific record.

Review history

arxiv: 2605.02073 · 2 revisions

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

  1. 2026-05-11 UNVERDICTED LOW v0.9.0 novelty 7.0
    50453 ms 5666 in 1266 out 2026-05-11T02:07:20.533395+00:00
  2. 2026-05-08 ACCEPT MODERATE v0.9.0 novelty 7.0
    61157 ms 5666 in 1553 out 2026-05-08T19:09:02.606346+00:00