Review history

arXiv: 2510.01857 · 2 revisions

Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 6.0
    49259 ms 5791 in 1306 out 2026-05-21T21:50:03.960776+00:00
  2. 2026-05-18 UNVERDICTED UNKNOWN v0.9.0 novelty 6.0
    37027 ms 5791 in 1125 out 2026-05-18T11:08:44.647830+00:00