Review history

arXiv: 2605.09063 · 2 revisions

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

  1. 2026-05-20 UNVERDICTED LOW v0.9.0 novelty 8.0
    50600 ms 6182 in 1371 out 2026-05-20T22:20:28.973443+00:00
  2. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 8.0
    30929 ms 5911 in 1066 out 2026-05-12T01:49:00.166940+00:00