Review history

arXiv: 2605.10810 · 2 revisions

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

  1. 2026-05-19 UNVERDICTED LOW v0.9.0 novelty 7.0
    31386 ms 5853 in 1291 out 2026-05-19T17:15:49.960817+00:00
  2. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 6.0
    33934 ms 5622 in 1207 out 2026-05-12T05:37:49.454351+00:00