Review history

arxiv: 2605.10810 · 2 revisions

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

2026-05-19 UNVERDICTED LOW v0.9.0 novelty 7.0

31386 ms 5853 in 1291 out 2026-05-19T17:15:49.960817+00:00

2026-05-12 UNVERDICTED LOW v0.9.0 novelty 6.0

33934 ms 5622 in 1207 out 2026-05-12T05:37:49.454351+00:00