pith.

Review history

arxiv: 2604.10825 · 2 revisions

CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 6.0
    46698 ms 5823 in 1450 out 2026-05-21T00:47:18.118551+00:00
  2. 2026-05-10 UNVERDICTED LOW v0.9.0 novelty 6.0
    52676 ms 5592 in 1192 out 2026-05-10T15:17:28.384818+00:00