pith. machine review for the scientific record.

Review history

arxiv: 2604.23478 · 2 revisions

JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems

  1. 2026-05-11 UNVERDICTED LOW v0.9.0 novelty 7.0
    49085 ms 5478 in 1289 out 2026-05-11T01:00:31.314562+00:00
  2. 2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0
    55936 ms 5471 in 1372 out 2026-05-08T06:27:24.136845+00:00