Review history

arXiv: 2604.23478 · 2 revisions

JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems

2026-05-11 UNVERDICTED LOW v0.9.0 novelty 7.0
49085 ms 5478 in 1289 out 2026-05-11T01:00:31.314562+00:00

2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0
55936 ms 5471 in 1372 out 2026-05-08T06:27:24.136845+00:00