arXiv: 2604.23478 · 2 revisions
JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems