InfoDelphi partitions evidence to induce information asymmetry in multi-agent LLM deliberation, yielding 12-18% Brier score gains and 4-8 pp accuracy gains on a 375-question benchmark.
arXiv preprint arXiv:2602.08003
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Representative citing papers
Nine LLM judges, evaluated on three human-labeled NLI datasets, provide only about 2 effective independent votes because their errors are correlated; the panel underperforms independent voting by 8-22 points and is matched or beaten by the best single judge.
LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
Citing papers explorer
- Nine Judges, Two Effective Votes: Correlated Errors Undermine LLM Evaluation Panels