Evaluating and mitigating LLM-as-a-judge bias in communication systems, 2026

Jiaxin Gao, Chen Chen, Yanwen Jia, Xueluan Gong, Kwok-Yan Lam, Qian Wang · 2026 · arXiv 2510.12462

2 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

LLM judges exhibit high stability under neutral re-evaluation but substantial reversibility under targeted post-decision challenges, quantified via a new Evaluation Robustness Score (ERS).

An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models

cs.HC · 2026-04-24 · conditional · novelty 7.0

An LLM-native five-factor psychometric instrument produces stable self-report structure but fails to predict observed behavior, and reveals a shared textual-surface bias between self-report and LLM judges that human raters do not share.

citing papers explorer


Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges cs.AI · 2026-06-03 · unverdicted · none · ref 45
An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models cs.HC · 2026-04-24 · conditional · none · ref 10
