LLM-as-a-Judge systems exhibit significant biases in specific tasks despite strong overall performance, as measured by the new CALM quantification framework.
If the prompt encourages responses that contain cited information that might be false, this is considered Authority Bias.
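Authority Bias can be probed with a simple perturbation test: append a fabricated citation to one answer and check whether a judge's verdict changes. The following is a minimal sketch under stated assumptions, not CALM's actual implementation. The names `FAKE_CITATION`, `ITEMS`, `REFERENCE`, `reference_judge`, `citation_swayed_judge`, and `robustness_rate` are hypothetical, and the two toy judges stand in for real LLM-as-a-Judge calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pairwise judge: (question, answer_a, answer_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]

# Deliberately fabricated citation used as the perturbation (not a real reference).
FAKE_CITATION = " (Smith et al., 2021, Journal of Illustrative Results)"

# Hypothetical evaluation items: (question, correct answer, incorrect answer).
ITEMS: List[Tuple[str, str, str]] = [
    ("What is the capital of France?", "Paris.", "Lyon."),
    ("What is 2 + 2?", "4.", "5."),
]

# Hypothetical ground truth used by the toy judges below.
REFERENCE: Dict[str, str] = {
    "What is the capital of France?": "Paris",
    "What is 2 + 2?": "4",
}


def add_fake_citation(answer: str) -> str:
    """Append the fabricated citation to an answer."""
    return answer + FAKE_CITATION


def reference_judge(question: str, a: str, b: str) -> str:
    """Toy judge that looks only at content: prefers A if it contains the reference answer."""
    return "A" if REFERENCE[question] in a else "B"


def citation_swayed_judge(question: str, a: str, b: str) -> str:
    """Toy judge with authority bias: an answer that appears to cite a source wins."""
    if "et al." in b and "et al." not in a:
        return "B"
    if "et al." in a and "et al." not in b:
        return "A"
    return reference_judge(question, a, b)


def robustness_rate(judge: Judge, items: List[Tuple[str, str, str]]) -> float:
    """Fraction of items whose verdict is unchanged after a fake citation is appended to answer B."""
    if not items:
        raise ValueError("items must be non-empty")
    unchanged = 0
    for question, a, b in items:
        before = judge(question, a, b)
        after = judge(question, a, add_fake_citation(b))
        unchanged += int(before == after)
    return unchanged / len(items)


if __name__ == "__main__":
    print(robustness_rate(reference_judge, ITEMS))        # 1.0
    print(robustness_rate(citation_swayed_judge, ITEMS))  # 0.0
```

A rate of 1.0 means the verdicts ignore the fake citation; the citation-swayed toy judge flips to the incorrect but "cited" answer on every item, giving 0.0. With a real judge, the same loop would call the model in place of the toy functions.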
1 Pith paper cites this work (field: cs.CL; year: 2024). Representative citing paper:
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge