Pith Number

pith:CVEHX7XF

pith:2024:CVEHX7XFUKF5DJRZPQYQDRXULA
not attested · not anchored · not stored · refs resolved

Self-Preference Bias in LLM-as-a-Judge

Koki Wataoka, Ryokan Ri, Tsubasa Takahashi

LLM judges score low-perplexity outputs higher than human evaluators do, even when the text is not self-generated.

arxiv:2410.21819 v2 · 2024-10-29 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CVEHX7XFUKF5DJRZPQYQDRXULA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
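The record does not spell out the merge algorithm, so the following is only a sketch of what "deterministic" could mean here: events are folded in a total order that does not depend on the order in which a mirror received them. The event fields (`event_id`, `timestamp`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for illustration, not the actual Pith bundle format.

```python
from typing import Any

def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict using a total order over events.

    Hypothetical rule: sort by (timestamp, event_id) and let later events
    overwrite earlier ones. Sorting first makes the result independent of
    the order in which the events arrived.
    """
    state: dict[str, Any] = {}
    for event in sorted(events, key=lambda e: (e["timestamp"], e["event_id"])):
        state[event["field"]] = event["value"]
    return state

# Made-up events; a real bundle would also carry signatures to verify.
events = [
    {"event_id": "e2", "timestamp": "2026-05-18T00:00:00Z",
     "field": "author_claim", "value": "claimed"},
    {"event_id": "e1", "timestamp": "2026-05-17T23:38:52Z",
     "field": "author_claim", "value": "open"},
    {"event_id": "e3", "timestamp": "2026-05-18T00:00:00Z",
     "field": "citations", "value": "open"},
]

# Two mirrors that received the events in different orders agree on the state.
print(merge_events(events) == merge_events(events[::-1]))
```

The point of the sketch is the sort before the fold: any rule that imposes a total order on events gives every mirror the same result.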

Claims

C1 strongest claim

LLMs assign significantly higher evaluations than human evaluators do to outputs with lower perplexity, regardless of whether the outputs were self-generated. This suggests that the essence of the bias lies in perplexity.

C2 weakest assumption

That the introduced quantitative metric isolates self-preference bias from other confounding factors in LLM judgments, and that the observed relationship with perplexity is causal rather than merely correlational.

C3 one-line summary

LLMs judge their own outputs higher because they assign better scores to lower-perplexity text, even when the text is not self-generated.
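Because the claims above hinge on perplexity, a small sketch of the quantity may help. It uses the standard definition (the exponential of the mean negative log-probability per token); the function name and the log-probability values are made up for illustration and are not taken from the paper.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probabilities a judge model might assign to two responses.
fluent = [-0.2, -0.1, -0.3, -0.2]    # text the model finds predictable
unusual = [-1.5, -2.0, -0.9, -1.6]   # text the model finds surprising

# Lower perplexity means the text is more familiar to the model.
print(perplexity(fluent) < perplexity(unusual))  # True
```

Under the paper's claim, the first response would tend to receive the higher score from that judge, whether or not the judge wrote it.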

References

19 extracted · 19 resolved · 3 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.319961Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

15487bfee5a28bd1a6397c3101c6f45813b6d8a765e6b9054f054e2bec2ead97

Aliases

arxiv: 2410.21819 · arxiv_version: 2410.21819v2 · doi: 10.48550/arxiv.2410.21819 · pith_short_12: CVEHX7XFUKF5 · pith_short_16: CVEHX7XFUKF5DJRZ · pith_short_8: CVEHX7XF
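The identifiers in this record are consistent with one another in a way that can be checked offline: the 26-character Pith Number matches the first 26 characters of the base32 encoding (RFC 4648 alphabet) of the canonical hash, and the short aliases are its prefixes. This is an observation about the values shown on this page, not a documented rule of the schema; the function name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash exactly as printed in this record.
CANONICAL_HASH = "15487bfee5a28bd1a6397c3101c6f45813b6d8a765e6b9054f054e2bec2ead97"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

pith_number = pith_from_hash(CANONICAL_HASH)
print(pith_number)                          # CVEHX7XFUKF5DJRZPQYQDRXULA
print(pith_from_hash(CANONICAL_HASH, 8))    # CVEHX7XF
print(pith_from_hash(CANONICAL_HASH, 12))   # CVEHX7XFUKF5
print(pith_from_hash(CANONICAL_HASH, 16))   # CVEHX7XFUKF5DJRZ
```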
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CVEHX7XFUKF5DJRZPQYQDRXULA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 15487bfee5a28bd1a6397c3101c6f45813b6d8a765e6b9054f054e2bec2ead97
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3939e6fde3f28e3e5448035dc81ace7f4a29cf0370051bdb1e096f49993cfa03",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-10-29T07:42:18Z",
    "title_canon_sha256": "3b78afe76893d6460aba6e3ad0163ef7cd549dbc3cff746644ae9a9fa89484a6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.21819",
    "kind": "arxiv",
    "version": 2
  }
}
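The verification one-liner under "Agent API" can also be run offline against the record printed above. The sketch below applies the same canonicalization as that command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and compares the digest with the published canonical hash. The function name `canonical_sha256` is an assumption, and the record is retyped by hand from this page, so a mismatch may only mean the copy differs from what the server returns.

```python
import hashlib
import json

EXPECTED = "15487bfee5a28bd1a6397c3101c6f45813b6d8a765e6b9054f054e2bec2ead97"

def canonical_sha256(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and no whitespace."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "3939e6fde3f28e3e5448035dc81ace7f4a29cf0370051bdb1e096f49993cfa03",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-10-29T07:42:18Z",
        "title_canon_sha256": "3b78afe76893d6460aba6e3ad0163ef7cd549dbc3cff746644ae9a9fa89484a6",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.21819", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys before hashing is what makes the digest independent of the key order in which a client happens to receive the JSON.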