Judging the judges: A systematic study of position bias in LLM-as-a-judge

Lin Shi, Chiyu Ma, Wenhua Liang, Xingjian Diao, Weicheng Ma, Soroush Vosoughi · 2025 · DOI 10.18653/v1/2025.ijcnlp-long.18

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

cs.CL · 2026-06-18 · unverdicted · novelty 7.0

StylisticBias benchmark shows 15 visual attributes explain nearly 80% of bias variation in six MLLMs by isolating single cues like age and fashion in generated images.

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

cs.CE · 2026-06-01 · unverdicted · novelty 7.0

Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.

PrivacyAlign: Contextual Privacy Alignment for LLM Agents

cs.CL · 2026-06-19 · unverdicted · novelty 5.0

PrivacyAlign introduces a human-annotated dataset and annotation-conditioned reward modeling to align LLM agents with contextual privacy norms.

citing papers explorer

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

cs.CE · 2026-06-01 · unverdicted · none · ref 59

Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.
