By isolating single cues such as age and fashion in generated images, the StylisticBias benchmark shows that 15 visual attributes explain nearly 80% of bias variation in six MLLMs.
Judging the judges: A systematic study of position bias in LLM-as-a-judge
3 Pith papers cite this work, all from 2026; all 3 are still unverdicted.
Representative citing papers:
Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research. Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.
PrivacyAlign introduces a human-annotated dataset and annotation-conditioned reward modeling to align LLM agents with contextual privacy norms.