Judging the judges: A systematic study of position bias in LLM-as-a-judge
3 Pith papers cite this work, all published in 2026 and none yet assigned a verdict.
Citing papers
- StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs
  By isolating single cues such as age and fashion in generated images, the StylisticBias benchmark shows that 15 visual attributes explain nearly 80% of the bias variation across six MLLMs.
- Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research
  Introduces the Matter to Mechanism benchmark of 2,645 structured instances, together with a composite metric suite, for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.
- PrivacyAlign: Contextual Privacy Alignment for LLM Agents
  PrivacyAlign introduces a human-annotated dataset and annotation-conditioned reward modeling to align LLM agents with contextual privacy norms.