arXiv preprint arXiv:2503.06987 (2025)

Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations · 2025 · arXiv 2503.06987

2 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Intersectional Fairness in Large Language Models

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

LLMs are more accurate when answers match stereotypes in clear contexts, especially for race-gender combinations, and no tested model shows consistent fairness or reliability across intersectional groups.

Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering

cs.SE · 2026-04-18 · unverdicted · novelty 5.0

LLM judges for code tasks show high sensitivity to prompt biases that systematically favor certain options, changing accuracy and model rankings even when code is unchanged.

citing papers explorer

Showing 2 of 2 citing papers.

Intersectional Fairness in Large Language Models · cs.CL · 2026-04-22 · unverdicted · none · ref 14
Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering · cs.SE · 2026-04-18 · unverdicted · none · ref 15
