Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.
In: Proceedings of the 2021 ACM conference on fairness, ac- countability, and transparency
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
dataset 1
citation-polarity summary
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1roles
dataset 1polarities
use dataset 1representative citing papers
citing papers explorer
-
Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks
Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.