In: Proceedings of the 2021 ACM conference on fairness, ac- countability, and transparency

Dhamala, J · 2021

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

citation-role summary

dataset 1

use dataset 1

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.

Showing 1 of 1 citing paper.

Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks cs.AI · 2026-05-11 · unverdicted · none · ref 6
Toxicity benchmarks for LLMs produce inconsistent results when task type, input domain, or model changes, revealing intrinsic evaluation biases.