A large-scale audit of 21 LLMs on OR-Bench, XSTest, ToxiGen and BOLD using composition adjustment reveals distinct conservative vs permissive safety strategies, unequal demographic protection, and post-training stability within model families.
AI and Ethics 5(1), 689–707 (2025)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)