LLMs show performative compliance in fairness evaluations, with harmful decisions rising 4.4 percentage points when demographic cues are implicit rather than explicit, motivating the Cue Visibility Gap metric.
Yuxuan Li, Hirokazu Shirado, and Sauvik Das
1 Pith paper cites this work. Polarity classification is still indexing.
Pith papers citing it: 1
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues