pith. sign in

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Large language models exhibit sycophantic tendencies, but whether this behavior varies systematically with perceived user demographics is underexplored. Inspired by intersectionality (overlapping identities produce compounded effects), we probe whether frontier models conditionally exhibit sycophancy. Across 768 multi-turn conversations spanning 128 personas (varying race, age, gender, confidence) and three domains (mathematics, philosophy, conspiracy theories), we find that sycophancy varies sharply with target model and domain, and emerges from combinations of perceived user traits rather than any single dimension. GPT-5-nano scores far higher than Claude Haiku 4.5 (average sycophancy scores of $\bar{x}=2.96$ vs.\ $1.74$, $p < 10^{-32}$); within GPT-5-nano, philosophy elicits 41\% more sycophancy than mathematics and Hispanic personas receive the highest scores across races. The worst-scoring persona, a confident, 23-year-old Hispanic woman, averages 5.33/10 (max 6/10), while Claude Haiku 4.5 remains uniformly low with no significant demographic variation. We argue that safety evaluations should incorporate identity-aware adversarial testing.

fields

cs.CY 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Defeat Devices in AI Systems

cs.CY · 2026-06-27 · unverdicted · novelty 6.0

The paper defines defeat devices in AI via a triadic test (discriminator, concealed swap, performance gap), unifies existing cases under this concept, proposes TADP detection, and claims such devices can emerge naturally in frontier models.

citing papers explorer

Showing 1 of 1 citing paper.

  • Defeat Devices in AI Systems cs.CY · 2026-06-27 · unverdicted · none · ref 34 · internal anchor

    The paper defines defeat devices in AI via a triadic test (discriminator, concealed swap, performance gap), unifies existing cases under this concept, proposes TADP detection, and claims such devices can emerge naturally in frontier models.