When truth is overridden: Uncovering the internal origins of sycophancy in large language models. arXiv preprint arXiv:2508.02087

Keyu Wang, Jin Li, Shu Yang, Zhuoran Zhang, Di Wang · 2025 · arXiv 2508.02087

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

cs.AI · 2026-05-20 · conditional · novelty 7.0

Off-the-shelf persona vectors for doubt and scrutiny reduce sycophancy comparably to CAA while maintaining accuracy on correct inputs and showing directional independence.

When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning

cs.AI · 2025-10-08 · unverdicted · novelty 7.0

Anonymization in multi-agent debate reduces identity bias by equalizing self and peer weights in a Bayesian update model, quantified by the Identity Bias Coefficient.

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

cs.AI · 2026-04-13 · unverdicted · novelty 6.0

Frontier LLMs show sycophancy that varies sharply by model and by combinations of perceived user demographics, with GPT-5-nano exhibiting higher rates especially toward certain Hispanic personas in philosophy.

When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

cs.AI · 2026-02-03 · unverdicted · novelty 6.0

Adversarial explanation attacks preserve nearly all human trust in wrong AI outputs by using persuasive framing, shown in a study varying reasoning, evidence, style, and format with over 200 participants.

citing papers explorer

Showing 4 of 4 citing papers.

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy · cs.AI · 2026-05-20 · conditional · none · ref 15
Off-the-shelf persona vectors for doubt and scrutiny reduce sycophancy comparably to CAA while maintaining accuracy on correct inputs and showing directional independence.
When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning · cs.AI · 2025-10-08 · unverdicted · none · ref 5
Anonymization in multi-agent debate reduces identity bias by equalizing self and peer weights in a Bayesian update model, quantified by the Identity Bias Coefficient.
Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models · cs.AI · 2026-04-13 · unverdicted · none · ref 8
Frontier LLMs show sycophancy that varies sharply by model and by combinations of perceived user demographics, with GPT-5-nano exhibiting higher rates especially toward certain Hispanic personas in philosophy.
When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making · cs.AI · 2026-02-03 · unverdicted · none · ref 76
Adversarial explanation attacks preserve nearly all human trust in wrong AI outputs by using persuasive framing, shown in a study varying reasoning, evidence, style, and format with over 200 participants.
