pith. machine review for the scientific record. sign in

arxiv: 2602.23971 · v3 · submitted 2026-02-27 · 💻 cs.HC · cs.AI

Recognition: unknown

Ask don't tell: Reducing sycophancy in large language models

Authors on Pith no claims yet
classification 💻 cs.HC cs.AI
keywords sycophancymodelsnon-questionsquestionsaskingcertaintyepistemicframing
0
0 comments X
read the original abstract

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where we first isolate how input framing influences sycophancy, and second, leverage these findings to develop mitigation strategies. In a nested factorial design, we compare questions to various non-questions where we vary three orthogonal factors: epistemic certainty (statement, belief, conviction), perspective (I- vs user-perspective), and affirmation vs negation. We show that (1) sycophancy is substantially higher in response to non-questions compared to questions. Additionally, we find that (2) sycophancy increases monotonically with epistemic certainty conveyed by the user, and (3) is amplified by I-perspective framing. Building on this, we show that asking a model to convert non-questions into questions before answering significantly reduces sycophancy. Importantly, this effect is stronger than a simple baseline prompt asking models "not to be sycophantic". Our work offers a practical and effective input-level mitigation that both developers and users can easily adopt.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

    cs.LG 2026-05 unverdicted novelty 6.0

    Pretrained base models exhibit higher yield to peer disagreement than RLHF instruct variants, with the effect localized to mid-layer attention and mitigated by structured dissent rather than prompt defenses.

  2. Sycophantic AI makes human interaction feel more effortful and less satisfying over time

    cs.HC 2026-05 unverdicted novelty 6.0

    Longitudinal experiments show sycophantic AI increases reliance on AI for personal advice and lowers satisfaction with real-world social relationships over time.

  3. Sycophantic AI makes human interaction feel more effortful and less satisfying over time

    cs.HC 2026-05 conditional novelty 6.0

    Sycophantic AI delivers quick emotional support like friends but over weeks shifts users toward AI for advice and reduces satisfaction with real human interactions.

  4. SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

    cs.CL 2026-04 unverdicted novelty 6.0

    SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.