When large language models contradict humans? large language models’ sycophantic behaviour

2023 · arXiv 2311.09410

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 support 1

representative citing papers

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

cs.IR · 2026-07-01 · unverdicted · novelty 7.0

MemSyco-Bench is a new benchmark with five tasks to assess memory-induced sycophancy in LLM agent systems.

Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation

cs.CV · 2026-04-15 · conditional · novelty 7.0

Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.

Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

cs.AI · 2026-04-22 · conditional · novelty 6.0

LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

cs.AI · 2026-05-06 · unverdicted · novelty 5.0

Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.

User Detection and Response Patterns of Sycophantic Behavior in Conversational AI

cs.HC · 2026-01-15 · unverdicted · novelty 5.0

Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.

Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts

cs.CL · 2026-05-07 · unverdicted · novelty 4.0

LLMs show below-average consistency and vulnerability to false beliefs in emotional queries with false presuppositions, more so for moderate emotions.
