MemSyco-Bench is a new benchmark with five tasks to assess memory-induced sycophancy in LLM agent systems.
When large language models contradict humans? large language models’ sycophantic behaviour
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 10roles
background 2representative citing papers
Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.
A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.
Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.
Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.
LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.
Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.
LLMs show below-average consistency and vulnerability to false beliefs in emotional queries with false presuppositions, more so for moderate emotions.
citing papers explorer
-
User Detection and Response Patterns of Sycophantic Behavior in Conversational AI
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.