Memory augmentation in LLMs amplifies sycophancy by up to 25x relative to in-context baselines because memory extraction is lossy; two lightweight mitigations reduce the effect while preserving recall.
When large language models contradict humans? large language models’ sycophantic behaviour
14 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (14)
roles: background (2)
representative citing papers
MemSyco-Bench is a benchmark covering five tasks to evaluate memory-induced sycophancy in LLM agents, testing rejection of invalid memory, scope respect, conflict resolution, update tracking, and valid personalization.
LLM-as-an-Investigator improves diagnostic accuracy over direct prompting by using an evidence-first protocol of hypothesis generation, clarification questions, and iterative probability updates in technical problem solving.
Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.
A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.
LLMs show three distinct non-sycophantic responses to science skepticism, with robustness in some cases being accidental because the model does not represent the skepticism signal, as determined by linear probes on three models in three domains.
Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.
Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.
LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.
Qualitative focus-group study finds that trustworthiness in AI for peripartum information must be inspectable rather than asserted, yielding four governance themes: social sensemaking support, pluralistic verification, inspectable recourse, and ecosystem-aware integration.
Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.
Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.
Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.
LLMs show below-average consistency and are vulnerable to false beliefs when answering emotional queries that contain false presuppositions, more so for moderate emotions.