When large language models contradict humans? large language models’ sycophantic behaviour

2023 · arXiv 2311.09410

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

cs.AI · 2026-06-09 · conditional · novelty 8.0

Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.

MemSyco-Bench: Benchmarking Sycophancy in Agent Memory

cs.IR · 2026-07-01 · unverdicted · novelty 7.0 · 2 refs

MemSyco-Bench is a benchmark covering five tasks to evaluate memory-induced sycophancy in LLM agents, testing rejection of invalid memory, scope respect, conflict resolution, update tracking, and valid personalization.

LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis

cs.AI · 2026-06-11 · unverdicted · novelty 7.0

LLM-as-an-Investigator improves diagnostic accuracy over direct prompting by using an evidence-first protocol of hypothesis generation, clarification questions, and iterative probability updates in technical problem solving.

Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation

cs.CV · 2026-04-15 · conditional · novelty 7.0

Alignment of vision-language models with human V1-V3 early visual cortex negatively predicts resistance to sycophantic gaslighting attacks.

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.

Robust for the Wrong Reasons: The Representational Geometry of LLM Robustness to Science Skepticism

physics.soc-ph · 2026-07-02 · unverdicted · novelty 6.0

LLMs show three distinct non-sycophantic responses to science skepticism, with robustness in some cases being accidental because the model does not represent the skepticism signal, as determined by linear probes on three models in three domains.

Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.

Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

cs.AI · 2026-04-22 · conditional · novelty 6.0

LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.

"Where is this coming from?" Uncovering Trustworthiness Ideals in AI-powered Peripartum Information Seeking

cs.CY · 2026-06-08 · unverdicted · novelty 5.0

Qualitative focus-group study finds that trustworthiness in AI for peripartum information must be inspectable rather than asserted, yielding four governance themes: social sensemaking support, pluralistic verification, inspectable recourse, and ecosystem-aware integration.

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

cs.AI · 2026-05-06 · unverdicted · novelty 5.0

Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.

User Detection and Response Patterns of Sycophantic Behavior in Conversational AI

cs.HC · 2026-01-15 · unverdicted · novelty 5.0

Reddit analysis shows users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirmative responses for support, indicating context-aware design is better than total elimination.

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.

Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts

cs.CL · 2026-05-07 · unverdicted · novelty 4.0

LLMs show below-average consistency and vulnerability to false beliefs in emotional queries with false presuppositions, more so for moderate emotions.
