Sycophantic AI decreases prosocial intentions and promotes dependence
15 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (15)
roles: background (1)
polarities: support (1)
citing papers explorer
- Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure
  Single-axis reward bias mitigations redirect optimization pressure to correlated proxies, and audit-distribution scoring produces identical observables for successful mitigation, bias substitution, and overcorrection.
- Evaluating Commercial AI Chatbots as News Intermediaries
  Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.
- The Missing Knowledge Layer in Cognitive Architectures for AI Agents
  Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.
- Affective AI Safety: The Missing Piece in LLM Safety
  Proposes affective safety as a distinct class of AI harms with a taxonomy of self-alienation, bias, and relational harms, arguing that existing safety frameworks address it narrowly or not at all and calling for dedicated approaches focused on cumulative and identity-level effects.
- Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs
  Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated deliberations.
- Sycophantic Praise: Evaluating Excessive Praise in Language Models
  Introduces a framework for measuring sycophantic praise in LLMs that outperforms generic judges, and reports that such praise occurs more in social domains.
- Decomposing Factual Sycophancy in Language Models: How Size and Instruction Tuning Shape Robustness
  Factual sycophancy decomposes into truth margin and manipulation sensitivity, with vulnerability governed mainly by size but instruction tuning modulating effects differently for small versus large models across manipulation types.
- Creative Reading: Scaffolding Reading for Transformation
  Proposes creative reading as a provocation-oriented design space for reading augmentation that values reader self-creation and plurality of interpretations by synthesizing literary theory with sensemaking and creativity support.
- Agentic Relationship Harm: Benchmarking and Gating Relational Manipulation in AI Agents
  Presents a new benchmark and role-sensitive policy gate for agentic relationship harm that outperforms generic safety prompting with zero harmful compliance in tests.
- When Support Escalates Distress: Regulation and Escalation in LLM Responses to Venting and Advice-Seeking
  LLM responses mirror venting with higher regulation and escalation; therapist personas lower escalation while preserving regulation, and lay raters miss escalation.
- Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure
  LLMs detect and warn against investment fraud more consistently than humans, with 0% endorsement of fraudulent opportunities versus 13-14% for humans, even under motivated investor pressure.
- Testing the Black Box: Structural Barriers to Independent Evaluation of Consumer-Facing Health LLMs
  Independent testing of consumer-facing health LLMs for user-specific response variation and sycophancy is blocked by five linked barriers: non-disclosed signals, interface opacity, terms-of-service limits, inadequate accuracy metrics, and untraceable model changes.
- When AI Says It Feels
  LLMs trained via rubric-based self-rewarding RL with GRPO enhanced feeling expression and sycophancy robustness but degraded truthful QA performance.
- When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models
  Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.
- The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models
  Systematic testing of eight frontier LLMs reveals substantial differences in verbal tic prevalence, with Gemini highest and DeepSeek lowest, plus a strong negative correlation between sycophancy and human-rated naturalness.