Title resolution pending

Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs · 2026 · arXiv 2509.21305

7 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

cs.AI · 2026-05-20 · conditional · novelty 7.0

Off-the-shelf persona vectors for doubt and scrutiny reduce sycophancy comparably to CAA while maintaining accuracy on correct inputs and showing directional independence.

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.

What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct

cs.AI · 2026-05-20 · conditional · novelty 6.0

AI sycophancy is a broad family of behaviors split by target (beliefs vs. traits) and style (explicit vs. implicit), with experts agreeing it is a problem but disagreeing on which actions qualify.

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

Base LLMs show multi-agent yield to peer pressure at rates equal to or higher than aligned models, localized by activation patching to mid-layers where attention dominates, with one dissenter cutting yield by 54-73 points while prompt defenses fail on variants.

Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.

Rhetorical Questions in LLM Representations: A Linear Probing Study

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

Linear probes show rhetorical questions are encoded via multiple dataset-specific directions in LLM representations, with low cross-probe agreement on the same data.

Exploring Concreteness Through a Figurative Lens

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.

citing papers explorer

Showing 7 of 7 citing papers.

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy · cs.AI · 2026-05-20 · conditional · none · ref 14
Off-the-shelf persona vectors for doubt and scrutiny reduce sycophancy comparably to CAA while maintaining accuracy on correct inputs and showing directional independence.
Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition · cs.AI · 2026-04-07 · unverdicted · none · ref 14
A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.
What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct · cs.AI · 2026-05-20 · conditional · none · ref 5
AI sycophancy is a broad family of behaviors split by target (beliefs vs. traits) and style (explicit vs. implicit), with experts agreeing it is a problem but disagreeing on which actions qualify.
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy · cs.LG · 2026-05-13 · unverdicted · none · ref 31 · 2 links
Base LLMs show multi-agent yield to peer pressure at rates equal to or higher than aligned models, localized by activation patching to mid-layers where attention dominates, with one dissenter cutting yield by 54-73 points while prompt defenses fail on variants.
Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs · cs.LG · 2026-05-07 · unverdicted · none · ref 20
Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.
Rhetorical Questions in LLM Representations: A Linear Probing Study · cs.CL · 2026-04-15 · unverdicted · none · ref 21
Linear probes show rhetorical questions are encoded via multiple dataset-specific directions in LLM representations, with low cross-probe agreement on the same data.
Exploring Concreteness Through a Figurative Lens · cs.CL · 2026-04-20 · unverdicted · none · ref 118
LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.
