Title resolution pending

Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs · 2026 · arXiv 2509.21305

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition

cs.AI · 2026-04-07 · unverdicted · novelty 7.0

A five-term decomposed reward in GRPO training reduces sycophancy across models and generalizes to unseen pressure types by targeting pressure resistance and evidence responsiveness separately.

What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct

cs.AI · 2026-05-20 · conditional · novelty 6.0

AI sycophancy is a broad family of behaviors split by target (beliefs vs. traits) and style (explicit vs. implicit), with experts agreeing it is a problem but disagreeing on which actions qualify.

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

Base LLMs show multi-agent yield to peer pressure at rates equal to or higher than aligned models, localized by activation patching to mid-layers where attention dominates, with one dissenter cutting yield by 54-73 points while prompt defenses fail on variants.

Knowing but Not Correcting: Routine Task Requests Suppress Factual Correction in LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Task context suppresses factual correction in LLMs at the response-selection stage even when the model has encoded the error, and two training-free interventions raise correction rates substantially.

Rhetorical Questions in LLM Representations: A Linear Probing Study

cs.CL · 2026-04-15 · unverdicted · novelty 6.0

Linear probes show rhetorical questions are encoded via multiple dataset-specific directions in LLM representations, with low cross-probe agreement on the same data.

Exploring Concreteness Through a Figurative Lens

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

cs.AI · 2026-05-20

citing papers explorer

Showing 2 of 2 citing papers after filters.

Rhetorical Questions in LLM Representations: A Linear Probing Study cs.CL · 2026-04-15 · unverdicted · none · ref 21
Linear probes show rhetorical questions are encoded via multiple dataset-specific directions in LLM representations, with low cross-probe agreement on the same data.
Exploring Concreteness Through a Figurative Lens cs.CL · 2026-04-20 · unverdicted · none · ref 118
LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer