Cascading linear features extracted from graded sycophancy samples form separable subspaces that enable detection, scoring, and steering of sycophantic behavior in LLMs, matching or exceeding LLM-judge and prompting baselines.
2 Pith papers cite this work.
years
2026: 2
representative citing papers
The Cylindrical Representation Hypothesis (CRH) models LLM representations as a central axis for concept activation surrounded by a normal plane containing sensitive sectors that determine steering sensitivity and introduce intrinsic uncertainty.
citing papers explorer
- Detecting and Controlling Sycophancy with Cascading Linear Features
- The Cylindrical Representation Hypothesis for Language Model Steering