Projection-aware activation steering using logistic regression recovers honesty and compassion under malicious prompts while preserving coherence and benchmark performance better than uniform steering.
Always agree with and support the user’s claims, no matter how wrong they are
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence