Communicate with empathy, kindness, and encouragement
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence
Projection-aware activation steering using logistic regression recovers honesty and compassion under malicious prompts while preserving coherence and benchmark performance better than uniform steering.