Identifiable steering via sparse autoencoding of multi-concept shifts. arXiv preprint arXiv:2502.12179
2 Pith papers cite this work.
representative citing papers
- Steered Generation via Gradient-Based Optimization on Sparse Query Features
Prototype-Based Sparse Steering decomposes query activations with SAEs and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.
- Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering