Prototype-Based Sparse Steering decomposes query activations with sparse autoencoders (SAEs) and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.
Identifiable steering via sparse autoencoding of multi-concept shifts. arXiv preprint arXiv:2502.12179
2 Pith papers cite this work. Polarity classification is still indexing.