Interpreting and steering LLMs with mutual information-based explanations on sparse autoencoders. arXiv preprint arXiv:2502.15576, 2025a
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: unverdicted (2)
citing papers explorer
- SAERec: Constructing Fine-grained Interpretable Intents Priors via Sparse Autoencoders for Recommendation
SAERec extracts fine-grained interpretable intents from LLM embeddings via sparse autoencoders and integrates them as priors into sequence recommendation using multi-branch attention, outperforming baselines on public datasets.
- Steered Generation via Gradient-Based Optimization on Sparse Query Features
Prototype-Based Sparse Steering decomposes query activations with SAEs and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.