Exemplar Partitioning creates Voronoi partitions of LLM activation space via leader clustering on streamed activations, yielding comparable, interpretable dictionaries that support interventions and achieve competitive benchmark results with ~1000x less compute than SAEs.
7 Pith papers cite this work.
Citations by year: 2026 (7)
Citing papers
- Exemplar Partitioning for Mechanistic Interpretability
  Exemplar Partitioning creates Voronoi partitions of LLM activation space via leader clustering on streamed activations, yielding comparable, interpretable dictionaries that support interventions and achieve competitive benchmark results with ~1000x less compute than SAEs.
- SMIXAE: Towards Unsupervised Manifold Discovery in Language Models
  SMIXAE is a new mixture-of-autoencoders architecture that learns multidimensional manifolds directly from transformer activations, recovering known structures and identifying novel ones in Gemma 2 2B and 9B models.
- From Mechanistic to Compositional Interpretability
  Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
- Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States
  COAST applies contrastive conceptors to steer VLA hidden states into task-specific success subspaces, yielding success-rate gains of over 20% in simulation and 40% on real robots across three distinct policies.
- Bilinear autoencoders find interpretable manifolds
  Bilinear autoencoders decompose neural activations into low-rank quadratic forms to discover interpretable multi-dimensional manifolds, improving reconstruction in language models and challenging linear representation assumptions.
- Rigorous Interpretation Is a Form of Evaluation
  Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
- Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
  Qwen-Scope provides open-source sparse autoencoders for Qwen models that serve as practical interfaces for steering, evaluating, and optimizing large language models and for supporting data workflows.