Interpretability in the Wild: a Circuit for Indirect Object Identification in
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Exemplar Partitioning for Mechanistic Interpretability
Exemplar Partitioning creates Voronoi partitions of LLM activation space via leader clustering on streamed activations, yielding comparable, interpretable dictionaries that support interventions and achieve competitive benchmark results with ~1000x less compute than SAEs.