Title resolution pending

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Exemplar Partitioning for Mechanistic Interpretability

cs.LG · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

Exemplar Partitioning creates Voronoi partitions of LLM activation space via leader clustering on streamed activations, yielding comparable, interpretable dictionaries that support interventions and achieve competitive benchmark results with ~1000x less compute than SAEs.

SMIXAE: Towards Unsupervised Manifold Discovery in Language Models

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

SMIXAE is a new mixture-of-autoencoders architecture that learns multidimensional manifolds directly from transformer activations, recovering known structures and identifying novel ones in Gemma 2 2B and 9B models.

From Mechanistic to Compositional Interpretability

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.

Bilinear autoencoders find interpretable manifolds

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Bilinear autoencoders decompose neural activations into low-rank quadratic forms to discover interpretable multi-dimensional manifolds, improving reconstruction in language models and challenging linear representation assumptions.

Rigorous Interpretation Is a Form of Evaluation

cs.CY · 2026-05-06 · unverdicted · novelty 5.0

Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.

citing papers explorer

Showing 7 of 7 citing papers.

  • Exemplar Partitioning for Mechanistic Interpretability cs.LG · 2026-05-14 · unverdicted · none · ref 6 · 2 links

    Exemplar Partitioning creates Voronoi partitions of LLM activation space via leader clustering on streamed activations, yielding comparable, interpretable dictionaries that support interventions and achieve competitive benchmark results with ~1000x less compute than SAEs.

  • SMIXAE: Towards Unsupervised Manifold Discovery in Language Models cs.LG · 2026-05-09 · unverdicted · none · ref 18

    SMIXAE is a new mixture-of-autoencoders architecture that learns multidimensional manifolds directly from transformer activations, recovering known structures and identifying novel ones in Gemma 2 2B and 9B models.

  • From Mechanistic to Compositional Interpretability cs.LG · 2026-05-09 · unverdicted · none · ref 79

    Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.

  • Contrastive Conceptor Activation Steering (COAST): Unlocking Vision-Language-Action Models through Hidden States cs.RO · 2026-05-16 · conditional · none · ref 33

    COAST applies contrastive conceptors to steer VLA hidden states into task-specific success subspaces, yielding over 20% simulation and 40% real-robot success rate gains across three distinct policies.

  • Bilinear autoencoders find interpretable manifolds cs.LG · 2026-05-09 · unverdicted · none · ref 31

    Bilinear autoencoders decompose neural activations into low-rank quadratic forms to discover interpretable multi-dimensional manifolds, improving reconstruction in language models and challenging linear representation assumptions.

  • Rigorous Interpretation Is a Form of Evaluation cs.CY · 2026-05-06 · unverdicted · none · ref 116

    Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.

  • Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 62

    Qwen-Scope provides open-source sparse autoencoders for Qwen models that function as practical interfaces for steering, evaluating, data workflows, and optimizing large language models.