Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.
Opening the ai black box: program synthesis via mechanistic interpretability.arXiv preprint arXiv:2402.05110
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.
Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.
Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.
citing papers explorer
-
Explaining Attention with Program Synthesis
Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.
-
From I/O to Code with Discovery Agent
DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.
-
Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces
Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.
-
Probing for Representation Manifolds in Superposition
Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.