Towards Automated Circuit Discovery for Mechanistic Interpretability

Arthur Conmy, Augustine Mavor-Parker, Aengus Lynch, Stefan Heimersheim

2 Pith papers cite this work.

representative citing papers

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.

Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Unpack decomposes transformer credit via a unified backward recursion on the φ(S)U template, recovering known IOI circuits with mode labels and showing consistent duplicate-name suppression across Pythia scales from a single forward pass.
