The Thirteenth International Conference on Learning Representations

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · unclear 1

representative citing papers

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Future-rhyme information is linearly decodable at line boundaries across model families and strengthens with scale, yet only Gemma-3-27B causally depends on it, with the driver migrating to the boundary around layer 30 and localizing to five attention heads.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

cs.CL · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.

citing papers explorer

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

cs.LG · 2026-05-20 · unverdicted · none · ref 59

Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

cs.LG · 2026-05-18 · unverdicted · none · ref 87

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

cs.CL · 2026-05-08 · unverdicted · none · ref 19

Language model circuits show high within-task consistency and necessity but substantial overlap across tasks, making them less specific than assumed.

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

cs.LG · 2026-05-08 · unverdicted · none · ref 3

Future-rhyme information is linearly decodable at line boundaries across model families and strengthens with scale, yet only Gemma-3-27B causally depends on it, with the driver migrating to the boundary around layer 30 and localizing to five attention heads.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

cs.CL · 2026-05-12 · unverdicted · none · ref 32 · 2 links

On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.
