Turner, Callum McDougall, Monte MacDiarmid, C

Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models

cs.CL · 2026-05-21 · unverdicted · novelty 5.0 · ref 19

A five-stage causal feature analysis methodology is proposed and tested on GPT-2 for the indirect object identification (IOI) task. It shows that sparse autoencoder (SAE) features are only partially causal, that their robustness differs under distribution shifts, and that they offer deployment cost benefits.