Causal analysis of syntactic agreement mechanisms in neural language models

Association for Computational Linguistics · 2021 · DOI 10.18653/v1/2021.acl-long.144

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Locating and Editing Factual Associations in GPT

cs.CL · 2022-02-10 · accept · novelty 8.0

Factual associations in autoregressive transformers are localized to mid-layer feed-forward modules and can be edited via rank-one model editing while preserving both specificity and generalization on counterfactual tests.

The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching

cs.LG · 2026-06-25 · unverdicted · novelty 7.0

Re-deriving the activation patching NIE reveals that it captures interaction effects in addition to direct causal effects, as demonstrated on the GPT-2 IOI circuit, where INT explains component ranking issues and faithfulness instability.

Causally Evaluating the Learnability of Formal Language Tasks

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Introduces the binning semiring and causal graphical models to show that correlational evaluation of learnability in formal language tasks leads to incorrect conclusions because of confounders.

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

Structurally distinct circuits for literal sequence copying across token frequency bands implement the same computation, shown by broad transfer of band-specific edges, a shared core recovering 99% performance, and interchangeable representations via causal interventions.

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

An exposure-based split on BLiMP data reveals delayed generalization in five grammatical phenomena during LLM pre-training, with post-generalization shifts in concept vector predictiveness and attention patterns.

Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models

cs.CL · 2026-04-24 · unverdicted · novelty 6.0

Language models employ a highly localized shared mechanism for filler-gap dependencies but no unified mechanism for NPI licensing, and activation patching generalizes better than supervised alignment search.

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

cs.AI · 2023-10-10 · unverdicted · novelty 6.0

At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.

citing papers explorer

Showing 7 of 7 citing papers after filters.

The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching
cs.LG · 2026-06-25 · unverdicted · none · ref 4
Causally Evaluating the Learnability of Formal Language Tasks
cs.CL · 2026-06-08 · unverdicted · none · ref 69
Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery
cs.CL · 2026-06-04 · unverdicted · none · ref 22
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
cs.AI · 2026-05-01 · unverdicted · none · ref 159
A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
cs.LG · 2026-05-29 · unverdicted · none · ref 28
Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models
cs.CL · 2026-04-24 · unverdicted · none · ref 9
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
cs.AI · 2023-10-10 · unverdicted · none · ref 70