Generalizations across filler-gap dependencies in neural language models

Katherine Howitt, Sathvik Nair, Allison Dods, Robert Melvin Hopkins · 2024 · DOI 10.18653/v1/2024.conll-1.21

3 Pith papers cite this work.

representative citing papers

Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models

cs.CL · 2026-04-24 · unverdicted · novelty 6.0 · ref 17

Language models use a highly localized, shared mechanism for filler-gap dependencies but no unified mechanism for negative polarity item (NPI) licensing; activation patching also generalizes better than supervised alignment search.

Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs

cs.CL · 2026-04-15 · unverdicted · novelty 6.0 · ref 12

Causal interventions reveal that coordination islands block filler-gap mechanisms in Transformers in a graded way that mirrors human behavior, yielding the hypothesis that "and" encodes relational dependencies differently in its extractable and conjunctive uses.

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

cs.LG · 2025-07-02 · unverdicted · novelty 6.0 · ref 18

mGRADE combines convolutions with learnable spacings, which are shown to be equivalent to delay embeddings, with a lightweight gated recurrent component to achieve low-memory multi-timescale sequence modeling.
