Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent

Peter Chin; Yuanhao Chen

arxiv: 2605.26431 · v2 · pith:777TPMANnew · submitted 2026-05-26 · 💻 cs.CL · stat.AP


Pith reviewed 2026-06-29 18:41 UTC · model grok-4.3


keywords: LLMs · syntactic probing · Universal Dependencies · Minimalist Program · phase boundaries · wh-movement · activation patching


The pith

LLMs encode phase boundaries and internal cohesion that Universal Dependencies leave unmarked by design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper checks whether large language models pick up formal syntactic phases from the Minimalist Program even though Universal Dependencies annotations never mark phase boundaries or phase-internal cohesion. It constructs wh-movement examples in which the UD tree distance between the probed word pair stays identical across three conditions that differ only in how many phase boundaries the wh-element crosses. In 12 of 13 models a probe effect grows with the number of phases crossed; in all 13 models an asymmetry appears inside a single clause exactly where phase cohesion predicts it. Activation patching indicates that the probed representations are causally active in 12 of the 13 models. The result indicates that what distributional pretraining produces can exceed the reach of any UD-grounded probe.

Core claim

Across 13 LLMs from four families, structural probes recover a phase-count gradient on a cross-clause wh-movement pair in 12 models and a consistent sign asymmetry on a within-clause pair whose UD distance is identical across conditions in all 13 models; both patterns are predicted by the number of phase boundaries crossed and by phase-internal cohesion. Activation patching confirms the probed representations are causally active in 12 of the 13 models.
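The probe contrast behind this claim can be sketched as follows. A structural probe in the sense of Hewitt and Manning (2019) scores a word pair by the squared length of the hidden-state difference after a learned linear map B. The sketch assumes that form and assumes β is a plain difference of probe distances (condition minus bare); the matrix, vectors, and function names are hypothetical toy values, not the paper's probes or data.

```python
def matvec(B, v):
    """Multiply matrix B (a list of rows) by vector v."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]

def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two hidden states after projecting with B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    proj = matvec(B, diff)
    return sum(p * p for p in proj)

def beta(B, pair_condition, pair_bare):
    """Probe-distance difference: condition minus the bare baseline."""
    return probe_distance(B, *pair_condition) - probe_distance(B, *pair_bare)

# Hypothetical 2x3 probe that ignores the third hidden dimension.
B = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Toy hidden states for one word pair in two conditions (made-up numbers).
bare = ([0.0, 0.0, 0.0], [1.0, 0.0, 5.0])
finite = ([0.0, 0.0, 0.0], [1.0, 2.0, 5.0])

beta_fin = beta(B, finite, bare)
print(beta_fin)
```

If the probe only decoded UD distance, and that distance is the same in both conditions, a contrast like `beta_fin` would be expected to sit near zero; a non-zero value is what the paper treats as evidence of structure beyond UD.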

What carries the argument

Wh-movement stimuli whose UD distances are held constant across bare small-clause, infinitival, and finite conditions while the number of Minimalist phase boundaries crossed by the wh-element increases.

If this is right

UD-grounded probes supply only a lower bound on syntactic knowledge in LLMs.
Distributional pretraining can induce representations aligned with Minimalist phase boundaries and cohesion.
The representations remain causally active inside the models as shown by activation patching.
Phase effects appear even on pairs whose surface UD distance is fixed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same controlled-stimulus logic could be applied to other Minimalist notions such as successive-cyclic movement or edge features without new annotations.
If phase structure is learnable from text alone, then models may acquire additional formal-syntactic distinctions that current annotation schemes also omit.
Future probes could test whether the phase gradient scales with model size or training data volume on the same fixed-UD stimuli.

Load-bearing premise

The stimuli are constructed so that UD distances stay identical across conditions, forcing any probe difference to come from structure that UD does not encode.
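The invariance premise can be checked mechanically once the parses are available. Below is a minimal sketch of a tree-distance computation over head indices, assuming a CoNLL-U style encoding in which each token maps to its head and the root has head 0. The two toy trees are hypothetical and do not correspond to the paper's stimuli.

```python
def path_to_root(heads, node):
    """Return the nodes from `node` up to the root (the token whose head is 0)."""
    path = [node]
    while heads[node] != 0:
        node = heads[node]
        path.append(node)
    return path

def tree_distance(heads, a, b):
    """Number of dependency edges on the path between tokens a and b."""
    up_a = path_to_root(heads, a)
    depth_in_a = {n: d for d, n in enumerate(up_a)}
    for steps_b, n in enumerate(path_to_root(heads, b)):
        if n in depth_in_a:  # first shared node is the lowest common ancestor
            return depth_in_a[n] + steps_b
    raise ValueError("tokens are not in the same tree")

# Hypothetical trees: token index -> head index (0 marks the root).
tree_a = {1: 2, 2: 0, 3: 2, 4: 3}
tree_b = {1: 2, 2: 0, 3: 2, 4: 3, 5: 4}  # one extra dependent elsewhere

# The probed pair (3, 4) is one edge apart in both trees.
print(tree_distance(tree_a, 3, 4), tree_distance(tree_b, 3, 4))
```

Running the same function over every stimulus and asserting equal distances across conditions would make the premise directly verifiable.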

What would settle it

Absence of both the phase-count gradient and the within-clause sign asymmetry in a majority of the same models on the same stimuli would falsify the claim that the models encode phase structure beyond UD.
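The falsification rule above can be written as a small decision function. This is a sketch under two assumptions: the phase-count gradient is read as 0 < β_inf < β_fin on the cross-clause pair, and the sign asymmetry as β_fin < 0 < β_inf on the within-clause pair (the latter follows the Figure 6 caption). The per-model values are invented for illustration.

```python
def shows_gradient(beta_inf, beta_fin):
    """Assumed reading of the phase-count gradient: 0 < beta_inf < beta_fin."""
    return 0 < beta_inf < beta_fin

def shows_asymmetry(beta_inf, beta_fin):
    """Sign asymmetry on the within-clause pair: beta_fin < 0 < beta_inf."""
    return beta_fin < 0 < beta_inf

def claim_falsified(models):
    """True if a majority of models show neither pattern."""
    neither = sum(
        1 for m in models
        if not shows_gradient(*m["cross"]) and not shows_asymmetry(*m["within"])
    )
    return neither > len(models) / 2

# Hypothetical (beta_inf, beta_fin) values per model and word pair.
toy_models = [
    {"cross": (0.2, 0.5), "within": (0.1, -0.3)},  # both patterns
    {"cross": (0.0, 0.0), "within": (0.0, 0.0)},   # neither pattern
    {"cross": (0.3, 0.1), "within": (0.2, -0.1)},  # asymmetry only
]
print(claim_falsified(toy_models))
```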

Figures

Figures reproduced from arXiv: 2605.26431 by Peter Chin, Yuanhao Chen.

**Figure 2.** UD parses of the three conditions, with embedded-clause words colour-coded by condition (matching …

**Figure 3.** Per-model layer profiles of $\beta^{(\ell)}_{\mathrm{fin}}$ (finite − bare) and $\beta^{(\ell)}_{\mathrm{inf}}$ (infinitival − bare) on the wh-esubj pair. Small filled markers indicate layers FDR-significant at $\alpha = 0.05$; the larger white-edged marker is the per-contrast peak in the predicted direction. [Plot residue: axis $\beta$ (probe-distance difference), 0.0 to 1.0; model labels Qwen-2.5-1.5B, Llama-3.1-8B, Qwen-3-1.7B, Qwen-2.5-7B, Llama-3.2-1B, Gemma-3-4B, Qwen-3-4B, Mistral-…]
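The Figure 3 caption marks layers as FDR-significant at α = 0.05, and the reference list includes Benjamini and Hochberg (1995). Assuming the step-up procedure from that paper is what was applied across layers, a minimal sketch looks like this; the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical per-layer p-values for one model and one contrast.
layer_p = [0.001, 0.039, 0.035, 0.20, 0.012]
print(benjamini_hochberg(layer_p))
```

Note the step-up behaviour: the layer with p = 0.035 misses its own rank threshold (0.03) but is still rejected because a higher-ranked p-value (0.039) meets its threshold (0.04).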

**Figure 4.** Per-model peak vs. canonical-layer $\beta$ on the wh-esubj pair, with 95% cluster-bootstrap CIs. $\beta_{\mathrm{fin}}$ is shown at its peak layer (= canonical by construction). $\beta_{\mathrm{inf}}$ is shown both at its own peak (open markers) and at $L^{*}$ (filled markers).

Three observations rule out simpler accounts. First, UD distance between the two words is 1 edge in every condition (fig. 2), so a pure UD-decoding probe would yield $\beta \approx 0$. Sec…
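The 95% cluster-bootstrap intervals named in the Figure 4 caption can be approximated with a percentile bootstrap that resamples whole clusters. What counts as a cluster is not visible here; the sketch assumes each cluster is one stimulus item contributing several observations, and the numbers are invented.

```python
import random

def cluster_bootstrap_ci(clusters, n_boot=2000, level=0.95, seed=0):
    """Percentile CI for the mean, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    names = list(clusters)
    means = []
    for _ in range(n_boot):
        # Draw as many clusters as exist, with replacement, and keep each
        # drawn cluster's observations together.
        drawn = [rng.choice(names) for _name in names]
        sample = [v for name in drawn for v in clusters[name]]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return means[lo_idx], means[hi_idx]

# Hypothetical probe-distance differences grouped by stimulus item.
toy_clusters = {
    "item1": [0.30, 0.35],
    "item2": [0.10, 0.20],
    "item3": [0.40, 0.45, 0.50],
}
lo, hi = cluster_bootstrap_ci(toy_clusters)
print(lo, hi)
```

Resampling clusters rather than individual observations keeps correlated observations together, which is the usual motivation for this variant.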

**Figure 6.** Per-model layer profiles of $\beta^{(\ell)}_{\mathrm{fin}}$ (finite − bare) and $\beta^{(\ell)}_{\mathrm{inf}}$ (infinitival − bare) on the esubj-evb pair. The sign asymmetry ($\beta_{\mathrm{fin}} < 0$ and $\beta_{\mathrm{inf}} > 0$) holds at FDR-significant layers throughout most of the network in most models. Marker conventions follow fig. 3.

… representationally closer in the model's hidden states. Three converging lines of work. The cohesion hypothesis is supported by independe…

**Figure 7.** Per-model $\Delta\beta$ for the embedded-subject patch (filled circles) and the wh-position negative control (open grey squares), measured on wh-esubj (left panel, blue) and esubj-evb (right panel, orange). Source condition is infinitival; target is bare; intervention layer is each model's canonical layer $L^{*}$. Bootstrap 95% CIs shown.

5 Related Work. The structural probe (Hewitt and Manning, 2019; Manning et al., 202…
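The patching setup in the Figure 7 caption (infinitival source, bare target, intervention at a canonical layer, wh-position negative control) can be illustrated on a toy network. Everything below is hypothetical: the two-layer "model", the embeddings, and the use of a plain squared distance in place of a trained probe. It only shows the mechanics of caching a source activation and overwriting the target's.

```python
def mix_layer(states):
    """Toy causal layer: each token adds half of the previous token's state."""
    out = []
    for t, h in enumerate(states):
        prev = states[t - 1] if t > 0 else [0.0] * len(h)
        out.append([x + 0.5 * p for x, p in zip(h, prev)])
    return out

def run(layers, states, patch=None):
    """Run all layers. patch = (layer_index, position, vector) overwrites the
    hidden state entering that layer. Returns the final hidden states."""
    for i, layer in enumerate(layers):
        if patch is not None and patch[0] == i:
            states = [list(h) for h in states]
            states[patch[1]] = list(patch[2])
        states = layer(states)
    return states

def hidden_at(layers, states, layer_index):
    """Hidden states entering `layer_index`; used as the patch source cache."""
    for layer in layers[:layer_index]:
        states = layer(states)
    return states

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

LAYERS = [mix_layer, mix_layer]
L_STAR = 1                      # hypothetical canonical layer
WH, ESUBJ, EVB = 0, 1, 2        # token positions in the toy sequence

target_bare = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_inf = [[1.0, 0.0], [2.0, 1.0], [1.0, 1.0]]

def patched_delta(position):
    """Change in esubj-evb distance after patching `position` at L_STAR."""
    cache = hidden_at(LAYERS, source_inf, L_STAR)
    clean = run(LAYERS, target_bare)
    patched = run(LAYERS, target_bare, patch=(L_STAR, position, cache[position]))
    return sq_dist(patched[ESUBJ], patched[EVB]) - sq_dist(clean[ESUBJ], clean[EVB])

delta_subject = patched_delta(ESUBJ)
delta_wh = patched_delta(WH)    # negative control
print(delta_subject, delta_wh)
```

In this toy setup the wh token has the same state in both conditions, so the control patch changes nothing, while the embedded-subject patch shifts the measured distance.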

**Figure 8.** Phrase structures with intermediate projections for the three conditions, following copy theory …

**Figure 9.** Per-layer probe quality on the UD-EWT validation set for all 13 models. Panel (a): distance Spearman …

Original abstract

Structural probes train on Universal Dependencies (UD), which does not encode formal-syntactic abstractions such as phase boundaries or phase-internal cohesion. Whether large language models (LLMs) encode these remains an open question that UD-based probing cannot answer by construction. We evaluate structural probes on wh-movement stimuli where UD distances are invariant across conditions by design -- any non-zero effect therefore reflects structure beyond UD. The three conditions -- bare small clause, infinitival, and finite -- are ordered by the number of Minimalist Program (MP) phase boundaries the wh-element crosses. Across 13 LLMs from four families, we find a phase-count gradient on a cross-clause pair (12/13 models) and a 13/13 sign asymmetry on a within-clause pair whose UD distance is identical across conditions -- the latter specifically predicted by phase-internal cohesion, an MP abstraction invisible to UD by construction. Activation patching confirms the representations are causally active in 12/13 models. These findings suggest that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond the reach of annotation-based probing; UD-grounded probes provide a lower bound on syntactic encoding, not an upper bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports that LLMs encode phase boundaries and phase-internal cohesion on wh-movement stimuli that hold UD distance fixed: a phase-count gradient in 12/13 models, a sign asymmetry in all 13, and activation-patching support in 12/13.


The main point is that the authors test whether LLMs encode Minimalist phase structure that UD cannot represent by construction. They do this with wh-movement conditions where UD distances stay constant while phase count varies, so any probe difference has to come from something else.

What is new is the targeted design that isolates phase count on cross-clause pairs and the specific sign-asymmetry prediction on within-clause pairs from phase-internal cohesion. The results line up with that prediction across 13 models from four families, and activation patching shows the representations are used in 12/13 cases.

The control is the strongest part: holding UD distance fixed removes the usual confound, and the consistency plus causal check gives the claim some grounding. The work stays focused on a clear, falsifiable contrast rather than broad claims.

The soft spot is that only the abstract is in front of us, so effect sizes, exact stimulus wording, statistical thresholds, and probe training details are not visible. If those turn out weak or if the patching has uncontrolled variables, the support drops. The reported patterns are still worth checking once the full methods and data are available.

This is for readers who work on syntactic probing and want to know whether distributional models go past annotation schemes. It deserves a serious referee because the question is sharp and the control is explicit enough for others to evaluate the evidence directly.

Referee Report

0 major / 2 minor

Summary. The paper claims that LLMs encode Minimalist Program phase boundaries and phase-internal cohesion—abstractions invisible to Universal Dependencies (UD) by construction. Using wh-movement stimuli where UD distances are held invariant across three conditions (bare small clause, infinitival, finite) ordered by phase count, structural probes on 13 LLMs from four families yield a phase-count gradient (12/13 models) on cross-clause pairs and a sign asymmetry (13/13 models) on within-clause pairs. Activation patching establishes causal involvement of the probed representations in 12/13 models. The authors conclude that UD-grounded probes supply a lower bound, not an upper bound, on syntactic encoding in LLMs.

Significance. If the results hold, the work demonstrates that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond annotation schemes such as UD. Credit is given for the explicit control that any non-zero effect must reflect structure beyond UD, the evaluation across 13 models in four families, and the causal confirmation via activation patching. These elements strengthen the inference that LLMs capture phase structure invisible to UD.

minor comments (2)

[§3] §3 (Stimuli): supply explicit UD distance calculations or annotation examples for the three conditions to allow direct verification of invariance.
[§4] §4 (Results): report exact statistical tests, effect sizes, or per-model p-values supporting the 12/13 and 13/13 counts rather than summary statements alone.
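As one illustration of the kind of test the second comment asks for, an exact one-sided sign test treats each model as an independent fair coin flip under the null. That independence is an assumption that models from the same family are likely to violate, and nothing here indicates that the paper reports this test; the sketch only shows how the 12/13 and 13/13 counts would translate into p-values under that simplification.

```python
from math import comb

def sign_test_p(successes, n):
    """One-sided exact binomial p-value: P(X >= successes) with p = 0.5."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

p_gradient = sign_test_p(12, 13)   # 14 / 8192
p_asymmetry = sign_test_p(13, 13)  # 1 / 8192
print(p_gradient, p_asymmetry)
```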

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including the explicit controls, multi-family evaluation, and causal confirmation via activation patching. We appreciate the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified


The paper's central inference rests on an experimental control in which UD distances are explicitly held invariant across conditions that differ in MP phase count and phase-internal cohesion. Probe effects and activation-patching results are measured directly on these stimuli; no equations, fitted parameters, or self-citations reduce the reported effects to quantities defined by the UD annotations themselves. The design therefore treats UD invariance as an external benchmark rather than deriving the MP attribution from the input annotations by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical probing study with no free parameters, axioms, or invented entities introduced; the central claim rests on the design of the stimuli and the interpretation of probe outputs as reflecting MP structure.



Reference graph

Works this paper leans on

31 extracted references · 25 canonical work pages · 5 internal anchors

[1] Ananth Agarwal, Jasper Jian, Christopher D Manning, and Shikhar Murty. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.1712 Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33737–33757, Suzhou, C...

[2] Yoav Benjamini and Yosef Hochberg. 1995. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300.

[3] Željko Bošković. 2007. https://doi.org/10.1162/ling.2007.38.4.589 On the Locality and Motivation of Move and Agree: An Even More Minimal Theory. Linguistic Inquiry, 38(4):589–644.

[4] Rejean Canac Marquis. 2005. https://doi.org/10.21248/hpsg.2005.28 Phases and Binding of Reflexives and Pronouns in English. Proceedings of the International Conference on Head-Driven Phrase Structure Grammar.

[5] Noam Chomsky. 1986. Barriers. Linguistic Inquiry Monographs. MIT Press, Cambridge, MA, USA.

[6] Noam Chomsky. 2000. Minimalist Inquiries: The Framework. In Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, pages 89–155. MIT Press, Cambridge.

[7] Noam Chomsky. 2001. https://doi.org/10.7551/mitpress/4056.003.0004 Derivation by Phase. In Michael Kenstowicz, editor, Ken Hale, pages 1–52. The MIT Press.

[8] Nomi Erteschik-Shir. 1973. http://hdl.handle.net/1721.1/12991 On the nature of island constraints. Ph.D. thesis, Massachusetts Institute of Technology.

[9] Danny Fox and David Pesetsky. 2005. https://doi.org/10.1515/thli.2005.31.1-2.1 Cyclic Linearization of Syntactic Structure. Theoretical Linguistics, 31(1-2):1–45.

[10] Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, and Thomas Icard. 2023. https://arxiv.org/abs/2301.04709v4 Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability.

[11] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others. 2024. https://doi.org/10.48550/arXiv.2407.21783 Th...

[12] John Hewitt and Christopher D. Manning. 2019. https://doi.org/10.18653/v1/N19-1419 A Structural Probe for Finding Syntax in Word Representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, Minnea...

[13] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. https://doi.org/10.5281/zenodo.1212303 spaCy: Industrial-strength natural language processing in Python.

[14] Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. https://doi.org/10.48550/arX...

[15] Marcel A. Just and Patricia A. Carpenter. 1980. https://doi.org/10.1037/0033-295X.87.4.329 A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4):329–354.

[16] Mary Kennedy. 2025. https://doi.org/10.18653/v1/2025.conll-1.25 Evidence of Generative Syntax in LLMs. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 377–396, Vienna, Austria. Association for Computational Linguistics.

[17] Keonwoo Koo and Hyosik Kim. 2026. https://doi.org/10.3389/fpsyg.2025.1699740 Successive-cyclic movement in humans and neural language models: testing wh-filler-gap dependencies. Frontiers in Psychology, 16.

[18] Vera Lee-Schoenfeld. 2008. https://doi.org/10.1111/j.1467-9612.2008.00118.x Binding, Phases, and Locality. Syntax, 11(3):281–298.

[19] Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy. 2020. https://doi.org/10.1073/pnas.1907367117 Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48):30046–30054.

[20] Rebecca Marvin and Tal Linzen. 2018. https://doi.org/10.18653/v1/D18-1151 Targeted Syntactic Evaluation of Language Models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1192–1202, Brussels, Belgium. Association for Computational Linguistics.

[21] Gereon Müller. 2011. https://doi.org/10.1075/lfab.7 Constraints on Displacement: A phase-based approach, volume 7 of Language Faculty and Beyond. John Benjamins Publishing Company, Amsterdam.

[22] Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, and 24 others. 2025. https://doi.org/10.48550/arXiv.2412.15115 Qwen2.5 Technical Report. arXiv preprint. ArXiv:2412.15115 [cs.CL].

[23] Keith Rayner, Gretchen Kambe, and Susan A. Duffy. 2000. https://doi.org/10.1080/713755934 The Effect of Clause Wrap-Up on Eye Movements during Reading. The Quarterly Journal of Experimental Psychology Section A, 53(4):1061–1080.

[24] Natalia Silveira, Timothy Dozat, Marie-Catherine de Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer, and Chris Manning. 2014. https://aclanthology.org/L14-1067/ A gold standard dependency corpus for English. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC '14), pages 2897–2904, Reykjavik, Iceland....

[25] Timothy Angus Stowell. 1981. https://dspace.mit.edu/handle/1721.1/15626 Origins of phrase structure. Thesis, Massachusetts Institute of Technology.

[26] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, and 197 others. 2025. https://doi.org/10.48550/arXiv.2...

[27] Juan Uriagereka. 1999. https://doi.org/10.7551/mitpress/7305.003.0012 Multiple Spell-Out. In Working Minimalism. The MIT Press.

[28] Juan Uriagereka. 2012. Spell-out and the Minimalist Program. Oxford Linguistics. Oxford University Press, Oxford; New York.

[29] Coppe van Urk. 2020. https://doi.org/10.1146/annurev-linguistics-011718-012318 Successive Cyclicity and the Syntax of Long-Distance Dependencies. Annual Review of Linguistics, 6:111–130.

[30] Ethan Wilcox, Roger Levy, Takashi Morita, and Richard Futrell. 2018. https://doi.org/10.18653/v1/W18-5423 What do RNN Language Models Learn about Filler–Gap Dependencies? In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 211–221, Brussels, Belgium. Association for Computational Linguistics.

[31] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, and 41 others. 2025. https://doi.org/10.48550/arXiv.2505.09388 Qwen3 Technical Report. arXiv preprint. ArXiv:2505.09388 [cs.CL].
