GiLT: Augmenting Transformer Language Models with Dependency Graphs

Chuyan Zhou; Kewei Tu; Tianyu Huang; Yida Zhao

arxiv: 2605.15562 · v1 · pith:N6JSGJ2Nnew · submitted 2026-05-15 · 💻 cs.CL

Pith reviewed 2026-05-20 19:36 UTC · model grok-4.3

keywords: language modeling, dependency graphs, transformer, syntactic generalization, attention modulation, graph infusion, semantic dependencies, fine-tuning

The pith

Modulating Transformer attention with features from incrementally built dependency graphs improves syntactic generalization without extra tokens or perplexity loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes GiLT to augment standard Transformer language models by injecting information from dependency graphs. Instead of adding structural tokens, it builds the graph step by step as tokens are predicted and uses extracted features to adjust attention weights inside the model. Experiments show this yields stronger performance on syntactic generalization tasks while keeping perplexity close to baseline levels. The same approach works when starting from an already trained language model and then fine-tuning for specific tasks.

Core claim

GiLT injects structural information into language modeling by modulating attention weights in the Transformer with features extracted from the dependency graph that is incrementally constructed along with token prediction. GiLT with semantic dependency graphs achieves better syntactic generalization while maintaining competitive perplexity in comparison with Transformer language model baselines and can be finetuned from a pretrained language model to achieve improved downstream task performance.

What carries the argument

Graph-infused layers that extract features from the incrementally constructed dependency graph and use them to modulate attention weights during token prediction.
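As a rough illustration of what modulating attention with graph features could look like, the sketch below multiplies each unnormalized causal attention weight by a positive gain before normalization. The function name `modulated_attention`, the `graph_gain` matrix, and the choice of a multiplicative gain are assumptions made for this example; the paper's actual modulation equations are not reproduced on this page.

```python
import math

def modulated_attention(scores, graph_gain):
    """Causal softmax attention with hypothetical graph-derived gains.

    scores[i][j]     : raw attention logit of query i on key j
    graph_gain[i][j] : positive factor (assumed) derived from the partial
                       dependency graph; 1.0 means no structural preference
    Returns a row-stochastic matrix; position i attends only to j <= i.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Adding log(gain) to a logit multiplies its unnormalized weight by gain.
        logits = [scores[i][j] + math.log(graph_gain[i][j]) for j in range(i + 1)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        # Future positions (j > i) receive zero weight.
        weights.append([e / z for e in exps] + [0.0] * (n - i - 1))
    return weights

# Toy example: 3 tokens, uniform logits, one hypothetical arc between tokens 2 and 0.
scores = [[0.0] * 3 for _ in range(3)]
gain = [[1.0] * 3 for _ in range(3)]
gain[2][0] = 2.0  # assumed boost for a graph neighbour
attn = modulated_attention(scores, gain)
```

With uniform logits, the last row of `attn` shifts weight toward the boosted position while still summing to one, and no extra tokens are added to the sequence.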

If this is right

GiLT produces measurable gains in syntactic generalization over plain Transformer baselines.
Perplexity remains competitive with unmodified Transformer language models.
The method transfers to fine-tuning a pretrained language model for better results on downstream tasks.
Structural information enters the model without inserting additional special tokens into the sequence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The incremental graph construction might be adapted to other graph types such as semantic role graphs or discourse graphs.
Models trained this way could require fewer examples to reach a given level of syntactic accuracy on parsing-related tasks.
Combining the attention-modulation step with other structural signals such as constituency information could produce further gains.

Load-bearing premise

Features taken from a dependency graph built token by token can be used to adjust attention in a way that improves syntactic generalization without raising perplexity or needing extra tokens.

What would settle it

Running the same syntactic generalization benchmarks on a version of GiLT with the graph-modulation component removed and finding no drop in performance would show the dependency-graph features are not what drives the reported gains.

Figures

Figures reproduced from arXiv: 2605.15562 by Chuyan Zhou, Kewei Tu, Tianyu Huang, Yida Zhao.

**Figure 2.** Scores on the 6 circuits of the SG test suites.

**Figure 3.** Left: visualization of attention scores of the first head in the last layer of GiLT (left) and TXL (right) given …

Original abstract

Augmenting Transformers with linguistic structures effectively enhances the syntactic generalization performance of language models. Previous work in this direction focuses on syntactic tree structures of languages, in particular constituency tree structures. We propose Graph-Infused Layers Transformer Language Model (GiLT) which leverages dependency graphs for augmenting Transformer language models. Unlike most previous work, GiLT does not insert extra structural tokens in language modeling; instead, it injects structural information into language modeling by modulating attention weights in the Transformer with features extracted from the dependency graph that is incrementally constructed along with token prediction. In our experiments, GiLT with semantic dependency graphs achieves better syntactic generalization while maintaining competitive perplexity in comparison with Transformer language model baselines. In addition, GiLT can be finetuned from a pretrained language model to achieve improved downstream task performance. Our code is released at https://github.com/cookie-pie-oops/GiLT-LM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GiLT modulates attention with incrementally built dependency graphs to avoid extra tokens, but the prefix-parsing step looks like the main place where things could go wrong.

The central idea is straightforward: instead of adding structural tokens or relying on constituency trees, GiLT extracts features from a dependency graph that is built token by token as the model generates, then uses those features to adjust attention weights inside the Transformer. The abstract says that GiLT with semantic dependency graphs yields better syntactic generalization while keeping perplexity competitive with plain Transformer baselines, and that the model can be finetuned from a pretrained checkpoint for downstream gains. The authors also released their code, which helps.

Referee Report

2 major / 2 minor

Summary. The paper proposes Graph-Infused Layers Transformer (GiLT), which augments Transformer language models by modulating attention weights using features extracted from dependency graphs that are incrementally constructed during token prediction. Unlike prior work that inserts extra structural tokens, GiLT avoids this overhead. Experiments claim that GiLT with semantic dependency graphs yields better syntactic generalization while preserving competitive perplexity relative to standard Transformer baselines, and that the model can be finetuned from a pretrained LM for improved downstream task performance.

Significance. If the empirical gains hold after proper controls, the approach offers an efficient route to infuse linguistic structure into LMs without token overhead or major perplexity degradation. The public code release supports reproducibility and is a clear strength.

major comments (2)

§4 (Experiments) and associated ablations: the central syntactic-generalization claim depends on reliable attention modulation from incrementally built dependency graphs on model-generated prefixes. No oracle-vs-predicted parse comparison or error-injection ablation is reported; without these, it is impossible to rule out that gains arise from incidental regularization in the modulation mechanism rather than the intended structural signal. This directly bears on whether the reported improvements support the main claim.
§3 (Method), attention-modulation equations: the precise mapping from graph features to attention-weight adjustments is underspecified with respect to additional parameters or learned components. If the modulation introduces extra degrees of freedom, the claim of a lightweight structural augmentation requires explicit quantification of parameter count relative to the baseline Transformer.

minor comments (2)

The abstract and §1 should explicitly cite the dependency parser and graph-construction procedure (including any preprocessing of semantic dependencies) so readers can assess reproducibility.
Table 1 (or equivalent results table) would benefit from reporting standard deviations over multiple runs and clearer baseline descriptions to allow direct comparison of perplexity and generalization metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments help clarify key aspects of our claims and method. We address each major comment below and describe the revisions we will make.

Referee: §4 (Experiments) and associated ablations: the central syntactic-generalization claim depends on reliable attention modulation from incrementally built dependency graphs on model-generated prefixes. No oracle-vs-predicted parse comparison or error-injection ablation is reported; without these, it is impossible to rule out that gains arise from incidental regularization in the modulation mechanism rather than the intended structural signal. This directly bears on whether the reported improvements support the main claim.

Authors: We agree that additional controls would strengthen the evidence supporting our central claim. In the revised manuscript we will add an error-injection ablation: we will randomly corrupt a fraction of the edges in the incrementally predicted dependency graphs and test whether the syntactic generalization gains are substantially reduced. A substantial reduction would indicate that the improvements arise from the structural signal rather than from incidental regularization effects of the modulation. We will also include a short discussion noting that our experiments use predicted parses (to reflect realistic deployment) while acknowledging that oracle parses would likely provide an upper bound on performance.
Revision: yes

Referee: §3 (Method), attention-modulation equations: the precise mapping from graph features to attention-weight adjustments is underspecified with respect to additional parameters or learned components. If the modulation introduces extra degrees of freedom, the claim of a lightweight structural augmentation requires explicit quantification of parameter count relative to the baseline Transformer.

Authors: We appreciate this observation. The attention modulation in GiLT is a deterministic, parameter-free function that derives multiplicative adjustments directly from the extracted dependency-graph features (edge labels and distances) and applies them to the attention scores; no additional trainable weights or learned components are introduced. Consequently, GiLT has exactly the same parameter count as the baseline Transformer. In the revision we will expand Section 3 with the complete modulation equations and explicitly report this parameter equivalence to substantiate the lightweight claim.
Revision: yes
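
The error-injection ablation discussed in the first response could be prototyped along the following lines. This is a minimal sketch under stated assumptions: the helper name `corrupt_edges`, the `(head, dependent, label)` arc format, the example labels, and the head-rewiring strategy are all hypothetical and are not taken from the paper or its code release.

```python
import random

def corrupt_edges(edges, n_tokens, fraction, seed=0):
    """Randomly rewire a fraction of dependency arcs (hypothetical ablation helper).

    edges    : list of (head, dependent, label) with 0-based token indices
    n_tokens : sentence length; must be at least 3 so a new head exists
    fraction : share of arcs whose head is replaced by a different token
    """
    rng = random.Random(seed)  # local generator keeps the corruption reproducible
    n_corrupt = round(fraction * len(edges))
    chosen = set(rng.sample(range(len(edges)), n_corrupt))
    corrupted = []
    for idx, (head, dep, label) in enumerate(edges):
        if idx in chosen:
            # Draw a new head that differs from both the old head and the dependent.
            candidates = [t for t in range(n_tokens) if t not in (head, dep)]
            head = rng.choice(candidates)
        corrupted.append((head, dep, label))
    return corrupted

# Toy graph with four labelled arcs over five tokens (labels are placeholders).
edges = [(1, 0, "ARG1"), (1, 2, "ARG2"), (3, 2, "ARG1"), (3, 4, "ARG2")]
noisy = corrupt_edges(edges, n_tokens=5, fraction=0.5, seed=0)
```

Sweeping `fraction` from 0 to 1 and re-running the syntactic generalization evaluation at each level would show whether the scores degrade as the graph signal is destroyed, which is the comparison the referee asks for.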

Circularity Check

0 steps flagged

Empirical augmentation technique with no load-bearing derivations or self-citation chains

The paper presents GiLT as an architectural augmentation that modulates Transformer attention using features from an incrementally built dependency graph during token prediction. No equations, derivations, or fitted parameters are described that reduce to the experimental inputs by construction. The central claims rest on empirical comparisons of perplexity and syntactic generalization against baselines, with no self-definitional loops, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation. The method is self-contained as a proposed engineering technique whose validity is tested externally via held-out metrics rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described in the abstract; the method is presented as an empirical augmentation relying on standard dependency parsing and Transformer components.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: "GiLT ... injects structural information into language modeling by modulating attention weights in the Transformer with features extracted from the dependency graph that is incrementally constructed along with token prediction."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "we extract features from the partially constructed dependency graph and form a graph-based feature tape $G_k = [g_1^k, g_2^k, \cdots, g_k^k] \in \mathbb{N}^{3 \times k}$"
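
To make the shape of the quoted feature tape concrete, the sketch below builds a 3 × k table of non-negative integers from the arcs visible after k tokens. The three rows used here (in-degree, out-degree, and distance to the current position) are placeholder features chosen for illustration, and the function name `feature_tape` is hypothetical; this page does not state which three features the paper actually extracts.

```python
def feature_tape(arcs, k):
    """Build a 3 x k tape of non-negative integers from a partial graph.

    arcs : iterable of (head, dependent) pairs, 1-based token positions
    k    : number of tokens generated so far; arcs touching later
           tokens are ignored, which keeps the tape prefix-only
    Rows are placeholder features for this sketch, not the paper's.
    """
    in_deg = [0] * k
    out_deg = [0] * k
    for head, dep in arcs:
        if head <= k and dep <= k:
            out_deg[head - 1] += 1
            in_deg[dep - 1] += 1
    # Distance from each token i to the current position k.
    dist = [k - i for i in range(1, k + 1)]
    return [in_deg, out_deg, dist]

# Toy graph: the arc (4, 3) only becomes visible once token 4 exists.
arcs = [(2, 1), (2, 3), (4, 3)]
tape_3 = feature_tape(arcs, 3)  # [[1, 0, 1], [0, 2, 0], [2, 1, 0]]
tape_4 = feature_tape(arcs, 4)  # [[1, 0, 2, 0], [0, 2, 0, 1], [3, 2, 1, 0]]
```

Recomputing the tape at each step mirrors the incremental construction: the columns for earlier tokens can change as new arcs arrive, but nothing depends on tokens that have not yet been generated.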

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Tetreault , title =

Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =

work page 2015

[2] [2]

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =

Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =

work page

[3] [3]

Attention Is

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and ukasz Kaiser,. Attention Is. Advances in. 2017 , volume =

work page 2017

[4] [4]

Trends in Cognitive Sciences , author =

Structures,. Trends in Cognitive Sciences , author =. 2015 , pages =. doi:10.1016/j.tics.2015.09.008 , abstract =

work page doi:10.1016/j.tics.2015.09.008 2015

[5] [5]

Recurrent Neural Network Grammars

Dyer, Chris and Kuncoro, Adhiguna and Ballesteros, Miguel and Smith, Noah A. Recurrent Neural Network Grammars. Proceedings of the 2016 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016. doi:10.18653/v1/N16-1024

work page doi:10.18653/v1/n16-1024 2016

[6] [6]

Unsupervised Recurrent Neural Network Grammars

Kim, Yoon and Rush, Alexander and Yu, Lei and Kuncoro, Adhiguna and Dyer, Chris and Melis, G \'a bor. Unsupervised Recurrent Neural Network Grammars. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v1/N19-1114

work page doi:10.18653/v1/n19-1114 2019

[7] [7]

Parsing as Language Modeling

Choe, Do Kook and Charniak, Eugene. Parsing as Language Modeling. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016. doi:10.18653/v1/D16-1257

work page doi:10.18653/v1/d16-1257 2016

[8] [8]

Effective Batching for Recurrent Neural Network Grammars

Noji, Hiroshi and Oseki, Yohei. Effective Batching for Recurrent Neural Network Grammars. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.380

work page doi:10.18653/v1/2021.findings-acl.380 2021

[9] [9]

Neural language models as psycholinguistic subjects: Representations of syntactic state

Futrell, Richard and Wilcox, Ethan and Morita, Takashi and Qian, Peng and Ballesteros, Miguel and Levy, Roger. Neural language models as psycholinguistic subjects: Representations of syntactic state. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Lo...

work page doi:10.18653/v1/n19-1004 2019

[10] [10]

Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models

Zhao, Yida and Lou, Chao and Tu, Kewei. Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.84

work page doi:10.18653/v1/2024.acl-long.84 2024

[11] [11]

Generative Incremental Dependency Parsing with Neural Networks

Buys, Jan and Blunsom, Phil. Generative Incremental Dependency Parsing with Neural Networks. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015. doi:10.3115/v1/P15-2142

work page doi:10.3115/v1/p15-2142 2015

[12] [12]

Dependency Recurrent Neural Language Models for Sentence Completion

Mirowski, Piotr and Vlachos, Andreas. Dependency Recurrent Neural Language Models for Sentence Completion. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015. doi:10.3115/v1/P15-2084

work page doi:10.3115/v1/p15-2084 2015

[13] [13]

Structural Guidance for Transformer Language Models

Qian, Peng and Naseem, Tahira and Levy, Roger and Fernandez Astudillo, Ram \'o n. Structural Guidance for Transformer Language Models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021. doi:10.18653/v1/2021.acl-long.289

work page doi:10.18653/v1/2021.acl-long.289 2021

[14]

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

Murty, Shikhar and Sharma, Pratyusha and Andreas, Jacob and Manning, Christopher. Pushdown Layers: Encoding Recursive Structure in Transformer Language Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.195

work page doi:10.18653/v1/2023.emnlp-main.195 2023

[15]

Sartran, Laurent and Barrett, Samuel and Kuncoro, Adhiguna and Stanojević, Miloš and Blunsom, Phil and Dyer, Chris. Transactions of the Association for Computational Linguistics. 2022. doi:10.1162/tacl_a_00526

work page doi:10.1162/tacl_a_00526 2022

[16]

Composition, Attention, or Both?

Yoshida, Ryo and Oseki, Yohei. Composition, Attention, or Both?. Findings of the Association for Computational Linguistics: EMNLP 2022. 2022. doi:10.18653/v1/2022.findings-emnlp.428

work page doi:10.18653/v1/2022.findings-emnlp.428 2022

[17]

Statistical machine translation using labeled semantic dependency graphs

Aue, Anthony and Menezes, Arul and Moore, Bob and Quirk, Chris and Ringger, Eric. Statistical machine translation using labeled semantic dependency graphs. Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages. 2004

work page 2004

[18]

Integrating Vision-Language Semantic Graphs in Multi-View Clustering

Integrating Vision-Language Semantic Graphs in Multi-View Clustering. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 2024. doi:10.24963/ijcai.2024/472

work page doi:10.24963/ijcai.2024/472 2024

[19]

A Systematic Assessment of Syntactic Generalization in Neural Language Models

Hu, Jennifer and Gauthier, Jon and Qian, Peng and Wilcox, Ethan and Levy, Roger. A Systematic Assessment of Syntactic Generalization in Neural Language Models. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.158

work page doi:10.18653/v1/2020.acl-main.158 2020

[20]

Charniak, Eugene et al. doi:10.35111/FWEW-DA58

work page doi:10.35111/fwew-da58

[21]

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

Dai, Zihang and Yang, Zhilin and Yang, Yiming and Carbonell, Jaime and Le, Quoc and Salakhutdinov, Ruslan. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1285

work page doi:10.18653/v1/p19-1285 2019

[22]

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Prange, Jakob and Schneider, Nathan and Kong, Lingpeng. Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. doi:10.18653/v1/2022.naacl-main.325

work page doi:10.18653/v1/2022.naacl-main.325 2022

[23]

Simpler but More Accurate Semantic Dependency Parsing

Dozat, Timothy and Manning, Christopher D. Simpler but More Accurate Semantic Dependency Parsing. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018. doi:10.18653/v1/P18-2077

work page doi:10.18653/v1/p18-2077 2018

[24]

Effective Inference for Generative Neural Parsing

Stern, Mitchell and Fried, Daniel and Klein, Dan. Effective Inference for Generative Neural Parsing. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017. doi:10.18653/v1/D17-1178

work page doi:10.18653/v1/d17-1178 2017

[25]

Tree Transformer: Integrating Tree Structures into Self-Attention

Wang, Yaushian and Lee, Hung-Yi and Chen, Yun-Nung. Tree Transformer: Integrating Tree Structures into Self-Attention. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1098

work page doi:10.18653/v1/d19-1098 2019

[26]

PaLM: A Hybrid Parser and Language Model

Peng, Hao and Schwartz, Roy and Smith, Noah A. PaLM: A Hybrid Parser and Language Model. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1376

work page doi:10.18653/v1/d19-1376 2019

[27]

Guiding Attention for Self-Supervised Learning with Transformers

Deshpande, Ameet and Narasimhan, Karthik. Guiding Attention for Self-Supervised Learning with Transformers. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.419

work page doi:10.18653/v1/2020.findings-emnlp.419 2020

[28]

Automated Concatenation of Embeddings for Structured Prediction

Wang, Xinyu and Jiang, Yong and Bach, Nguyen and Wang, Tao and Huang, Zhongqiang and Huang, Fei and Tu, Kewei. Automated Concatenation of Embeddings for Structured Prediction. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Pa...

work page doi:10.18653/v1/2021.acl-long.206 2021

[29]

Deep Biaffine Attention for Neural Dependency Parsing

Deep Biaffine Attention for Neural Dependency Parsing. International Conference on Learning Representations

work page

[30]

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

Kudo, Taku and Richardson, John. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2018. doi:10.18653/v1/D18-2012

work page doi:10.18653/v1/d18-2012 2018

[31]

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R. BLiMP: The Benchmark of Linguistic Minimal Pairs for English. Transactions of the Association for Computational Linguistics. 2020. doi:10.1162/tacl_a_00321

work page doi:10.1162/tacl_a_00321 2020

[32]

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 2018. doi:10.18653/v1/W18-5446

work page doi:10.18653/v1/w18-5446 2018

[33]

SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing

Oepen, Stephan and Kuhlmann, Marco and Miyao, Yusuke and Zeman, Daniel and Cinková, Silvie and Flickinger, Dan and Hajič, Jan and Urešová, Zdeňka. SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). 2015. doi:10.18653/v1/S15-2153

work page doi:10.18653/v1/s15-2153 2015

[34]

Flickinger, Dan. Nat. Lang. Eng. 2000

work page 2000

[35]

Flickinger, Daniel and Zhang, Yi and Kordoni, Valia. Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories (TLT-11), November 30-December 1, Lisbon, Portugal

work page

[36]

From linguistic theory to syntactic analysis: corpus-oriented grammar development and feature forest model

Miyao, Yusuke. From linguistic theory to syntactic analysis: corpus-oriented grammar development and feature forest model

work page

[37]

Announcing Prague Czech-English Dependency Treebank 2.0

Announcing Prague Czech-English Dependency Treebank 2.0. International Conference on Language Resources and Evaluation

work page

[38]

Palmer, Martha and Gildea, Daniel and Kingsbury, Paul. Computational Linguistics. 2005. doi:10.1162/0891201053630264

work page doi:10.1162/0891201053630264 2005

[39]

Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale

Hu, Xiang and Ji, Pengyu and Zhu, Qingyang and Wu, Wei and Tu, Kewei. Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.145

work page doi:10.18653/v1/2024.acl-long.145 2024

[40]

Language Models are Unsupervised Multitask Learners

Language Models are Unsupervised Multitask Learners. 2019

work page 2019