
arxiv: 2112.07874 · v2 · submitted 2021-12-15 · 💻 cs.CL · cs.AI

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Pith reviewed 2026-05-24 12:29 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords neuro-symbolic language modeling · linguistic graphs · semantic constituency · Transformer ensemble · language modeling performance · syntactic structures · dependency structures · part-of-speech effects

The pith

Semantic constituency structures improve neural language modeling more than syntactic or dependency structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether different linguistic graph representations can complement and improve neural language modeling when combined with a pretrained Transformer. Using an ensemble setup with ground-truth graphs from seven formalisms, the authors compare syntactic and semantic constituency structures as well as syntactic and semantic dependency structures. They find that semantic constituency structures deliver the largest performance gains overall. These gains vary substantially depending on the part-of-speech class of the words being modeled. The results point to useful tendencies for future neuro-symbolic language modeling work.

Core claim

With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, semantic constituency structures are most useful to language modeling performance, outpacing syntactic constituency structures as well as syntactic and semantic dependency structures, with effects varying greatly depending on part-of-speech class.

What carries the argument

Ensemble setup combining a pretrained Transformer with ground-truth graphs from seven linguistic formalisms

Load-bearing premise

The ensemble integration method treats graphs from all seven formalisms comparably without systematic bias favoring semantic constituency structures due to how the graphs are encoded or combined with the Transformer.

What would settle it

Re-running the experiments with an adjusted integration technique that removes any encoding differences across formalisms and checking whether semantic constituency structures still show the largest gains.

Figures

Figures reproduced from arXiv: 2112.07874 by Jakob Prange, Lingpeng Kong, Nathan Schneider.

Figure 1: Contrasting GPT-2’s incremental attention mechanism (top right) with incremental context slices obtained from linguistic graphs (left four panels) of four different formalisms (§5.2). As shared tokenization we use GPT-2’s byte-pair encoding. Slice nodes are color-coded by local relation type (black: target, cyan: parent, blue: child, green: coparent, yellow: sibling, purple: grandparent, brown: aunt). Dash…
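
The relation types named in the Figure 1 caption can be made concrete with a small sketch. The Python below is not the paper's implementation. It assumes a graph given as (parent, child) edge pairs and uses the usual kinship-style definitions (for example, an aunt is a child of a grandparent that is not itself a parent of the target). The name `local_relations` is hypothetical, and the sketch ignores the incremental restriction to preceding context shown in the figure.

```python
from collections import defaultdict


def local_relations(edges, target):
    """Group nodes by their local relation to `target` in a directed graph.

    `edges` is an iterable of (parent, child) pairs. This input format is an
    assumption made for illustration, not the paper's data format.
    """
    parents_of = defaultdict(set)
    children_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)
        children_of[parent].add(child)

    parents = set(parents_of[target])
    children = set(children_of[target])
    # Siblings share at least one parent with the target.
    siblings = {c for p in parents for c in children_of[p]} - {target}
    # Coparents share at least one child with the target.
    coparents = {p for c in children for p in parents_of[c]} - {target}
    # Grandparents are parents of the target's parents.
    grandparents = {g for p in parents for g in parents_of[p]}
    # Aunts are children of grandparents that are not parents of the target.
    aunts = {a for g in grandparents for a in children_of[g]} - parents - {target}

    return {
        "parent": parents,
        "child": children,
        "sibling": siblings,
        "coparent": coparents,
        "grandparent": grandparents,
        "aunt": aunts,
    }


# Toy constituency-style tree: S -> NP VP, VP -> V OBJ.
toy_edges = [("S", "NP"), ("S", "VP"), ("VP", "V"), ("VP", "OBJ")]
for relation, nodes in local_relations(toy_edges, "V").items():
    print(relation, sorted(nodes))
```

Returning sets per relation, rather than one label per node, keeps the sketch valid for graphs where a node stands in more than one relation to the target, which can happen once reentrancies are allowed.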
Figure 2: Example of subtle differences in constituency (PTG) and dependency (PSD) versions of the same underlying formalism, the Prague Functional Description. PTG has an abstract PRED node as well as a multiword anchor where PSD does not, which results in diverging slice representations for the last two tokens. we take the intersection of these sentences and OntoNotes 5.0, which contains the gold PTB syntax ann…
Figure 3: Model perplexity (lower is better) with UPOS as additional input. Top left: nouns, verbs, and modifiers; top right: auxiliaries and pronouns; bottom left: adpositions and subordinating conjunctions; bottom right: determiners and coordinating conjunctions. Big gray squares mark baseline (finetuned GPT-2) performance without (dark) and with (light) POS inputs and SLR-specific data points without/with POS in…
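
Figure 3 breaks perplexity down by part-of-speech class. As a reminder of what that quantity is, the sketch below computes perplexity per tag as the exponential of the mean negative log-probability of the tokens carrying that tag. It is an illustration only. It assumes natural-log probabilities for the gold tokens and one UPOS tag per token. How word-level tags are mapped onto byte-pair tokens is not specified in this summary, so that alignment is taken as given. The function name `perplexity_by_pos` is hypothetical.

```python
import math
from collections import defaultdict


def perplexity_by_pos(log_probs, pos_tags):
    """Return {tag: perplexity} from per-token natural-log probabilities.

    Assumes `log_probs[i]` is the log-probability the model assigned to the
    i-th gold token and `pos_tags[i]` is that token's UPOS tag.
    """
    if len(log_probs) != len(pos_tags):
        raise ValueError("log_probs and pos_tags must have the same length")

    totals = defaultdict(float)
    counts = defaultdict(int)
    for log_prob, tag in zip(log_probs, pos_tags):
        totals[tag] += log_prob
        counts[tag] += 1

    # Perplexity = exp(-mean log-probability) within each tag.
    return {tag: math.exp(-totals[tag] / counts[tag]) for tag in totals}


# Made-up numbers purely for illustration.
example = perplexity_by_pos(
    [math.log(0.5), math.log(0.5), math.log(0.25)],
    ["NOUN", "NOUN", "VERB"],
)
print(example)
```

Because each class is averaged separately, a class with few tokens yields a noisy estimate, which is worth bearing in mind when comparing rare and frequent tags.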
Original abstract

We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling. With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, we find that, overall, semantic constituency structures are most useful to language modeling performance, outpacing syntactic constituency structures as well as syntactic and semantic dependency structures. Further, effects vary greatly depending on part-of-speech class. In sum, our findings point to promising tendencies in neuro-symbolic language modeling and invite future research quantifying the design choices made by different formalisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript examines the utility of linguistic graph representations from seven formalisms (syntactic/semantic constituency and dependency) when integrated in an ensemble with a pretrained Transformer for language modeling. It claims that semantic constituency structures are most useful overall, outpacing the others, with performance effects varying substantially by part-of-speech class; the work uses ground-truth graphs and points to design choices across formalisms as a direction for future neuro-symbolic research.

Significance. If the comparative results hold after addressing integration details, the paper provides empirical evidence favoring semantic constituency in neuro-symbolic LM augmentation and highlights the value of systematic cross-formalism comparisons using external pretrained models and established graphs. This avoids circularity and supplies a concrete baseline for quantifying formalism contributions.

major comments (2)
  1. [Methods / Ensemble Setup] The central claim that semantic constituency outperforms other structures rests on the ensemble integration; however, the methods provide no explicit controls or uniformity checks for how hierarchical constituency graphs versus flatter dependency graphs are encoded, embedded, or fused (e.g., via attention or node representations), leaving open the possibility that performance gaps arise from topology interactions rather than linguistic content.
  2. [Results] No statistical tests, confidence intervals, or per-formalism result tables with effect sizes are referenced to support the abstract's comparative findings; without these, the ranking of semantic constituency cannot be assessed for robustness against the noted encoding-bias risk.
minor comments (1)
  1. [Abstract] The abstract lists seven formalisms but does not name them; adding the explicit list would improve clarity for readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Methods / Ensemble Setup] The central claim that semantic constituency outperforms other structures rests on the ensemble integration; however, the methods provide no explicit controls or uniformity checks for how hierarchical constituency graphs versus flatter dependency graphs are encoded, embedded, or fused (e.g., via attention or node representations), leaving open the possibility that performance gaps arise from topology interactions rather than linguistic content.

    Authors: The ensemble uses an identical graph encoder architecture, embedding dimensions, attention-based fusion mechanism, and hyperparameter set for all seven formalisms. No topology-specific modifications were applied, so any performance differences are intended to reflect the linguistic content of the graphs. We agree that the Methods section would benefit from an explicit statement confirming this uniformity. We will add a paragraph detailing the shared pipeline and noting the absence of differential encoding steps. revision: yes

  2. Referee: [Results] No statistical tests, confidence intervals, or per-formalism result tables with effect sizes are referenced to support the abstract's comparative findings; without these, the ranking of semantic constituency cannot be assessed for robustness against the noted encoding-bias risk.

    Authors: The original manuscript reported mean performance but omitted formal statistical support. We will revise the Results section to include bootstrap confidence intervals, paired statistical tests between formalisms, and an expanded table (or supplement) reporting per-formalism scores with effect sizes. This addition will directly address concerns about robustness. revision: yes
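
The rebuttal mentions bootstrap confidence intervals and paired tests without describing a procedure. One common choice is a paired percentile bootstrap over per-item loss differences between two systems evaluated on the same data. The sketch below illustrates that generic technique under stated assumptions. It is not the authors' analysis. The name `paired_bootstrap_ci`, the 95% level, and the default of 1000 resamples are all hypothetical choices.

```python
import random


def paired_bootstrap_ci(losses_a, losses_b, n_resamples=1000, seed=0):
    """Return (mean_diff, (low, high)) for the mean of losses_a - losses_b.

    Assumes both lists hold per-item losses (e.g. negative log-likelihoods)
    for the same items in the same order. The interval is a 95% percentile
    bootstrap interval.
    """
    if len(losses_a) != len(losses_b) or not losses_a:
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    resampled_means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)  # resample items with replacement
        resampled_means.append(sum(sample) / n)
    resampled_means.sort()

    low = resampled_means[int(0.025 * n_resamples)]
    high = resampled_means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, (low, high)


# Made-up losses purely for illustration.
mean_diff, (low, high) = paired_bootstrap_ci(
    [2.1, 1.9, 2.4, 2.0, 2.2], [1.8, 1.7, 2.5, 1.6, 2.0]
)
print(mean_diff, low, high)
```

If the interval excludes zero, the difference between the two systems is unlikely to be a resampling artifact at that level. Resampling individual tokens treats them as independent; resampling whole sentences instead would respect within-sentence dependence.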

Circularity Check

0 steps flagged

No circularity: empirical comparison of external graphs with pretrained model


The paper reports direct empirical results from ensembling a fixed pretrained Transformer with ground-truth graphs drawn from seven established formalisms. No equations, fitted parameters, or predictions are defined in terms of the target performance metric. No self-citations are invoked to justify uniqueness or to close a derivation loop. The central claim (semantic constituency outperforming other structures) is a measured outcome on held-out data rather than a quantity that reduces to the inputs by construction. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an empirical evaluation that relies on existing pretrained neural models and linguistic graph resources from prior literature rather than introducing new fitted parameters or postulated entities.

axioms (1)
  • domain assumption: Ground-truth graphs from the seven formalisms provide accurate and unbiased representations suitable for fair comparison in the ensemble.
    The experimental design assumes the provided graphs are correct inputs whose differences reflect the formalisms themselves.

pith-pipeline@v0.9.0 · 5619 in / 1163 out tokens · 49684 ms · 2026-05-24T12:29:59.444354+00:00 · methodology

