Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

Jakob Prange; Lingpeng Kong; Nathan Schneider

arxiv: 2112.07874 · v2 · submitted 2021-12-15 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-24 12:29 UTC · model grok-4.3

keywords neuro-symbolic language modeling · linguistic graphs · semantic constituency · Transformer ensemble · language modeling performance · syntactic structures · dependency structures · part-of-speech effects

The pith

Semantic constituency structures improve neural language modeling more than syntactic or dependency structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether different linguistic graph representations can complement and improve neural language modeling when combined with a pretrained Transformer. Using an ensemble setup with ground-truth graphs from seven formalisms, the authors compare syntactic and semantic constituency structures as well as syntactic and semantic dependency structures. They find that semantic constituency structures deliver the largest performance gains overall. These gains vary substantially depending on the part-of-speech class of the words being modeled. The results point to useful tendencies for future neuro-symbolic language modeling work.
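
The part-of-speech breakdown described above amounts to computing perplexity separately over the tokens of each class. A minimal Python sketch, assuming per-token natural-log probabilities and one POS tag per token are already available; the function name and the toy numbers are illustrative and not taken from the paper:

```python
import math
from collections import defaultdict


def perplexity_by_pos(token_log_probs, pos_tags):
    """Return {tag: perplexity} from per-token natural-log probabilities.

    Assumes (hypothetically) that every token carries exactly one POS tag.
    """
    if len(token_log_probs) != len(pos_tags):
        raise ValueError("need one POS tag per token")
    log_prob_sums = defaultdict(float)
    counts = defaultdict(int)
    for log_prob, tag in zip(token_log_probs, pos_tags):
        log_prob_sums[tag] += log_prob
        counts[tag] += 1
    # Perplexity is exp of the negative mean log-probability within each class.
    return {tag: math.exp(-log_prob_sums[tag] / counts[tag]) for tag in counts}


# Toy input: made-up probabilities that a model assigned to four target tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
tags = ["NOUN", "VERB", "NOUN", "VERB"]
ppl = perplexity_by_pos(log_probs, tags)
```

Comparing such per-class values between a baseline and a graph-augmented model is one way to see where a formalism helps and where it does not.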

Core claim

With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, semantic constituency structures are most useful to language modeling performance, outpacing syntactic constituency structures as well as syntactic and semantic dependency structures, with effects varying greatly depending on part-of-speech class.

What carries the argument

Ensemble setup combining a pretrained Transformer with ground-truth graphs from seven linguistic formalisms

Load-bearing premise

The ensemble integration method treats graphs from all seven formalisms comparably without systematic bias favoring semantic constituency structures due to how the graphs are encoded or combined with the Transformer.

What would settle it

Re-running the experiments with an adjusted integration technique that removes any encoding differences across formalisms and checking whether semantic constituency structures still show the largest gains.

Figures

Figures reproduced from arXiv: 2112.07874 by Jakob Prange, Lingpeng Kong, Nathan Schneider.

**Figure 1.** Contrasting GPT-2’s incremental attention mechanism (top right) with incremental context slices obtained from linguistic graphs (left four panels) of four different formalisms (§5.2). As shared tokenization we use GPT-2’s byte-pair encoding. Slice nodes are color-coded by local relation type (black: target, cyan: parent, blue: child, green: coparent, yellow: sibling, purple: grandparent, brown: aunt). Dash…

**Figure 2.** Example of subtle differences in constituency (PTG) and dependency (PSD) versions of the same underlying formalism, the Prague Functional Description. PTG has an abstract PRED node as well as a multiword anchor where PSD does not, which results in diverging slice representations for the last two tokens. we take the intersection of these sentences and OntoNotes 5.0, which contains the gold PTB syntax ann…

**Figure 3.** Model perplexity (lower is better) with UPOS as additional input. Top left: nouns, verbs, and modifiers; top right: auxiliaries and pronouns; bottom left: adpositions and subordinating conjunctions; bottom right: determiners and coordinating conjunctions. Big gray squares mark baseline (finetuned GPT-2) performance without (dark) and with (light) POS inputs and SLR-specific data points without/with POS in…

Original abstract

We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling. With an ensemble setup consisting of a pretrained Transformer and ground-truth graphs from one of 7 different formalisms, we find that, overall, semantic constituency structures are most useful to language modeling performance -- outpacing syntactic constituency structures as well as syntactic and semantic dependency structures. Further, effects vary greatly depending on part-of-speech class. In sum, our findings point to promising tendencies in neuro-symbolic language modeling and invite future research quantifying the design choices made by different formalisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Semantic constituency graphs win in this ensemble comparison, but the abstract leaves open whether the fusion method treats all seven formalisms on equal footing.

The main thing to know is that this paper runs the same pretrained Transformer plus ground-truth graph ensemble across seven linguistic formalisms and reports that semantic constituency structures improve language modeling most, ahead of syntactic constituency and both syntactic and semantic dependency structures. The gains also vary by part-of-speech class. That direct, single-setup comparison is the new empirical piece; prior work had not lined up these formalisms head-to-head this way. It gives people working on neuro-symbolic models a concrete data point on which graph types are worth trying first. The abstract is clear about the ordering and the POS variation, so the central claim is easy to test once the methods are available. The soft spot is the integration step. Constituency graphs are hierarchical while dependency graphs are flatter; if the encoding or combination layer interacts differently with those topologies, the performance edge could trace to representation mechanics rather than linguistic content. The abstract gives no details on how the graphs are turned into embeddings or fused, and no mention of controls for density or depth, so the fairness assumption still needs verification. No circularity or invented entities show up. This is for researchers already building hybrid models who need guidance on formalism choice. It is solid enough on its own terms to deserve a serious referee, mainly to check the encoding uniformity and run the right statistical tests. I would send it to review.

Referee Report

2 major / 1 minor

Summary. The manuscript examines the utility of linguistic graph representations from seven formalisms (syntactic/semantic constituency and dependency) when integrated in an ensemble with a pretrained Transformer for language modeling. It claims that semantic constituency structures are most useful overall, outpacing the others, with performance effects varying substantially by part-of-speech class; the work uses ground-truth graphs and points to design choices across formalisms as a direction for future neuro-symbolic research.

Significance. If the comparative results hold after addressing integration details, the paper provides empirical evidence favoring semantic constituency in neuro-symbolic LM augmentation and highlights the value of systematic cross-formalism comparisons using external pretrained models and established graphs. This avoids circularity and supplies a concrete baseline for quantifying formalism contributions.

major comments (2)

[Methods / Ensemble Setup] The central claim that semantic constituency outperforms other structures rests on the ensemble integration; however, the methods provide no explicit controls or uniformity checks for how hierarchical constituency graphs versus flatter dependency graphs are encoded, embedded, or fused (e.g., via attention or node representations), leaving open the possibility that performance gaps arise from topology interactions rather than linguistic content.
[Results] No statistical tests, confidence intervals, or per-formalism result tables with effect sizes are referenced to support the abstract's comparative findings; without these, the ranking of semantic constituency cannot be assessed for robustness against the noted encoding-bias risk.

minor comments (1)

[Abstract] The abstract lists seven formalisms but does not name them; adding the explicit list would improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Referee: [Methods / Ensemble Setup] The central claim that semantic constituency outperforms other structures rests on the ensemble integration; however, the methods provide no explicit controls or uniformity checks for how hierarchical constituency graphs versus flatter dependency graphs are encoded, embedded, or fused (e.g., via attention or node representations), leaving open the possibility that performance gaps arise from topology interactions rather than linguistic content.

Authors: The ensemble uses an identical graph encoder architecture, embedding dimensions, attention-based fusion mechanism, and hyperparameter set for all seven formalisms. No topology-specific modifications were applied, so any performance differences are intended to reflect the linguistic content of the graphs. We agree that the Methods section would benefit from an explicit statement confirming this uniformity. We will add a paragraph detailing the shared pipeline and noting the absence of differential encoding steps.

revision: yes

Referee: [Results] No statistical tests, confidence intervals, or per-formalism result tables with effect sizes are referenced to support the abstract's comparative findings; without these, the ranking of semantic constituency cannot be assessed for robustness against the noted encoding-bias risk.

Authors: The original manuscript reported mean performance but omitted formal statistical support. We will revise the Results section to include bootstrap confidence intervals, paired statistical tests between formalisms, and an expanded table (or supplement) reporting per-formalism scores with effect sizes. This addition will directly address concerns about robustness.

revision: yes
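
One way interval estimates of this kind could be produced is a paired percentile bootstrap over per-item loss differences between two formalisms. The sketch below is an illustration under assumptions, not the authors' procedure: the function name, the default resample count, and the toy loss values are all hypothetical, and it presumes both systems were scored on the same items in the same order.

```python
import random


def paired_bootstrap_ci(losses_a, losses_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean paired difference (a minus b).

    Returns (observed_mean_difference, lower_bound, upper_bound).
    """
    if len(losses_a) != len(losses_b) or not losses_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    n = len(diffs)
    resampled_means = []
    for _ in range(n_resamples):
        # Resample paired differences with replacement, keeping the pairing intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(sum(sample) / n)
    resampled_means.sort()
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper


# Toy per-sentence losses for two hypothetical systems (made-up numbers).
system_a = [1.0, 1.2, 0.9, 1.5]
system_b = [0.8, 1.0, 0.8, 1.1]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(system_a, system_b)
```

If the resulting interval excludes zero, the difference between the two systems is unlikely to be a resampling artifact on this test set; it says nothing about whether the gap comes from linguistic content or from encoding mechanics.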

Circularity Check

0 steps flagged

No circularity: empirical comparison of external graphs with pretrained model

The paper reports direct empirical results from ensembling a fixed pretrained Transformer with ground-truth graphs drawn from seven established formalisms. No equations, fitted parameters, or predictions are defined in terms of the target performance metric. No self-citations are invoked to justify uniqueness or to close a derivation loop. The central claim (semantic constituency outperforming other structures) is a measured outcome on held-out data rather than a quantity that reduces to the inputs by construction. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an empirical evaluation that relies on existing pretrained neural models and linguistic graph resources from prior literature rather than introducing new fitted parameters or postulated entities.

axioms (1)

domain assumption: Ground-truth graphs from the seven formalisms provide accurate and unbiased representations suitable for fair comparison in the ensemble.
The experimental design assumes the provided graphs are correct inputs whose differences reflect the formalisms themselves.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

[1]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Omri Abend and Ari Rappoport. 2017. http://aclweb.org/anthology/P17-1008 The state of the art in semantic representation . In Proc. of ACL , pages 77--89, Vancouver, Canada

work page 2017
[4]

Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, and Yunhai Tong. 2021. https://doi.org/10.18653/v1/2021.eacl-main.262 Syntax- BERT : Improving pre-trained transformers with syntax trees . In Proc. of EACL, pages 3011--3020, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2021.eacl-main.262 2021
[5]

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. http://www.aclweb.org/anthology/W13-2322 Abstract M eaning R epresentation for sembanking . In Proc. of LAW-ID, pages 178--186, Sofia, Bulgaria

work page 2013
[6]

Bender and Alexander Koller

Emily M. Bender and Alexander Koller. 2020. https://doi.org/10.18653/v1/2020.acl-main.463 Climbing towards NLU : On meaning, form, and understanding in the age of data . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185--5198, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.acl-main.463 2020
[7]

Alena B\" o hmov\' a , Jan Haji c , Eva Haji c ov\' a , and Barbora Hladk\' a . 2003. https://doi.org/10.1007/978-94-010-0201-1_7 The P rague D ependency T reebank: A three-level annotation scenario . In Anne Abeill\' e , editor, Treebanks: Building and Using Parsed Corpora, Text, Speech and Language Technology, pages 103--127. Springer Netherlands, Dordrecht

work page doi:10.1007/978-94-010-0201-1_7 2003
[8]

Do Kook Choe and Eugene Charniak. 2016. https://doi.org/10.18653/v1/D16-1257 Parsing as language modeling . In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2331--2336, Austin, Texas. Association for Computational Linguistics

work page doi:10.18653/v1/d16-1257 2016
[9]

Leshem Choshen and Omri Abend. 2021. https://arxiv.org/abs/2101.12640 Transition based graph decoder for neural machine translation . ArXiv:2101.12640

work page arXiv 2021
[10]

Ann Copestake, Dan Flickinger, Carl Pollard, and Ivan A Sag. 2005. Minimal recursion semantics: An introduction. Research on language and computation, 3(2):281--332

work page 2005
[11]

Manning, Joakim Nivre, and Daniel Zeman

Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. https://doi.org/10.1162/coli_a_00402 Universal D ependencies . Computational Linguistics, 47(2):255--308

work page doi:10.1162/coli_a_00402 2021
[12]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT : P re-training of deep bidirectional transformers for language understanding . In Proc. of NAACL-HLT, pages 4171--4186

work page doi:10.18653/v1/n19-1423 2019
[13]

Haim Dubossarsky, Eitan Grossman, and Daphna Weinshall. 2018. https://doi.org/10.18653/v1/D18-1200 Coming to your senses: on controls and evaluation sets in polysemy research . In Proc. of EMNLP, pages 1732--1740, Brussels, Belgium. Association for Computational Linguistics

work page doi:10.18653/v1/d18-1200 2018
[14]

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. https://www.aclweb.org/anthology/N16-1024 Recurrent N eural N etwork G rammars . In Proc. of NAACL-HLT , pages 199--209, San Diego, CA , USA

work page 2016
[15]

Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin. 2019. https://aclanthology.org/W19-6108 Language modeling with syntactic and semantic representation for sentence acceptability predictions . In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 76--85, Turku, Finland. Link \"o ping University Electronic Press

work page 2019
[16]

Dan Flickinger. 2000. On building a more effcient grammar by exploiting types. Natural Language Engineering, 6(1):15--28

work page 2000
[17]

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. https://arxiv.org/abs/2106.02997 Causal abstractions of neural networks . In Proc. of NeurIPS

work page arXiv 2021
[18]

Joseph Gubbins and Andreas Vlachos. 2013. https://aclanthology.org/D13-1143 Dependency language models for sentence completion . In Proc. of EMNLP, pages 1405--1410, Seattle, Washington, USA. Association for Computational Linguistics

work page 2013
[19]

Valerie Hajdik, Jan Buys, Michael Wayne Goodman, and Emily M. Bender. 2019. https://doi.org/10.18653/v1/N19-1235 Neural text generation from rich semantic representations . In Proc. of NAACL-HLT, pages 2259--2266, Minneapolis, Minnesota. Association for Computational Linguistics

work page doi:10.18653/v1/n19-1235 2019
[20]

Jan Haji c , Eva Haji c ov \'a , Jarmila Panevov \'a , Petr Sgall, Ond r ej Bojar, Silvie Cinkov \'a , Eva Fu c \' kov \'a , Marie Mikulov \'a , Petr Pajas, Jan Popelka, Ji r \' Semeck \'y , Jana S indlerov \'a , Jan S t e p \'a nek, Josef Toman, Zde n ka Ure s ov \'a , and Zden e k Z abokrtsk \'y . 2012. http://www.lrec-conf.org/proceedings/lrec2012/pdf/...

work page 2012
[21]

Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, and Omri Abend. 2020. https://doi.org/10.18653/v1/2020.coling-main.264 Comparison by conversion: Reverse-engineering UCCA from syntax and lexical semantics . In Proc. of COLING, pages 2947--2966, Barcelona, Spain (Online). International Committee on Computational Linguistics

work page doi:10.18653/v1/2020.coling-main.264 2020
[22]

John Hewitt and Percy Liang. 2019. https://doi.org/10.18653/v1/D19-1275 Designing and interpreting probes with control tasks . In Proc. of EMNLP-IJCNLP, pages 2733--2743, Hong Kong, China. Association for Computational Linguistics

work page doi:10.18653/v1/d19-1275 2019
[23]

John Hewitt and Christopher D. Manning. 2019. https://www.aclweb.org/anthology/N19-1419 A structural probe for finding syntax in word representations . In Proc. of NAACL-HLT , pages 4129--4138, Minneapolis, MN , USA

work page 2019
[24]

Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. http://www.aclweb.org/anthology/N06-2015 OntoNotes : the 90\ In Proc. of HLT-NAACL , pages 57--60, New York City, USA

work page 2006
[25]

Angelina Ivanova, Stephan Oepen, Lilja vrelid, and Dan Flickinger. 2012. https://www.aclweb.org/anthology/W12-3602 Who did what to whom? a contrastive study of syntacto-semantic dependencies . In Proc. of LAW, pages 2--11, Jeju, Republic of Korea. Association for Computational Linguistics

work page 2012
[26]

Bowman, and Ellie Pavlick

Najoung Kim, Roma Patel, Adam Poliak, Patrick Xia, Alex Wang, Tom McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, and Ellie Pavlick. 2019. https://doi.org/10.18653/v1/S19-1026 Probing what different NLP tasks teach machines about function word comprehension . In Proc. of * SEM , pages 235--249, Minneapolis, Minnesota. Ass...

work page doi:10.18653/v1/s19-1026 2019
[27]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N. Kipf and Max Welling. 2017. http://arxiv.org/abs/1609.02907 Semi- Supervised Classification with Graph Convolutional Networks . In Proc. of ICLR

work page internal anchor Pith review Pith/arXiv arXiv 2017
[28]

Alexander Koller, Stephan Oepen, and Weiwei Sun. 2019. https://doi.org/10.18653/v1/P19-4002 Graph-based meaning representations: Design and processing . In Proc. of ACL: Tutorial Abstracts, pages 6--11, Florence, Italy. Association for Computational Linguistics

work page doi:10.18653/v1/p19-4002 2019
[29]

Artur Kulmizev, Vinit Ravishankar, Mostafa Abdou, and Joakim Nivre. 2020. https://doi.org/10.18653/v1/2020.acl-main.375 Do neural language models show preferences for syntactic formalisms? In Proc. of ACL, pages 4077--4091, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.acl-main.375 2020
[30]

Ilia Kuznetsov and Iryna Gurevych. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.13 A matter of framing: T he impact of linguistic formalism on probing results . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 171--182, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.emnlp-main.13 2020
[31]

Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. https://doi.org/10.1162/tacl_a_00115 Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies . TACL, 4:521--535

work page doi:10.1162/tacl_a_00115 2016
[32]

Liu, Matt Gardner, Yonatan Belinkov, Matthew E

Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. https://www.aclweb.org/anthology/N19-1112 Linguistic knowledge and transferability of contextual representations . In Proc. of NAACL-HLT , pages 1073--1094, Minneapolis, Minnesota

work page 2019
[33]

Ilya Loshchilov and Frank Hutter. 2019. https://openreview.net/forum?id=Bkg6RiCqY7 Decoupled weight decay regularization . In Proc. of ICLR , New Orleans, LA , USA

work page 2019
[34]

Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. https://aclanthology.org/J93-2004 Building a large annotated corpus of E nglish: The P enn T reebank . Computational Linguistics, 19(2):313--330

work page 1993
[35]

William Merrill, Yoav Goldberg, Roy Schwartz, and Noah A. Smith. 2021. https://doi.org/10.1162/tacl_a_00412 Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? TACL, 9:1047--1060

work page doi:10.1162/tacl_a_00412 2021
[36]

Piotr Mirowski and Andreas Vlachos. 2015. https://doi.org/10.3115/v1/P15-2084 Dependency recurrent neural language models for sentence completion . In Proc. of ACL-IJCNLP, pages 511--517, Beijing, China. Association for Computational Linguistics

work page doi:10.3115/v1/p15-2084 2015
[37]

Stefan M \"u ller. 2020. Grammatical theory: From transformational grammar to constraint-based approaches. Language Science Press

work page 2020
[38]

Manning, Ryan McDonald , Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Haji c , Christopher D. Manning, Ryan McDonald , Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. http://www.lrec-conf.org/proceedings/lrec2016/pdf/348_Paper.pdf Universal D ependencies v1: a multilingual treebank collection . In Proc. of LREC ,...

work page 2016
[39]

Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Haji c , Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. https://aclanthology.org/2020.lrec-1.497 U niversal D ependencies v2: An evergrowing multilingual treebank collection . In Proc. of LREC, pages 4034--4043, Marseille, France. European Langu...

work page 2020
[40]

Stephan Oepen, Omri Abend, Lasha Abzianidze, Johan Bos, Jan Hajic, Daniel Hershcovich, Bin Li, Tim O ' Gorman, Nianwen Xue, and Daniel Zeman. 2020. https://doi.org/10.18653/v1/2020.conll-shared.1 MRP 2020: The second shared task on cross-framework and cross-lingual meaning representation parsing . In Proc. of MRP at CoNLL, pages 1--22, Online. Association...

work page doi:10.18653/v1/2020.conll-shared.1 2020
[41]

Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O ' Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, and Zdenka Uresova. 2019. https://doi.org/10.18653/v1/K19-2001 MRP 2019: Cross-framework meaning representation parsing . In Proc. of MRP at CoNLL, pages 1--27, Hong Kong. Association for Computational Linguistics

work page doi:10.18653/v1/k19-2001 2019
[42]

Stephan Oepen and Jan Tore L nning. 2006. http://www.lrec-conf.org/proceedings/lrec2006/pdf/364_pdf.pdf Discriminant-based MRS banking . In Proc. of LREC, Genoa, Italy. European Language Resources Association (ELRA)

work page 2006
[43]

Adam Pauls and Dan Klein. 2012. https://aclanthology.org/P12-1101 Large-scale syntactic language modeling with treelets . In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 959--968, Jeju Island, Korea. Association for Computational Linguistics

work page 2012
[44]

Hao Peng, Roy Schwartz, and Noah A. Smith. 2019. https://doi.org/10.18653/v1/D19-1376 P a LM : A hybrid parser and language model . In Proc. of EMNLP-IJCNLP, pages 3644--3651, Hong Kong, China. Association for Computational Linguistics

work page doi:10.18653/v1/d19-1376 2019
[45]

Jakob Prange, Nathan Schneider, and Omri Abend. 2019 a . https://doi.org/10.18653/v1/K19-1017 Made for each other: Broad-coverage semantic structures meet preposition supersenses . In Proc. of CoNLL , pages 174--185, Hong Kong, China. Association for Computational Linguistics

work page doi:10.18653/v1/k19-1017 2019
[46]

Jakob Prange, Nathan Schneider, and Omri Abend. 2019 b . https://www.aclweb.org/anthology/W19-3319 Semantically constrained multilayer annotation: the case of coreference . In Proc. of DMR , pages 164--176, Florence, Italy

work page 2019
[47]

Peng Qian, Tahira Naseem, Roger Levy, and Ram \'o n Fernandez Astudillo. 2021. https://doi.org/10.18653/v1/2021.acl-long.289 Structural guidance for transformer language models . In Proc. of ACL-IJCNLP, pages 3735--3745, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2021.acl-long.289 2021
[48]

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf Language models are unsupervised multitask learners . OpenAI blog

work page 2019
[49]

Stefan Riezler and John T. Maxwell. 2005. https://aclanthology.org/W05-0908 On some pitfalls in automatic evaluation and significance testing for MT . In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , pages 57--64, Ann Arbor, Michigan. Association for Computational Linguistics

work page 2005
[50]

Michael Roth and Mirella Lapata. 2016. https://doi.org/10.18653/v1/P16-1113 Neural semantic role labeling with dependency path embeddings . In Proc. of ACL, pages 1192--1202, Berlin, Germany. Association for Computational Linguistics

work page doi:10.18653/v1/p16-1113 2016
[51]

Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Milo s Stanojevi \' c , Phil Blunsom, and Chris Dyer. 2022. https://arxiv.org/abs/2203.00633 T ransformer G rammars: Augmenting T ransformer language models with syntactic inductive biases at scale . ArXiv: 2203.00633

work page arXiv 2022
[52]

Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. https://link.springer.com/chapter/10.1007\ relational data with graph convolutional networks . In The Semantic Web, pages 593--607, Cham. Springer International Publishing

work page 2018
[53]

Petr Sgall, Eva Haji c ov \'a , and Jarmila Panevov \'a . 1986. The meaning of the sentence and its semantic and pragmatic aspects. academia

work page 1986
[54]

Yikang Shen, Zhouhan Lin, Chin-wei Huang, and Aaron Courville. 2018. https://openreview.net/forum?id=rkgOLb-0W Neural language modeling by jointly learning syntax and lexicon . In Proc. of ICLR

work page 2018
[55]

Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. https://openreview.net/forum?id=B1l6qiR5F7 Ordered neurons: Integrating tree structures into recurrent neural networks . In Proc. of ICLR

work page 2019
[56]

Aviv Slobodkin, Leshem Choshen, and Omri Abend. 2021. https://arxiv.org/abs/2110.06920 Semantics-aware attention improves neural machine translation . ArXiv:2110.06920

work page arXiv 2021
[57]

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. https://doi.org/10.18653/v1/D18-1548 Linguistically-informed self-attention for semantic role labeling . In Proc. of EMNLP, pages 5027--5038, Brussels, Belgium. Association for Computational Linguistics

work page doi:10.18653/v1/d18-1548 2018
[58]

Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, and Noah A. Smith. 2018. http://aclweb.org/anthology/D18-1412 Syntactic scaffolds for semantic structures . In Proc. of EMNLP , pages 3772--3782, Brussels, Belgium

work page 2018
[59]

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. https://www.aclweb.org/anthology/P15-1150 Improved semantic representations from tree-structured L ong Short-Term M emory networks . In Proc. of ACL-IJCNLP , pages 1556--1566, Beijing, China

work page 2015
[60]

Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019 a . https://doi.org/10.18653/v1/P19-1452 BERT rediscovers the classical NLP pipeline . In Proc. of ACL, pages 4593--4601, Florence, Italy. Association for Computational Linguistics

work page doi:10.18653/v1/p19-1452 2019
[61]

Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Sam Bowman, Dipanjan Das, and Ellie Pavlick. 2019 b . https://openreview.net/forum?id=SJzSgnRcKX What do you learn from context? probing for sentence structure in contextualized word representations . In Proc. of ICLR

work page 2019
[62]

Sean Trott, Tiago Timponi Torrent, Nancy Chang, and Nathan Schneider. 2020. https://doi.org/10.18653/v1/2020.acl-main.462 ( R e)construing M eaning in NLP . In Proc. of ACL, pages 5170--5184, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.acl-main.462 2020
[63]

Gomez, ukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ukasz Kaiser, and Illia Polosukhin. 2017. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf Attention is all you need . In Proc. of NeurIPS , pages 5998--6008, Long Beach, CA , USA

work page 2017
[64]

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. https://openreview.net/forum?id=rJ4km2R5t7 GLUE : A multi-task benchmark and analysis platform for natural language understanding . In Proc. of ICLR

work page 2019
[65]

Zhaofeng Wu, Hao Peng, and Noah A. Smith. 2021. https://doi.org/10.1162/tacl_a_00363 Infusing Finetuning with Semantic Dependencies . TACL, 9:226--242

work page doi:10.1162/tacl_a_00363 2021
[66]

Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. 2020. https://doi.org/10.18653/v1/2020.acl-main.383 Perturbed masking: Parameter-free probing for analyzing and interpreting BERT . In Proc. of ACL, pages 4166--4176, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.acl-main.383 2020
[67]

Kaiyu Yang and Jia Deng. 2020. https://proceedings.neurips.cc/paper/2020/file/f7177163c833dff4b38fc8d2872f1ec6-Paper.pdf Strongly incremental constituency parsing with graph neural networks . In Proc. of NeurIPS, volume 33, pages 21687--21698. Curran Associates, Inc

[68] Zdeněk Žabokrtský, Daniel Zeman, and Magda Ševčíková. 2020. https://doi.org/10.1162/coli_a_00385 Sentence meaning representations across languages: What can we learn from existing frameworks? Computational Linguistics, 46(3):605–665.


[3] Omri Abend and Ari Rappoport. 2017. http://aclweb.org/anthology/P17-1008 The state of the art in semantic representation. In Proc. of ACL, pages 77–89, Vancouver, Canada.

[4] Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, and Yunhai Tong. 2021. https://doi.org/10.18653/v1/2021.eacl-main.262 Syntax-BERT: Improving pre-trained transformers with syntax trees. In Proc. of EACL, pages 3011–3020, Online. Association for Computational Linguistics.

[5] Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. http://www.aclweb.org/anthology/W13-2322 Abstract Meaning Representation for sembanking. In Proc. of LAW-ID, pages 178–186, Sofia, Bulgaria.

[6] Emily M. Bender and Alexander Koller. 2020. https://doi.org/10.18653/v1/2020.acl-main.463 Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online. Association for Computational Linguistics.

[7] Alena Böhmová, Jan Hajič, Eva Hajičová, and Barbora Hladká. 2003. https://doi.org/10.1007/978-94-010-0201-1_7 The Prague Dependency Treebank: A three-level annotation scenario. In Anne Abeillé, editor, Treebanks: Building and Using Parsed Corpora, Text, Speech and Language Technology, pages 103–127. Springer Netherlands, Dordrecht.

[8] Do Kook Choe and Eugene Charniak. 2016. https://doi.org/10.18653/v1/D16-1257 Parsing as language modeling. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2331–2336, Austin, Texas. Association for Computational Linguistics.

[9] Leshem Choshen and Omri Abend. 2021. https://arxiv.org/abs/2101.12640 Transition based graph decoder for neural machine translation. ArXiv:2101.12640.

[10] Ann Copestake, Dan Flickinger, Carl Pollard, and Ivan A Sag. 2005. Minimal recursion semantics: An introduction. Research on Language and Computation, 3(2):281–332.

[11] Marie-Catherine de Marneffe, Christopher D. Manning, Joakim Nivre, and Daniel Zeman. 2021. https://doi.org/10.1162/coli_a_00402 Universal Dependencies. Computational Linguistics, 47(2):255–308.

[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, pages 4171–4186.

[13] Haim Dubossarsky, Eitan Grossman, and Daphna Weinshall. 2018. https://doi.org/10.18653/v1/D18-1200 Coming to your senses: on controls and evaluation sets in polysemy research. In Proc. of EMNLP, pages 1732–1740, Brussels, Belgium. Association for Computational Linguistics.

[14] Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. https://www.aclweb.org/anthology/N16-1024 Recurrent Neural Network Grammars. In Proc. of NAACL-HLT, pages 199–209, San Diego, CA, USA.

[15] Adam Ek, Jean-Philippe Bernardy, and Shalom Lappin. 2019. https://aclanthology.org/W19-6108 Language modeling with syntactic and semantic representation for sentence acceptability predictions. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 76–85, Turku, Finland. Linköping University Electronic Press.

[16] Dan Flickinger. 2000. On building a more efficient grammar by exploiting types. Natural Language Engineering, 6(1):15–28.

[17] Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. https://arxiv.org/abs/2106.02997 Causal abstractions of neural networks. In Proc. of NeurIPS.

[18] Joseph Gubbins and Andreas Vlachos. 2013. https://aclanthology.org/D13-1143 Dependency language models for sentence completion. In Proc. of EMNLP, pages 1405–1410, Seattle, Washington, USA. Association for Computational Linguistics.

[19] Valerie Hajdik, Jan Buys, Michael Wayne Goodman, and Emily M. Bender. 2019. https://doi.org/10.18653/v1/N19-1235 Neural text generation from rich semantic representations. In Proc. of NAACL-HLT, pages 2259–2266, Minneapolis, Minnesota. Association for Computational Linguistics.

[20] Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. http://www.lrec-conf.org/proceedings/lrec2012/pdf/...

[21] Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, and Omri Abend. 2020. https://doi.org/10.18653/v1/2020.coling-main.264 Comparison by conversion: Reverse-engineering UCCA from syntax and lexical semantics. In Proc. of COLING, pages 2947–2966, Barcelona, Spain (Online). International Committee on Computational Linguistics.

[22] John Hewitt and Percy Liang. 2019. https://doi.org/10.18653/v1/D19-1275 Designing and interpreting probes with control tasks. In Proc. of EMNLP-IJCNLP, pages 2733–2743, Hong Kong, China. Association for Computational Linguistics.

[23] John Hewitt and Christopher D. Manning. 2019. https://www.aclweb.org/anthology/N19-1419 A structural probe for finding syntax in word representations. In Proc. of NAACL-HLT, pages 4129–4138, Minneapolis, MN, USA.

[24] Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. http://www.aclweb.org/anthology/N06-2015 OntoNotes: The 90% solution. In Proc. of HLT-NAACL, pages 57–60, New York City, USA.

[25] Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger. 2012. https://www.aclweb.org/anthology/W12-3602 Who did what to whom? A contrastive study of syntacto-semantic dependencies. In Proc. of LAW, pages 2–11, Jeju, Republic of Korea. Association for Computational Linguistics.

[26] Najoung Kim, Roma Patel, Adam Poliak, Patrick Xia, Alex Wang, Tom McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, and Ellie Pavlick. 2019. https://doi.org/10.18653/v1/S19-1026 Probing what different NLP tasks teach machines about function word comprehension. In Proc. of *SEM, pages 235–249, Minneapolis, Minnesota. Ass...

[27] Thomas N. Kipf and Max Welling. 2017. http://arxiv.org/abs/1609.02907 Semi-Supervised Classification with Graph Convolutional Networks. In Proc. of ICLR.

[28] Alexander Koller, Stephan Oepen, and Weiwei Sun. 2019. https://doi.org/10.18653/v1/P19-4002 Graph-based meaning representations: Design and processing. In Proc. of ACL: Tutorial Abstracts, pages 6–11, Florence, Italy. Association for Computational Linguistics.

[29] Artur Kulmizev, Vinit Ravishankar, Mostafa Abdou, and Joakim Nivre. 2020. https://doi.org/10.18653/v1/2020.acl-main.375 Do neural language models show preferences for syntactic formalisms? In Proc. of ACL, pages 4077–4091, Online. Association for Computational Linguistics.

[30] Ilia Kuznetsov and Iryna Gurevych. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.13 A matter of framing: The impact of linguistic formalism on probing results. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 171–182, Online. Association for Computational Linguistics.

[31] Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. 2016. https://doi.org/10.1162/tacl_a_00115 Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. TACL, 4:521–535.

[32] Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. https://www.aclweb.org/anthology/N19-1112 Linguistic knowledge and transferability of contextual representations. In Proc. of NAACL-HLT, pages 1073–1094, Minneapolis, Minnesota.

[33] Ilya Loshchilov and Frank Hutter. 2019. https://openreview.net/forum?id=Bkg6RiCqY7 Decoupled weight decay regularization. In Proc. of ICLR, New Orleans, LA, USA.

[34] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. https://aclanthology.org/J93-2004 Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

[35] William Merrill, Yoav Goldberg, Roy Schwartz, and Noah A. Smith. 2021. https://doi.org/10.1162/tacl_a_00412 Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? TACL, 9:1047–1060.

[36] Piotr Mirowski and Andreas Vlachos. 2015. https://doi.org/10.3115/v1/P15-2084 Dependency recurrent neural language models for sentence completion. In Proc. of ACL-IJCNLP, pages 511–517, Beijing, China. Association for Computational Linguistics.

[37] Stefan Müller. 2020. Grammatical theory: From transformational grammar to constraint-based approaches. Language Science Press.

[38] Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. http://www.lrec-conf.org/proceedings/lrec2016/pdf/348_Paper.pdf Universal Dependencies v1: a multilingual treebank collection. In Proc. of LREC,...

[39] Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. https://aclanthology.org/2020.lrec-1.497 Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proc. of LREC, pages 4034–4043, Marseille, France. European Langu...

[40] Stephan Oepen, Omri Abend, Lasha Abzianidze, Johan Bos, Jan Hajic, Daniel Hershcovich, Bin Li, Tim O'Gorman, Nianwen Xue, and Daniel Zeman. 2020. https://doi.org/10.18653/v1/2020.conll-shared.1 MRP 2020: The second shared task on cross-framework and cross-lingual meaning representation parsing. In Proc. of MRP at CoNLL, pages 1–22, Online. Association...

[41] Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O'Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, and Zdenka Uresova. 2019. https://doi.org/10.18653/v1/K19-2001 MRP 2019: Cross-framework meaning representation parsing. In Proc. of MRP at CoNLL, pages 1–27, Hong Kong. Association for Computational Linguistics.

[42] Stephan Oepen and Jan Tore Lønning. 2006. http://www.lrec-conf.org/proceedings/lrec2006/pdf/364_pdf.pdf Discriminant-based MRS banking. In Proc. of LREC, Genoa, Italy. European Language Resources Association (ELRA).

[43] Adam Pauls and Dan Klein. 2012. https://aclanthology.org/P12-1101 Large-scale syntactic language modeling with treelets. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 959–968, Jeju Island, Korea. Association for Computational Linguistics.

[44] Hao Peng, Roy Schwartz, and Noah A. Smith. 2019. https://doi.org/10.18653/v1/D19-1376 PaLM: A hybrid parser and language model. In Proc. of EMNLP-IJCNLP, pages 3644–3651, Hong Kong, China. Association for Computational Linguistics.

[45] Jakob Prange, Nathan Schneider, and Omri Abend. 2019a. https://doi.org/10.18653/v1/K19-1017 Made for each other: Broad-coverage semantic structures meet preposition supersenses. In Proc. of CoNLL, pages 174–185, Hong Kong, China. Association for Computational Linguistics.

[46] Jakob Prange, Nathan Schneider, and Omri Abend. 2019b. https://www.aclweb.org/anthology/W19-3319 Semantically constrained multilayer annotation: the case of coreference. In Proc. of DMR, pages 164–176, Florence, Italy.

[47] Peng Qian, Tahira Naseem, Roger Levy, and Ramón Fernandez Astudillo. 2021. https://doi.org/10.18653/v1/2021.acl-long.289 Structural guidance for transformer language models. In Proc. of ACL-IJCNLP, pages 3735–3745, Online. Association for Computational Linguistics.

[48] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf Language models are unsupervised multitask learners. OpenAI blog.

[49] Stefan Riezler and John T. Maxwell. 2005. https://aclanthology.org/W05-0908 On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 57–64, Ann Arbor, Michigan. Association for Computational Linguistics.

[50] Michael Roth and Mirella Lapata. 2016. https://doi.org/10.18653/v1/P16-1113 Neural semantic role labeling with dependency path embeddings. In Proc. of ACL, pages 1192–1202, Berlin, Germany. Association for Computational Linguistics.

[51] Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, and Chris Dyer. 2022. https://arxiv.org/abs/2203.00633 Transformer Grammars: Augmenting Transformer language models with syntactic inductive biases at scale. ArXiv:2203.00633.

[52] Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. https://link.springer.com/chapter/10.1007\ Modeling relational data with graph convolutional networks. In The Semantic Web, pages 593–607, Cham. Springer International Publishing.

[53] Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The meaning of the sentence and its semantic and pragmatic aspects. Academia.

[54] Yikang Shen, Zhouhan Lin, Chin-wei Huang, and Aaron Courville. 2018. https://openreview.net/forum?id=rkgOLb-0W Neural language modeling by jointly learning syntax and lexicon. In Proc. of ICLR.

[55] Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. https://openreview.net/forum?id=B1l6qiR5F7 Ordered neurons: Integrating tree structures into recurrent neural networks. In Proc. of ICLR.

[56] Aviv Slobodkin, Leshem Choshen, and Omri Abend. 2021. https://arxiv.org/abs/2110.06920 Semantics-aware attention improves neural machine translation. ArXiv:2110.06920.

[57] Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. https://doi.org/10.18653/v1/D18-1548 Linguistically-informed self-attention for semantic role labeling. In Proc. of EMNLP, pages 5027–5038, Brussels, Belgium. Association for Computational Linguistics.

[58] Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, and Noah A. Smith. 2018. http://aclweb.org/anthology/D18-1412 Syntactic scaffolds for semantic structures. In Proc. of EMNLP, pages 3772–3782, Brussels, Belgium.

[59] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. https://www.aclweb.org/anthology/P15-1150 Improved semantic representations from tree-structured Long Short-Term Memory networks. In Proc. of ACL-IJCNLP, pages 1556–1566, Beijing, China.

[60] Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019a. https://doi.org/10.18653/v1/P19-1452 BERT rediscovers the classical NLP pipeline. In Proc. of ACL, pages 4593–4601, Florence, Italy. Association for Computational Linguistics.
