Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages
Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3
The pith
Biaffine LSTM outperforms transformers for dependency parsing in low-resource regimes until data volume reaches a moderate threshold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases; the crossover falls within a resource range typical of treebanks for under-resourced languages, and morphological complexity measured via MATTR emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size.
What carries the argument
Direct head-to-head evaluation of the Biaffine LSTM and Stack-Pointer Network against pre-trained transformers (AfroXLMR-large and RemBERT) on controlled subsets of training data, with MATTR serving as the measure of morphological complexity.
If this is right
- The Biaffine LSTM is the better choice for building syntactic tools when annotated data is scarce.
- Transformers become preferable once treebank size exceeds the typical range for under-resourced languages.
- Morphological complexity remains an independent factor that favors simpler LSTM parsers.
- Resource-aware parser selection can improve parsing accuracy for languages with small treebanks.
Where Pith is reading between the lines
- The same data-size crossover may appear in other structured prediction tasks such as semantic role labeling or named-entity recognition.
- Targeted data collection for morphologically complex languages could accelerate the point at which transformers become viable.
- Hybrid systems that switch between LSTM and transformer backbones based on available data volume are worth testing.
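The switching idea can be made concrete. A minimal sketch, with a hypothetical `crossover` default (the reported crossovers span roughly 830 to 1,390 training sentences, varying by transformer):

```python
def choose_backbone(n_train_sentences: int, crossover: int = 1000) -> str:
    """Pick a parser backbone from annotated-data volume.

    The default threshold is illustrative, not from the paper; reported
    crossovers range from ~830 (AfroXLMR-large) to ~1,390 (RemBERT)
    training sentences.
    """
    return "biaffine_lstm" if n_train_sentences < crossover else "transformer"
```

A practical system would refine this with per-language crossover estimates rather than a single global threshold.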
Load-bearing premise
The ten chosen languages, especially the low-resource African ones, represent broader low-resource conditions, and the MATTR metric isolates morphological complexity independently of data quality or annotation consistency.
What would settle it
Repeating the experiments on a fresh set of low-resource languages while systematically varying training-set size and morphological complexity to check whether the same performance crossover and MATTR correlation appear.
Original abstract
Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers -- the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT -- across ten typologically diverse languages, with a focus on low-resource African languages. We find that the Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages. Morphological complexity (measured via MATTR) emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These results indicate that the Biaffine LSTM may be better suited for syntactic tool development in low-resource regimes until sufficient annotated data is available to leverage the representational capacity of pre-trained transformers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates four dependency parsing architectures—the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT—across ten typologically diverse languages, emphasizing low-resource African languages. It reports that the Biaffine LSTM outperforms the transformer models in low-resource regimes, that transformers recover their advantage as training data increases, that the performance crossover occurs within a resource range typical of under-resourced treebanks, and that morphological complexity (measured via MATTR) is a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These findings are used to recommend the Biaffine LSTM for syntactic tool development in low-resource settings until sufficient data is available.
Significance. If the primary performance curves and crossover point are reproducible, the work provides practical guidance for architecture selection in low-resource dependency parsing, a topic of direct relevance to NLP tool development for under-resourced languages. The empirical focus on a resource spectrum and typologically diverse set (including African languages) adds value beyond single-language studies. The secondary MATTR-based predictor, however, requires stronger justification to support the mechanistic interpretation offered in the abstract.
major comments (1)
- [Results and discussion of secondary predictors] The claim that MATTR measures morphological complexity and serves as a significant secondary predictor (abstract and results/discussion) is not adequately supported. MATTR is a standard lexical-diversity metric (moving-average type-token ratio), while morphological complexity is conventionally quantified by inflectional entropy, paradigm size, or average morphemes per word. The manuscript provides no validation that the regression isolates morphological effects from lexical diversity, annotation consistency, or data quality. Because this predictor is presented as explanatory support for the observed crossover and the recommendation for Biaffine LSTM, the mechanistic interpretation is insecure.
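For reference, MATTR itself is simple to compute: the type-token ratio averaged over a sliding window of fixed length. A sketch (a window of 500 tokens is a common choice, not necessarily the paper's):

```python
from collections import Counter

def mattr(tokens, window=500):
    """Moving-average type-token ratio (Covington & McFall, 2010):
    the mean TTR over all contiguous windows of `window` tokens."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)  # fall back to plain TTR
    counts = Counter(tokens[:window])          # type counts in first window
    ttrs = [len(counts) / window]
    for i in range(window, len(tokens)):       # slide the window by one token
        counts[tokens[i - window]] -= 1
        if counts[tokens[i - window]] == 0:
            del counts[tokens[i - window]]
        counts[tokens[i]] += 1
        ttrs.append(len(counts) / window)
    return sum(ttrs) / len(ttrs)
```

Because MATTR tracks wordform diversity, a morphologically rich language inflates it through inflectional variety, which is precisely why the referee asks for validation against dedicated morphological metrics.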
minor comments (2)
- [Methods] Provide explicit details on data splits, hyperparameter search procedures, statistical significance tests for performance differences, and any controls for model size or pretraining corpus overlap to allow full reproducibility of the primary comparisons.
- [Results] Clarify the exact regression model, included covariates, and reported coefficients or p-values for the MATTR analysis so readers can assess the strength of the secondary finding independently.
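The requested regression can be spelled out in a few lines. A synthetic, noise-free illustration (coefficients invented for the demo, not taken from the paper) of regressing the LSTM-minus-transformer LAS gap on log corpus size plus MATTR:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
log_size = rng.uniform(5.0, 9.0, n)   # log of training-sentence count
mattr = rng.uniform(0.4, 0.9, n)      # moving-average type-token ratio
# Hypothetical data-generating process for the LAS gap (no noise):
gap = 10.0 - 1.5 * log_size + 4.0 * mattr

# Ordinary least squares: gap ~ 1 + log_size + mattr
X = np.column_stack([np.ones(n), log_size, mattr])
beta, *_ = np.linalg.lstsq(X, gap, rcond=None)
# beta recovers the generating coefficients [10.0, -1.5, 4.0]
```

Reporting the fitted coefficients together with standard errors and p-values is exactly what the second minor comment asks of the authors.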
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the interpretation and justification of MATTR below. The primary empirical findings on parser performance across resource levels and the recommended use of the Biaffine LSTM in low-resource settings are unaffected by this revision.
Point-by-point responses
-
Referee: The claim that MATTR measures morphological complexity and serves as a significant secondary predictor (abstract and results/discussion) is not adequately supported. MATTR is a standard lexical-diversity metric (moving-average type-token ratio), while morphological complexity is conventionally quantified by inflectional entropy, paradigm size, or average morphemes per word. The manuscript provides no validation that the regression isolates morphological effects from lexical diversity, annotation consistency, or data quality. Because this predictor is presented as explanatory support for the observed crossover and the recommendation for Biaffine LSTM, the mechanistic interpretation is insecure.
Authors: We acknowledge the referee's point that MATTR is conventionally a lexical-diversity metric rather than a direct measure of morphological complexity (such as inflectional entropy or paradigm size). The manuscript's phrasing in the abstract and discussion does overstate the direct link. In the revised version we will (1) replace the parenthetical claim with language describing MATTR as a lexical-diversity proxy that correlates with morphological richness in the languages studied, (2) add explicit discussion of the regression controls (corpus size already included as a covariate) and the limitations of this proxy, and (3) tone down the mechanistic interpretation to note that MATTR captures a secondary signal whose precise causal contribution requires further validation with dedicated morphological metrics. These changes will be made in the abstract, results, and discussion sections.
Revision: yes
Circularity Check
No circularity: direct empirical measurements on held-out data
Full rationale
The paper reports performance comparisons of four parsers across ten languages using standard train/dev/test splits and regression to identify predictors. All reported advantages, crossovers, and secondary correlations (including MATTR) are measured outcomes from observed data rather than quantities defined by the analysis itself or reduced to fitted inputs by construction. No derivations, uniqueness theorems, ansatzes, or self-citations are invoked as load-bearing steps in any claimed chain.
Lean theorems connected to this paper
-
Cost.FunctionalEquation.washburn_uniqueness_aczel (tagged: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
We define: RER_LAS(ℓ) = (LAS_BiaffineLSTM(ℓ) − LAS_TF(ℓ)) / (100 − LAS_BiaffineLSTM(ℓ))
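Under that definition, the relative error reduction is a one-liner; a sketch (the function name is ours):

```python
def rer_las(las_lstm: float, las_tf: float) -> float:
    """Relative error reduction of the Biaffine LSTM over a transformer:
    (LAS_LSTM - LAS_TF) / (100 - LAS_LSTM), with LAS in percent."""
    return (las_lstm - las_tf) / (100.0 - las_lstm)
```

For example, 80 vs. 75 LAS gives an RER of 0.25; a negative value means the transformer is ahead.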
-
Foundation.AlphaCoordinateFixation.alpha_pin_under_high_calibration (tagged: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
AfroXLMR-large reaches the crossover at around 830 sentences, while RemBERT requires approximately 1,340–1,390.
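Such crossover sizes are presumably read off by interpolating where the performance gap changes sign between evaluated subset sizes. A sketch of that interpolation (assuming the gap is measured on a grid of training sizes):

```python
def crossover_size(sizes, gaps):
    """Training-set size where the LSTM-minus-transformer gap first
    crosses zero, by linear interpolation between measured points."""
    for (s0, g0), (s1, g1) in zip(zip(sizes, gaps), zip(sizes[1:], gaps[1:])):
        if g0 > 0 >= g1:
            return s0 + (s1 - s0) * g0 / (g0 - g1)
    return None  # no crossover within the measured range
```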
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.