The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
Pith reviewed 2026-05-20 10:40 UTC · model grok-4.3
The pith
Low-resource NLP model scaling has outpaced the human expertise and infrastructure needed for authentic evaluation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is the Annotation Scarcity Paradox: the structural friction that arises when the technical capacity to scale models vastly outpaces the sovereign human infrastructure required to authentically evaluate them. The paper traces low-resource NLP evaluation through three phases (early heuristic optimism, illusions of top-down benchmark scaling, and generative bottlenecks), examines extractive data pipelines, undercompensated ghost work, and language data flaring, and argues that this paradox threatens the epistemic validity of reported progress.
What carries the argument
The Annotation Scarcity Paradox: the mismatch between rapid model and benchmark scaling and the strained, inequitably distributed human expertise required for authentic evaluation.
If this is right
- Existing benchmarks in low-resource NLP may overestimate model capabilities because they rely on insufficiently deep or representative human judgments.
- Responses such as data augmentation, model-based evaluation, and active learning carry their own equity and validity trade-offs that must be weighed carefully.
- Sustainable progress requires shifting from transactional data extraction to relational, community-embedded evaluation practices grounded in epistemic governance and data sovereignty.
Where Pith is reading between the lines
- Practitioners could test whether involving local language communities earlier in evaluation design improves alignment between benchmark scores and real-world utility.
- The same mismatch between scaling capacity and human evaluation resources may surface in other AI areas that depend on large-scale human feedback.
- Long-term field stability may hinge on building distributed infrastructure for annotation rather than continuing to centralize evaluation through global benchmarks.
Load-bearing premise
The sociolinguistic expertise required for authentic evaluation of generative systems is severely strained, inequitably distributed, and structurally marginalised in a manner that directly undermines the validity of existing benchmarks and reported results.
What would settle it
A side-by-side study in which native-speaker experts from low-resource language communities re-evaluate a set of existing benchmarks and produce results that closely match the original reported scores would challenge the claim that the paradox undermines epistemic validity.
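The comparison described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's protocol: the function names (`accuracy`, `cohen_kappa`, `reevaluate`) are hypothetical, and the choice of accuracy as the benchmark metric and Cohen's kappa as the agreement measure is an assumption made here for concreteness.

```python
from collections import Counter


def accuracy(predictions, gold):
    """Fraction of model predictions that match the given gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both sources give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if labels were assigned independently at the
    # marginal rates of each source.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def reevaluate(predictions, original_gold, expert_gold):
    """Compare a model's score under original labels and expert re-annotation.

    Hypothetical helper: `expert_gold` stands for labels produced by
    native-speaker experts re-annotating the same benchmark items.
    """
    return {
        "original_score": accuracy(predictions, original_gold),
        "expert_score": accuracy(predictions, expert_gold),
        "label_agreement_kappa": cohen_kappa(original_gold, expert_gold),
    }
```

Under this sketch, an expert score close to the original score together with high label agreement would count against the claim that the paradox undermines epistemic validity, while a large gap would be consistent with it. What counts as "close" is a study-design decision that the sketch does not settle.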
Original abstract
Over the past decade, low-resource natural language processing (NLP) has experienced explosive growth, propelled by cross-lingual transfer, massively multilingual models, and the rapid proliferation of benchmarks. Yet this apparent progress masks a critical, insufficiently examined tension: the deep sociolinguistic expertise required to evaluate increasingly complex generative systems is severely strained, inequitably distributed, and structurally marginalised. We present a critical narrative survey of low-resource NLP evaluation (2014 to present), tracing its evolution across three phases: early heuristic optimism, the illusions of top-down benchmark scaling, and the current era of generative bottlenecks. We conceptualise the Annotation Scarcity Paradox, the structural friction arising when the technical capacity to scale models vastly outpaces the sovereign human infrastructure required to authentically evaluate them. By examining extractive data pipelines, undercompensated "ghost work", and language data flaring, we argue that this paradox threatens the epistemic validity of reported progress. We survey emerging responses, including data augmentation, model-based evaluation, participatory curation, and annotation-efficient approaches via item response theory and active learning, and assess their equity and validity trade-offs. We close with a practitioner call to action, arguing that overcoming this bottleneck requires a paradigm shift from transactional data extraction to relational, community-embedded evaluation rooted in epistemic governance, data sovereignty, and shared ownership.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a critical narrative survey of low-resource NLP evaluation from 2014 to the present, tracing three phases of development (early heuristic optimism, illusions of top-down benchmark scaling, and generative bottlenecks). It defines the Annotation Scarcity Paradox as the structural mismatch between rapid model scaling capacity and limited sovereign human infrastructure for authentic evaluation, supported by analysis of extractive data pipelines, undercompensated ghost work, and language data flaring. The authors argue this paradox undermines epistemic validity of reported progress, survey responses including data augmentation, model-based evaluation, participatory curation, and IRT/active learning methods, and advocate a paradigm shift to relational, community-embedded evaluation grounded in epistemic governance and data sovereignty.
Significance. If the interpretive synthesis holds, the work offers a timely framework for recognizing structural constraints in low-resource NLP evaluation that could encourage more equitable practices and improved validity in multilingual benchmarks. The survey of historical phases and the assessment of equity-validity trade-offs in emerging methods provide a useful reference point for researchers addressing annotation challenges, though the paper's impact would be strengthened by tighter links to concrete evaluation failures.
major comments (1)
- [Abstract and section on the Annotation Scarcity Paradox] The central claim that the paradox 'threatens the epistemic validity of reported progress' is load-bearing but rests on interpretive synthesis of practices like ghost work; the manuscript would benefit from at least one concrete example (with citation) of a low-resource benchmark whose results have been directly questioned due to annotation infrastructure issues, to move beyond narrative to a falsifiable link.
minor comments (2)
- [section tracing the three phases] The three-phase historical framing is clear but could note any overlap or counter-examples between phases to avoid implying a strictly linear progression.
- [section surveying emerging responses] In the survey of responses, the equity and validity trade-offs for IRT-based and participatory methods are assessed at a high level; adding a short table summarizing the trade-offs would improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. The feedback highlights an opportunity to strengthen the empirical grounding of our central argument, and we have revised the paper accordingly while preserving its character as a critical narrative survey.
Point-by-point responses
- Referee: [Abstract and section on the Annotation Scarcity Paradox] The central claim that the paradox 'threatens the epistemic validity of reported progress' is load-bearing but rests on interpretive synthesis of practices like ghost work; the manuscript would benefit from at least one concrete example (with citation) of a low-resource benchmark whose results have been directly questioned due to annotation infrastructure issues, to move beyond narrative to a falsifiable link.
  Authors: We agree that a concrete, citable example would make the load-bearing claim more directly falsifiable and would help readers connect the structural analysis to specific evaluation outcomes. While the manuscript's strength lies in its synthesis of patterns across extractive pipelines, ghost work, and data flaring, we accept that an illustrative case would improve clarity. In the revised version we have added a concise example in the section defining the Annotation Scarcity Paradox: we now reference documented concerns about annotation quality and inter-annotator agreement in the MasakhaNER benchmark for low-resource African languages, where reliance on non-expert and non-native annotators has been shown in follow-up studies to affect the reliability of reported performance metrics. We have also lightly updated the abstract to signal this addition. This revision keeps the narrative framing intact while addressing the request for a more explicit link.
  Revision: yes
Circularity Check
No significant circularity in conceptual survey
The paper is a critical narrative survey that defines the Annotation Scarcity Paradox through synthesis of external documented practices (extractive pipelines, ghost work, data flaring) across three historical phases. It draws on external literature for support and presents interpretive analysis of equity/validity trade-offs in responses such as participatory curation and IRT methods. No quantitative predictions, fitted parameters, equations, or derivations exist that could reduce to inputs by construction. The central claim relies on observed structural mismatches rather than self-referential definitions or load-bearing self-citations, rendering the argument self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Low-resource NLP evaluation has evolved through three identifiable phases (early heuristic optimism, illusions of top-down benchmark scaling, and generative bottlenecks) since 2014.
invented entities (1)
- Annotation Scarcity Paradox: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "We conceptualise the Annotation Scarcity Paradox, the structural friction arising when the technical capacity to scale models vastly outpaces the sovereign human infrastructure required to authentically evaluate them."
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "By examining extractive data pipelines, undercompensated ghost work, and language data flaring, we argue that this paradox threatens the epistemic validity of reported progress."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Towards Neural Machine Translation for African Languages
Jade Z Abbott and Laura Martinus. Towards neural machine translation for african languages. arXiv preprint arXiv:1811.05467 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[2]
Correcting FLORES evaluation dataset for four A frican languages
Idris Abdulmumin, Sthembiso Mkhwanazi, Mahlatse Mbooi, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Neo Putini, Miehleketo Mathebula, Matimba Shingange, Tajuddeen Gwadabe, and Vukosi Marivate. Correcting FLORES evaluation dataset for four A frican languages. In Barry Haddow, Tom Kocmi, Philipp Koehn, and Christof Monz, editors, Proceedings of the Nint...
work page 2024
-
[3]
Will global health survive its decolonisation? The Lancet , 396(10263):1627--1628, 2020
Seye Abimbola and Madhukar Pai. Will global health survive its decolonisation? The Lancet , 396(10263):1627--1628, 2020
work page 2020
-
[4]
Cross-lingual word embeddings for low-resource language modeling
Oliver Adams, Adam Makarucha, Graham Neubig, Steven Bird, and Trevor Cohn. Cross-lingual word embeddings for low-resource language modeling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers , pages 937--947, 2017
work page 2017
-
[5]
AI and language data flaring in A frica: Addressing the low-resource challenge
Ife Adebara. AI and language data flaring in A frica: Addressing the low-resource challenge. Policy Brief No. 216 , 2025
work page 2025
-
[6]
Masakhaner 2.0: Africa-centric transfer learning for named entity recognition
David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba Alabi, Shamsuddeen H Muhammad, Peter Nabende, et al. Masakhaner 2.0: Africa-centric transfer learning for named entity recognition. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Proce...
work page 2022
-
[7]
Irokobench: A new benchmark for african languages in the age of large language models
David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, et al. Irokobench: A new benchmark for african languages in the age of large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Associatio...
work page 2025
-
[8]
JW 300: A wide-coverage parallel corpus for low-resource languages
Z eljko Agi \'c and Ivan Vuli \'c . JW 300: A wide-coverage parallel corpus for low-resource languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , pages 3204--3210, 2019
work page 2019
-
[9]
Mega: Multilingual evaluation of generative ai
Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Mohamed Ahmed, et al. Mega: Multilingual evaluation of generative ai. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages 4232--4267, 2023
work page 2023
-
[10]
Adapting pre-trained language models to A frican languages via multilingual adaptive fine-tuning
Jesujoba Alabi, David Ifeoluwa Adelani, Marius Mosbach, and Dietrich Klakow. Adapting pre-trained language models to A frican languages via multilingual adaptive fine-tuning. In Proceedings of the 29th International Conference on Computational Linguistics , pages 4336--4349, 2022
work page 2022
-
[11]
Charting the landscape of african nlp: Mapping progress and shaping the road ahead
Jesujoba Alabi, Michael A Hedderich, David Ifeoluwa Adelani, and Dietrich Klakow. Charting the landscape of african nlp: Mapping progress and shaping the road ahead. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages 27795--27829, 2025
work page 2025
-
[12]
Common voice: A massively-multilingual speech corpus
Rosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and Gregor Weber. Common voice: A massively-multilingual speech corpus. In Proceedings of the twelfth language resources and evaluation conference , pages 4218--4222, 2020
work page 2020
-
[13]
Tadesse Destaw Belay, Kedir Yassin Hussen, Sukairaj Hafiz Imam, Ibrahim Said Ahmad, Isa Inuwa-Dutse, Abrham Belete Haile, Grigori Sidorov, Iqra Ameer, Idris Abdulmumin, Tajuddeen Gwadabe, et al. The rise of africanlp: Contributions, contributors, and community impact (2005-2025). arXiv preprint arXiv:2509.25477 , 2025
work page internal anchor Pith review Pith/arXiv arXiv 2005
-
[14]
Emily M. Bender and Batya Friedman. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics , 6:587--604, 2018
work page 2018
-
[15]
Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages 610--623, 2021
work page 2021
-
[16]
Decolonising speech and language technology
Steven Bird. Decolonising speech and language technology. In Proceedings of the 28th international conference on computational linguistics , pages 3504--3519, 2020
work page 2020
-
[17]
Abeba Birhane and Vinay Uday Prabhu. Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) , pages 1536--1546. IEEE, 2021
work page 2021
-
[18]
The values encoded in machine learning research
Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao. The values encoded in machine learning research. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency , pages 173--184, 2022
work page 2022
-
[19]
Algorithmic colonization of africa
Abeba Birhane. Algorithmic colonization of africa. SCRIPTed , 17:389, 2020
work page 2020
-
[20]
Systematic inequalities in language technology performance across the world’s languages
Damian Blasi, Antonios Anastasopoulos, and Graham Neubig. Systematic inequalities in language technology performance across the world’s languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 5486--5505, 2022
work page 2022
-
[21]
The care principles for indigenous data governance
Stephanie Russo Carroll, Ibrahim Garba, Oscar L Figueroa-Rodr \' guez, Jarita Holbrook, Raymond Lovett, Simeon Materechera, Mark Parsons, Kay Raseroka, Desi Rodriguez-Lonebear, Robyn Rowe, et al. The care principles for indigenous data governance. Open Scholarship Press Curated Volumes: Policy , 2023
work page 2023
-
[22]
An empirical survey of data augmentation for limited data learning in nlp
Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, and Diyi Yang. An empirical survey of data augmentation for limited data learning in nlp. Transactions of the Association for Computational Linguistics , 11:191--211, 2023
work page 2023
-
[23]
Yu Ying Chiu, Liwei Jiang, Bill Yuchen Lin, Chan Young Park, Shuyue Stella Li, Sahithya Ravi, Mehar Bhatia, Maria Antoniak, Yulia Tsvetkov, Vered Shwartz, et al. Culturalbench: A robust, diverse and challenging benchmark for measuring lms’ cultural knowledge through human-ai red-teaming. In Proceedings of the 63rd Annual Meeting of the Association for Com...
work page 2025
-
[24]
Unsupervised cross-lingual representation learning at scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzm \'a n, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th annual meeting of the association for computational linguistics , pages 8440--8451, 2020
work page 2020
-
[25]
Scrambling for africa? universities and global health
Johanna Crane. Scrambling for africa? universities and global health. The Lancet , 377(9775):1388--1390, 2011
work page 2011
-
[26]
DataDotOrg . Digitisation of oral data for nlp of low-resource languages: Practical methods and processes for scalable and sustainable ecosystem development. Playbook, DataDotOrg, Washington, D.C., USA, 2026. A playbook for building sustainable African language technology ecosystems
work page 2026
-
[27]
Localising the mozilla common voice platform for south africa’s official languages
Febe de Wet, Andiswa Bukula, Willem Karsten, Martin Puttkammer, Erwin Schillack, Rone Wierenga, and Roald Eiselen. Localising the mozilla common voice platform for south africa’s official languages. Journal of the Digital Humanities Association of Southern Africa (DHASA) , 4(01), 2022
work page 2022
-
[28]
Bottom-up data trusts: Disturbing the ‘one size fits all’approach to data governance
Sylvie Delacroix and Neil D Lawrence. Bottom-up data trusts: Disturbing the ‘one size fits all’approach to data governance. International data privacy law , 9(4):236--252, 2019
work page 2019
-
[29]
Bert: Pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers) , pages 4171--4186, 2019
work page 2019
-
[30]
Nl-augmenter: A framework for task-sensitive natural language augmentation
Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahadiran, Simon Mille, Ashish Shrivastava, Samson Tan, et al. Nl-augmenter: A framework for task-sensitive natural language augmentation. Northern European Journal of Language Technology , 9, 2023
work page 2023
-
[31]
David M. Eberhard, Gary F. Simons, and Charles D. Fennig. Ethnologue : Languages of the world. SIL International, 2025
work page 2025
-
[32]
Abteen Ebrahimi, Manuel Mager, Adam Wiemerslage, Pavel Denisov, Katharina Kann, et al. AmericasNLI : Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 6279--6299, 2022
work page 2022
-
[33]
Jake Okechukwu Effoduh. Decolonizing the governance of artificial intelligence in africa: from normative mimicry to epistemic sovereignty. Science and Public Policy , 53(2):245--257, 2026
work page 2026
-
[34]
Developing text resources for ten S outh A frican languages
Roald Eiselen and Martin J Puttkammer. Developing text resources for ten S outh A frican languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) , pages 3698--3703, 2014
work page 2014
-
[35]
A survey of data augmentation approaches for nlp
Steven Y Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, and Eduard Hovy. A survey of data augmentation approaches for nlp. In Findings of the association for computational linguistics: ACL-IJCNLP 2021 , pages 968--988, 2021
work page 2021
-
[36]
A typology of reviews: an analysis of 14 review types and associated methodologies
Maria J Grant and Andrew Booth. A typology of reviews: an analysis of 14 review types and associated methodologies. Health information & libraries journal , 26(2):91--108, 2009
work page 2009
-
[37]
Universal neural machine translation for extremely low resource languages
Jiatao Gu, Hany Hassan Awadalla, Jacob Devlin, and Victor OK Li. Universal neural machine translation for extremely low resource languages. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) , pages 344--354, 2018
work page 2018
-
[38]
The weirdest people in the world? Behavioral and Brain Sciences , 33(2-3):61--83, 2010
Joseph Henrich, Steven J Heine, and Ara Norenzayan. The weirdest people in the world? Behavioral and Brain Sciences , 33(2-3):61--83, 2010
work page 2010
-
[39]
Challenges and strategies in cross-cultural nlp
Daniel Hershcovich, Stella Frank, Heather Lent, Miryam De Lhoneux, Mostafa Abdou, Stephanie Brandl, Emanuele Bugliarello, Laura Cabello Piqueras, Ilias Chalkidis, Ruixiang Cui, et al. Challenges and strategies in cross-cultural nlp. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 6...
work page 2022
-
[40]
XTREME : A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. XTREME : A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation. In International conference on machine learning , pages 4411--4421. PMLR, 2020
work page 2020
-
[41]
Lessons from archives: Strategies for collecting sociocultural data in machine learning
Eun Seo Jo and Timnit Gebru. Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 conference on fairness, accountability, and transparency , pages 306--316, 2020
work page 2020
-
[42]
The state and fate of linguistic diversity and inclusion in the nlp world
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. The state and fate of linguistic diversity and inclusion in the nlp world. In Proceedings of the 58th annual meeting of the association for computational linguistics , pages 6282--6293, 2020
work page 2020
-
[43]
Nataliia Kholodna, Sahib Julka, Mohammad Khodadadi, Muhammed Nurullah Gumus, and Michael Granitzer. Llms in the loop: Leveraging large language model annotations for active learning in low-resource languages. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages 397--412. Springer, 2024
work page 2024
-
[44]
Practical Natural Language Processing for Low-Resource Languages
Benjamin Philip King. Practical Natural Language Processing for Low-Resource Languages . PhD thesis, University of Michigan, 2015
work page 2015
-
[45]
Lessons learned from a citizen science project for natural language processing
Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, G \"o zde S ahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart De Castilho, and Iryna Gurevych. Lessons learned from a citizen science project for natural language processing. In Andreas Vlachos and Isabelle Augenstein, editors, Proceedings of the 17th Conference of the European Chapter of t...
work page 2023
-
[46]
The IIT B ombay E nglish- H indi parallel corpus
Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhattacharyya. The IIT B ombay E nglish- H indi parallel corpus. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) , 2018
work page 2018
-
[47]
John P. Lalor, Hao Wu, and Hong Yu. Learning latent parameters without human response patterns: Item response theory with artificial crowds. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP) , pages 4674--4684, Hong Kong, China, November 2019. Association for Computational Linguistics
work page 2019
-
[48]
Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, B \"o rje F. Karlsson, James J...
work page 2024
-
[49]
Challenges of language technologies for the indigenous languages of the A mericas
Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, and Ivan Meza-Ruiz. Challenges of language technologies for the indigenous languages of the A mericas. In Proceedings of the 27th International Conference on Computational Linguistics , pages 55--69, 2018
work page 2018
-
[50]
Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Gim \'e nez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, and Katharina Kann. Findings of the A mericas NLP 2021 sh...
work page 2021
-
[51]
a ubener, Sophie Fellenz, Asja Fischer, Thomas G \
Laura Manduchi, Clara Meister, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina D \"a ubener, Sophie Fellenz, Asja Fischer, Thomas G \"a rtner, Matthias Kirchler, et al. On the challenges and opportunities in generative ai. arXiv preprint arXiv:2403.00025 , 2024
-
[52]
Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI
Robert Munro Monarch. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI . Simon and Schuster, 2021
work page 2021
-
[53]
Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Alipio Jorge, Felermino D \'a rio M \'a rio Ant \'o nio Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Taj...
work page 2023
-
[54]
Participatory research for low-resourced machine translation: A case study in A frican languages
Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, et al. Participatory research for low-resourced machine translation: A case study in A frican languages. In Findings of the Association for Computational Linguis...
work page 2020
-
[55]
Jessica Ojo, Odunayo Ogundepo, Akintunde Oladipo, Kelechi Ogueji, Jimmy Lin, Pontus Stenetorp, and David Ifeoluwa Adelani. Afrobench: how good are large language models on african languages? In Findings of the Association for Computational Linguistics: ACL 2025 , pages 19048--19095, 2025
work page 2025
-
[56]
Moving toward truly responsible AI development in the global AI market, 2024
Chinasa Okolo and Marie Tano. Moving toward truly responsible AI development in the global AI market, 2024. Brookings Institution
work page 2024
-
[57]
Reforming data regulation to advance AI governance in Africa , 2024
Chinasa Okolo. Reforming data regulation to advance AI governance in Africa , 2024
work page 2024
-
[58]
Chijioke Okorie and Melissa Omino. Addressing inequitable openness in licences for sharing african data and datasets through the nwulite obodo open data licence. Law, Tech. & Hum. , 7:94, 2025
work page 2025
-
[59]
It’s the noodl license--awesome and amazingly geeky! Available at SSRN 5339254 , 2025
Chijioke Okorie. It’s the noodl license--awesome and amazingly geeky! Available at SSRN 5339254 , 2025
work page 2025
-
[60]
Nokuthula Olorunju and Rachel Adams. African data trusts: new tools towards collective data governance? Information & Communications Technology Law , 33(1):85--98, 2024
work page 2024
-
[61]
Outreach programme to strengthen the AI4D network: final technical report
Davor Orlic. Outreach programme to strengthen the AI4D network: final technical report. Technical report, AI4D Africa, 2021
work page 2021
-
[62]
PazaBench : A speech and language model benchmark for low-resource african languages
Salomey Osei et al. PazaBench : A speech and language model benchmark for low-resource african languages. Microsoft Research, 2024
work page 2024
-
[63]
Ai by the people, for the people, July 2023
Billy Perrigo. Ai by the people, for the people, July 2023
work page 2023
-
[64]
tinybenchmarks: evaluating llms with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, and Mikhail Yurochkin. tinybenchmarks: evaluating llms with fewer examples. In Proceedings of the 41st International Conference on Machine Learning , pages 34303--34326, 2024
work page 2024
-
[65]
Vinodkumar Prabhakaran, Aida Mostafazadeh Davani, and Mark Diaz. On releasing annotator-level labels and information in datasets. In Claire Bonial and Nianwen Xue, editors, Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop, pages 133--138, Punta Cana, Dominican Republic, November 2...
-
[66]
Jenalea Rajab, Anuoluwapo Aremu, Everlyn Asiko Chimoto, Dale Dunbar, Graham Morrissey, Fadel Thior, Luandrie Potgieter, Jessica Ojo, Atnafu Lambebo Tonja, Wilhelmina Ndapewa Onyothi Nekoto, et al. The Esethu framework: Reimagining sustainable dataset governance and curation for low-resource languages. In Proceedings of the 63rd Annual Meeting of the Associ...
-
[67]
Pedro Rodriguez, Joe Barrow, Alexander Miserlis Hoyle, John P. Lalor, Robin Jain, and Jordan Boyd-Graber. Evaluation examples are not equally informative: How should that change NLP leaderboards? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4489--4504, Online, August 2021. Assoc...
- [68]
-
[69]
Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. Everyone wants to do the model work, not the data work: Data cascades in high-stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1--15, 2021
-
[70]
Kathleen Siminyu, Sackey Freshia, Jade Abbott, and Vukosi Marivate. AI4D--African language dataset challenge. arXiv preprint arXiv:2007.11865, 2020
-
[71]
Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I Adelani, Amelia Taylor, et al. AI4D--African language program. arXiv preprint arXiv:2104.02516, 2021
-
[72]
Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, and Partha Talukdar. IndicGenBench: A multilingual benchmark to evaluate generation capabilities of LLMs on Indic languages. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11047--11073, 2024
-
[73]
Shivalika Singh, Freddie Vargus, Daniel D’souza, Börje F Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura O’Mahony, et al. Aya dataset: An open-access collection for multilingual instruction tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Lon...
-
[74]
Mona Sloane, Emanuel Moss, Olaitan Awomolo, and Laura Forlano. Participation is not a design fix for machine learning. In Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, pages 1--6, 2022
-
[75]
Hannah Snyder. Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104:333--339, 2019
-
[76]
Yosephine Susanto, Adithya Venkatadri Hulagadri, Jann Railey Montalan, Jian Gang Ngui, Xianbin Yong, Wei Qi Leong, Hamsawardhini Rengarajan, Peerat Limkonchotiwat, Yifan Mai, and William Chandra Tjhi. SEA-HELM: Southeast Asian holistic evaluation of language models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12308--12336, 2025
-
[77]
Karaitiana Taiuru. Kaitiakitanga Māori data sovereignty licences, 2021
-
[78]
Omnilingual ASR: Open-source multilingual speech recognition for 1600+ languages
Omnilingual ASR team, Gil Keren, Artyom Kozhevnikov, Yen Meng, Christophe Ropers, Matthew Setzler, Skyler Wang, Ife Adebara, Michael Auli, Can Balioglu, Kevin Chan, Chierh Cheng, Joe Chuang, Caley Droof, Mark Duppenthaler, Paul-Ambroise Duquenne, Alexander Erben, Cynthia Gao, Gabriel Mejia Gonzalez, Kehan Lyu, Sagar Miglani, Vineel Pratap, Kaushik Ram Sad...
-
[79]
Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, and Eiichiro Sumita. Introducing the Asian Language Treebank (ALT). In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1574--1578, 2016
-
[80]
Hao Yu, Tianyi Xu, Michael A Hedderich, Wassim Hamidouche, Syed Waqas Zamir, and David Ifeoluwa Adelani. AfriqueLLM: How data mixing and model architecture impact continued pre-training for African languages. arXiv preprint arXiv:2601.06395, 2026