The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
Transactions of the Association for Computational Linguistics , volume =
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: support 1
Verdicts: UNVERDICTED 2
Representative citing papers:
AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.
Citing papers explorer:
- How Good is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP
  The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
- Are Researchers Being Replaced by Artificial Intelligence?
  AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.