The authors built and publicly released sentence-aligned simplification corpora for five languages by processing crowd-sourced data from comparable documents.
Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification
abstract
Text simplification plays a crucial role in improving the accessibility and comprehensibility of written information for diverse audiences, including language learners and readers with limited literacy. Despite its importance, large-scale, high-quality datasets for training and evaluating text simplification models remain scarce for languages other than English. This paper reports an experimental study on the collection and processing of crowd-sourced simplification data from comparable corpora to construct a corpus suitable for both training and testing text simplification systems across multiple languages (Catalan, English, French, Italian and Spanish). We report mechanisms for sentence-level alignment from document-level data. The resulting dataset of aligned sentence pairs is publicly available.
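The abstract does not say which alignment mechanisms were used. To illustrate what sentence-level alignment from document-level pairs involves, here is a minimal sketch that assumes a simple lexical-overlap heuristic. It is not the authors' method. The function names (`tokens`, `jaccard`, `align`), the example sentences, and the `threshold` value are all hypothetical.

```python
import re


def tokens(sentence):
    """Lowercase word tokens as a set (Unicode-aware, so accented letters are kept)."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align(complex_sents, simple_sents, threshold=0.3):
    """For each simple sentence, pick the most similar complex sentence.

    Returns (complex_index, simple_index, score) triples whose score
    reaches the threshold. The threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    complex_tokens = [tokens(s) for s in complex_sents]
    pairs = []
    for j, simple in enumerate(simple_sents):
        simple_tokens = tokens(simple)
        scores = [jaccard(c, simple_tokens) for c in complex_tokens]
        if not scores:
            continue  # nothing to align against
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((best, j, scores[best]))
    return pairs


# Hypothetical sentences from a comparable document pair.
complex_doc = [
    "The committee postponed the vote until next week.",
    "Heavy rain flooded several streets in the city centre.",
]
simple_doc = [
    "Rain flooded streets in the city.",
    "Bananas are yellow.",
]
aligned = align(complex_doc, simple_doc)
```

In this toy input only the first simple sentence finds a partner, and the unrelated one is dropped by the threshold. A real pipeline would more likely use multilingual sentence embeddings and allow one-to-many alignments, but which approach the paper takes is not stated in the abstract.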
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)