SiDiaC is a new historical corpus of Sinhala literary works spanning the 5th to 20th centuries, constructed via OCR digitization, orthography modernization, and genre-based annotation.
Whitt, Martin Durrell, and Paul Bennett
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
SiDiaC: Sinhala Diachronic Corpus