A syllable-based tokenizer for Turkish enables a tiny 1.5M-parameter model to reach 50.3% Recall@5 on TQuAD retrieval, beating a much larger morphology baseline.
BERT: Pre-training of deep bidirectional transformers for language understanding
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
HeceTokenizer: A Syllable-Based Tokenization Approach for Turkish Retrieval
A syllable-based tokenizer for Turkish enables a tiny 1.5M-parameter model to reach 50.3% Recall@5 on TQuAD retrieval, beating a much larger morphology baseline.