Recursive character-based chunking at 300 characters outperforms Sentence-Based, Khmer-Aware, and LLM-Based methods on L2 distance, answer relevance, and Khmer IoU in a 5-fold evaluation on 18 Khmer agricultural QA pairs.
Late chunking: Contextual chunk embeddings using long-context embedding models,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Evaluation of Chunking Strategies for Effective Text Embedding in Low-Resource Language on Agricultural Documents
Recursive character-based chunking at 300 characters outperforms Sentence-Based, Khmer-Aware, and LLM-Based methods on L2 distance, answer relevance, and Khmer IoU in a 5-fold evaluation on 18 Khmer agricultural QA pairs.