PARAMΔ upcycles dense models to MoE for per-language experts and grafts post-training deltas to enable data-efficient language expansion while preserving original capabilities.
The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 1polarities
unclear 1representative citing papers
Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.
Pith review generated a malformed one-line summary.
Coreference resolution improves retrieval relevance and QA performance in RAG systems, with mean pooling performing best and smaller models benefiting more.
Qwen2.5 LLMs scale pre-training data to 18 trillion tokens and apply multistage reinforcement learning, achieving competitive performance on benchmarks with models up to 5 times larger.
citing papers explorer
-
A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$\Delta$ Integration into Upcycled MoE
PARAMΔ upcycles dense models to MoE for per-language experts and grafts post-training deltas to enable data-efficient language expansion while preserving original capabilities.
-
Qwen3 Technical Report
Pith review generated a malformed one-line summary.
-
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems
Coreference resolution improves retrieval relevance and QA performance in RAG systems, with mean pooling performing best and smaller models benefiting more.
-
Qwen2.5 Technical Report
Qwen2.5 LLMs scale pre-training data to 18 trillion tokens and apply multistage reinforcement learning, achieving competitive performance on benchmarks with models up to 5 times larger.