Unsupervised Cross-lingual Representation Learning at Scale
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
M-DaQ: Retrieving Samples with Multilingual Diversity and Quality for Instruction Fine-Tuning Datasets
M-DaQ introduces a diversity-aware sampling framework combining a quality scoring model with maximal marginal relevance selection to build multilingual instruction fine-tuning datasets, yielding models with over 60% average win rates on Alpaca-Eval and MT-Bench across 18 languages.
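The summary above pairs a quality scoring model with maximal marginal relevance (MMR) selection. The following is a minimal sketch of greedy MMR. It assumes that each candidate sample already has an embedding vector and a scalar quality score. The trade-off weight `lam`, the cosine redundancy term, and the helper names `cosine` and `mmr_select` are hypothetical choices made for illustration, not M-DaQ's published formulation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mmr_select(
    embeddings: List[Sequence[float]],
    quality: List[float],
    k: int,
    lam: float = 0.7,
) -> List[int]:
    """Hypothetical greedy MMR selector.

    At each step, pick the remaining candidate i that maximizes
        lam * quality[i] - (1 - lam) * max_{j in selected} cos(e_i, e_j),
    so high-quality samples are preferred unless they are redundant with
    samples already chosen. Returns selected indices in selection order.
    """
    selected: List[int] = []
    remaining = set(range(len(embeddings)))
    while remaining and len(selected) < k:
        best_idx, best_score = -1, -math.inf
        for i in sorted(remaining):
            # Redundancy = similarity to the closest already-selected sample.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            score = lam * quality[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy data: sample 1 is a near-duplicate of sample 0; sample 2 is distinct.
emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
q = [0.9, 0.85, 0.6]
print(mmr_select(emb, q, k=2, lam=0.5))  # diversity-aware selection
print(mmr_select(emb, q, k=2, lam=1.0))  # quality-only ranking
```

With `lam=0.5`, the near-duplicate sample 1 loses to the more distinct sample 2 despite its higher quality score, giving `[0, 2]`. Setting `lam=1.0` drops the redundancy term and reduces to ranking by quality alone, giving `[0, 1]`.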