A 300M multilingual embedding model matches or exceeds 7B retrieval performance via optimized data scale, hard negatives, and task diversity over language diversity.
InProceedings of the 2021 Conference on Empirical Methods in Natural Language Process- ing, pages 6894–6910, Online and Punta Cana, Do- minican Republic
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters
A 300M multilingual embedding model matches or exceeds 7B retrieval performance via optimized data scale, hard negatives, and task diversity over language diversity.