MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.
citing papers explorer
-
MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages
MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.
-
CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation
Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.