Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.
citing papers explorer
-
Scaling Laws for Mixture Pretraining Under Data Constraints
Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.
-
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.