The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models, 2025
1 Pith paper cites this work.
Fields: cs.CL (1); years: 2026 (1); verdicts: UNVERDICTED (1)

Representative citing papers
KletterMix: Climbing Toward High-Quality German Pretraining Data
KletterMix is a translated German corpus from English pretraining data that yields measurable gains on German downstream tasks in controlled pretraining experiments.