Hynek Kydlíček
Papers (3)
- How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data cs.CL · 2026 · author #5
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model cs.CL · 2025 · author #8
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale cs.CL · 2024 · author #2
Frequent Coauthors
- Colin Raffel 3 shared papers
- Guilherme Penedo 3 shared papers
- Leandro Von Werra 3 shared papers
- Thomas Wolf 3 shared papers
- Anton Lozhkov 2 shared papers
- Elie Bakouch 2 shared papers
- Lewis Tunstall 2 shared papers
- Loubna Ben Allal 2 shared papers
- Agustín Piqueres Lajarín 1 shared paper
- Andrés Marafioti 1 shared paper
- Atsuki Yamaguchi 1 shared paper
- Ben Burtenshaw 1 shared paper
- Caleb Fahlgren 1 shared paper
- Clémentine Fourrier 1 shared paper
- Cyril Zakka 1 shared paper
- Edward Emanuel Beeching 1 shared paper
- Gabriel Martín Blázquez 1 shared paper
- Haojun Zhao 1 shared paper
- Hugo Larcher 1 shared paper
- Joel Niklaus 1 shared paper