Hynek Kydlíček

Identifiers

No identifiers captured yet.

Papers (3)

  1. How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data · cs.CL · 2026 · author #5
  2. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model · cs.CL · 2025 · author #8
  3. The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale · cs.CL · 2024 · author #2

Mentions

No mention provenance yet.

Frequent Coauthors