Title resolution pending

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research , author= · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

DiHAL uses geometry proxies to pick where to replace the lower layers of a pretrained transformer with a diffusion bridge for hidden-state reconstruction, improving over token-level diffusion baselines on 8B models.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

cs.CL · 2025-02-04 · unverdicted · novelty 5.0

SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

citing papers explorer

Showing 3 of 3 citing papers.

Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement cs.CL · 2026-05-14 · unverdicted · none · ref 48
DiHAL uses geometry proxies to pick where to replace the lower layers of a pretrained transformer with a diffusion bridge for hidden-state reconstruction, improving over token-level diffusion baselines on 8B models.
ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 121
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model cs.CL · 2025-02-04 · unverdicted · none · ref 3
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer