Title resolution pending

Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

SynPro uses RL-optimized rephrasing and reformatting of organic data to generate synthetic pretraining tokens that deliver 3.7-5.2x the effective learning of simple repetition and can exceed training on unique data at 1.1B scale.

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

cs.CL · 2024-10-23 · conditional · novelty 6.0

Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.

citing papers explorer

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 77
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling cs.CL · 2026-05-18 · unverdicted · none · ref 49
Scaling Diffusion Language Models via Adaptation from Autoregressive Models cs.CL · 2024-10-23 · conditional · none · ref 46
