
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Canonical reference: 23 Pith papers cite this work, and 83% of classified citations cite it as background.


Citation roles: background 5, dataset 1.


representative citing papers

All is Not Lost: LLM Recovery without Checkpoints

cs.DC · 2025-06-18 · conditional · novelty 7.0

CheckFree recovers from intermediate-stage failures in pipeline-parallel LLM training via neighbor averaging; CheckFree+ adds out-of-order execution to handle first- and last-stage failures by copying neighbors, with small embedding storage, and outperforms checkpointing and redundancy at 5–10% failure rates by up to

Towards Human-Level Book-Writing Capability

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

A prompt-to-book training framework that derives hierarchical summaries from public-domain novels and inverts them to supervise long-context models toward human literary prose instead of assistant-style output.

Primal-Dual Guided Decoding for Constrained Discrete Diffusion

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Primal-dual guided decoding casts constrained discrete diffusion as a KL-regularized optimization solved online with adaptive Lagrangian multipliers to satisfy constraints while staying close to the unconstrained model distribution.

TextLDM: Language Modeling with Continuous Latent Diffusion

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

TextLDM applies DiT-style latent diffusion with flow matching to language modeling via a REPA-aligned VAE, outperforming prior diffusion LMs and matching GPT-2 when trained from scratch on OpenWebText2.

Latent Planning Emerges with Scale

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

Latent planning ability in LLMs emerges and strengthens with scale, shown through internal features that represent future words and influence token choices on planning and rhyming tasks.

Textbooks Are All You Need II: phi-1.5 technical report

cs.CL · 2023-09-11 · unverdicted · novelty 6.0

phi-1.5 is a 1.3B parameter model trained on synthetic textbook data that matches the reasoning performance of models five times larger on natural language, math, and basic coding tasks.

Textbooks Are All You Need

cs.CL · 2023-06-20 · unverdicted · novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.
