The LAMBADA dataset: Word prediction requiring a broad discourse context
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.
citing papers explorer
- NITP: Next Implicit Token Prediction for LLM Pre-training
  NITP augments standard next-token prediction with implicit semantic prediction in representation space using shallow-layer self-supervision, reporting consistent downstream gains on 0.5B-9B models including 5.7% on MMLU-Pro for a 9B MoE.
- Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space
  Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.