TOPD improves on-policy distillation for LLM reasoning by using near-future guidance to identify divergent states, raising average accuracy from 47.8% to 52.2% on math benchmarks including AIME24 and AIME25.
Quantized-tinyllava: A new multimodal foundation model enables efficient split learning
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
Representative citing papers
Rock Tokens in on-policy distillation persist at high loss, account for up to 18% of outputs, absorb large gradient norms, but add negligible value to reasoning performance.
MOSAIC combines frozen-LLM semantic embeddings with hierarchical consistency objectives to report up to 3.4% AUC gains on knowledge-tracing benchmarks including a new MOOC dataset.
SynGR is a new framework for generative recommendation that constrains overreliance on single modalities to exploit synergistic cross-modal information for better item semantics and user preference modeling.