Self-generated QA supervision for language models is fragile due to non-uniform question selection and instruction compliance during answering, with mitigations that reduce compliance from 88% to 13%.
Efficient knowledge injection in llms via self-distillation
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5verdicts
UNVERDICTED 5representative citing papers
Language models can use a two-stage sleep process of upward distillation for memory consolidation and RL-based dreaming for unsupervised self-improvement to enable continual learning.
Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.
Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.
Existing methods for turning LLM interaction experience into parametric skills collapse over multiple iterations; principle-level experience, step-wise injection, and off-policy teacher distillation yield more stable continual learning.
citing papers explorer
-
Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
Language models can use a two-stage sleep process of upward distillation for memory consolidation and RL-based dreaming for unsupervised self-improvement to enable continual learning.
-
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.