Self-generated QA supervision for language models is fragile due to non-uniform question selection and instruction compliance during answering, with mitigations that reduce compliance from 88% to 13%.
Efficient knowledge injection in llms via self-distillation
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.
Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.
citing papers explorer
-
Self-Study Reconsidered: The Hidden Fragility of Learning from Self-Generated QA
Self-generated QA supervision for language models is fragile due to non-uniform question selection and instruction compliance during answering, with mitigations that reduce compliance from 88% to 13%.
-
Context Memorization for Efficient Long Context Generation
Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.
-
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
Uni-OPD unifies on-policy distillation across LLMs and MLLMs with dual-perspective strategies that promote student exploration and enforce order-consistent teacher supervision based on outcome rewards.