Diversity-aware reverse Kullback-Leibler divergence for large language model distillation. arXiv preprint
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
CIST uses per-sample adaptive temperatures for both teacher and student in knowledge distillation to ensure consistent entropy in soft labels and reports gains on vision and language tasks.
citing papers explorer
- SimCT: Recovering Lost Supervision for Cross-Tokenizer On-Policy Distillation
SimCT enlarges the supervision space in cross-tokenizer on-policy distillation using short jointly tokenizable multi-token continuations, producing consistent gains over shared-token baselines on math and code benchmarks.
- Consistently Informative Soft-Label Temperature for Knowledge Distillation
CIST uses per-sample adaptive temperatures for both teacher and student in knowledge distillation to ensure consistent entropy in soft labels and reports gains on vision and language tasks.