3 Pith papers cite this work. Polarity classification is still indexing.

Representative citing papers:
-
Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning
A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
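The summary doesn't specify the mechanism; one plausible reading of "retaining long-tailed expert information" is that, instead of discarding the gate mass of non-top-k experts, that residual mass is routed through a shared condenser expert. A minimal NumPy sketch under that assumption (all names hypothetical, not the paper's actual method):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the gate logits
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_with_condenser(x, experts, condenser, gate_logits, top_k=2):
    """Hypothetical sketch: apply the top-k experts as usual, then send the
    leftover (long-tailed) probability mass through a condenser expert
    rather than dropping it."""
    probs = softmax(gate_logits)
    top = np.argsort(probs)[::-1][:top_k]          # indices of top-k experts
    out = sum(probs[i] * experts[i](x) for i in top)
    tail_mass = 1.0 - probs[top].sum()             # mass of pruned experts
    return out + tail_mass * condenser(x)          # condenser absorbs the tail
```

The design intuition, if this reading is right, is that sparsification stays cheap (only `top_k + 1` expert calls) while the pruned experts' contribution is approximated rather than lost.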
-
Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection
Gen-SSD improves chain-of-thought distillation by letting the student model guide the teacher's generation process through real-time selection of learnable reasoning branches, yielding 5.9-point gains over standard KD on math benchmarks.
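The summary's "real-time selection of learnable reasoning branches" suggests a loop in which the teacher proposes candidate continuations and the student picks the one it can best learn from, e.g. by its own log-likelihood. A minimal sketch under that assumption (`propose_branches` and `student_logprob` are hypothetical stand-ins, not Gen-SSD's actual interfaces):

```python
def distill_with_student_selection(propose_branches, student_logprob, steps=3):
    """Hypothetical sketch of student-in-the-loop generation:
    at each step the teacher proposes candidate reasoning branches,
    the student scores each candidate, and the highest-scoring
    (most learnable) branch is appended to the chain of thought."""
    chain = []
    for _ in range(steps):
        candidates = propose_branches(chain)                       # teacher proposes
        best = max(candidates, key=lambda b: student_logprob(chain, b))  # student selects
        chain.append(best)
    return chain
```

Contrast with standard KD, where the teacher's chain is generated without consulting the student and may contain steps the student cannot yet model.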
-
LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
LiveClawBench is a pilot benchmark and Triple-Axis Complexity Framework for evaluating LLM agents on compositional real-world assistant tasks derived from real usage data.