Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding better performance than scratch training.
International conference on machine learning , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.
citing papers explorer
-
Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models
Chain-based Distillation constructs a sequence of anchor models to enable efficient initialization of variable-sized SLMs through interpolation, with bridge distillation for cross-architecture transfer, yielding better performance than scratch training.
-
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems
Nexa learns a response-conditioned policy that starts with parallel agent execution and adds at most one round of sequential message passing via a predicted sparse DAG, strictly subsuming pure parallel mode.