Rethinking layer-wise model merging through chain of merges. arXiv preprint arXiv:2508.21421
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
citing papers explorer
- Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
  Distilling hidden representations from a curvature-regularized linearized teacher into a conventionally fine-tuned non-linear student transfers disentangled task-vector behavior, enabling effective model merging and unlearning with no inference overhead.
- The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
  The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.