Rethinking layer-wise model merging through chain of merges. arXiv preprint arXiv:2508.21421

Buzzega, P · 2025 · arXiv 2508.21421

2 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Distilling hidden representations from a curvature-regularized linearized teacher into a conventionally fine-tuned non-linear student transfers disentangled task-vector behavior, enabling effective model merging and unlearning with no inference overhead.

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

The Master Key Hypothesis states that capabilities are low-dimensional directions that can be transferred across models through linear subspace alignment; UNLOCK demonstrates gains such as a 12.1% accuracy improvement on MATH when transferring CoT from 14B to 7B models.

citing papers explorer


Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic
cs.LG · 2026-05-18 · unverdicted · none · ref 2 · 2 links
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
cs.LG · 2026-04-07 · unverdicted · none · ref 11
