Model merging is cast as PoE inference with EBM experts, revealing Gaussian assumptions in prior work and proposing convergent Cauchy experts that improve empirical performance.
Dataless knowledge fusion by merging weights of language models
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 13roles
background 2polarities
background 2representative citing papers
MergeProbe forecasts LoRA adapter mergeability from first-few-percent training signals and outperforms interference-aware baselines on retention while adding low overhead on a five-domain benchmark.
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Model merging is generalized as Fréchet averaging on symmetry-invariant manifolds, containing Fisher merging as a special case and offering a new approach for LoRA adapters.
Merging breaks MoE routing via softmax sensitivity; HARC uses Hessian curvature for closed-form router calibration that improves merged model performance without retraining.
DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
ADR achieves theoretically zero-forgetting class-incremental graph learning by combining backpropagation adaptation with ridge-regression-based layer-wise merging of GNN linear transformations.
ACE-Merging estimates task input covariances from parameter differences to enable closed-form data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.
Empirical scaling laws for LLM merging show a size-dependent floor and 1/k-like tail in cross-entropy loss that holds across architectures and merging methods.
ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.
Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.
MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.
citing papers explorer
-
Model Merging Scaling Laws in Large Language Models
Empirical scaling laws for LLM merging show a size-dependent floor and 1/k-like tail in cross-entropy loss that holds across architectures and merging methods.