Dataless knowledge fusion by merging weights of language models

2022 · arXiv 2212.09849

13 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

Model Merging as Probabilistic Inference in Fine-Tuning Parameter Space

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

Model merging is cast as product-of-experts (PoE) inference with energy-based model (EBM) experts. This framing reveals Gaussian assumptions in prior work and motivates convergent Cauchy experts that improve empirical performance.

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

MergeProbe forecasts LoRA adapter mergeability from signals in the first few percent of training. On a five-domain benchmark it outperforms interference-aware baselines on retention while adding little overhead.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Generalizing the Geometry of Model Merging Through Frechet Averages

cs.LG · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

Model merging is generalized as Fréchet averaging on symmetry-invariant manifolds, containing Fisher merging as a special case and offering a new approach for LoRA adapters.

When Model Merging Breaks Routing: Training-Free Calibration for MoE

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Merging breaks MoE routing because of softmax sensitivity. HARC uses Hessian curvature for closed-form router calibration, improving merged-model performance without retraining.

Dynamic Model Merging Made Slim

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

ADR achieves theoretically zero-forgetting class-incremental graph learning by combining backpropagation adaptation with ridge-regression-based layer-wise merging of GNN linear transformations.

ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

cs.CL · 2026-03-03 · unverdicted · novelty 6.0

ACE-Merging estimates task input covariances from parameter differences to enable closed-form data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.

Model Merging Scaling Laws in Large Language Models

cs.AI · 2025-09-29 · unverdicted · novelty 6.0

Empirical scaling laws for LLM merging show a size-dependent floor and a 1/k-like tail in cross-entropy loss, both of which hold across architectures and merging methods.

ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services

cs.LG · 2026-05-23 · unverdicted · novelty 5.0

ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.

Differentially Private Model Merging

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.

MAny: Merge Anything for Multimodal Continual Instruction Tuning

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.

citing papers explorer


Model Merging Scaling Laws in Large Language Models cs.AI · 2025-09-29 · unverdicted · none · ref 11
