Dataless knowledge fusion by merging weights of language models

2022 · arXiv 2212.09849

10 Pith papers cite this work.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Generalizing the Geometry of Model Merging Through Frechet Averages

cs.LG · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

Model merging is generalized as Fréchet averaging on symmetry-invariant manifolds, containing Fisher merging as a special case and offering a new approach for LoRA adapters.
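Fisher merging, named above as a special case, is usually stated with a diagonal Fisher approximation as a per-parameter weighted average: θ* = Σ_i F_i θ_i / Σ_i F_i, taken elementwise. The following is a minimal sketch under that diagonal assumption, with each model's parameters flattened into a plain Python list; the function name `fisher_merge` and the `eps` stabiliser are illustrative choices, not taken from either paper.

```python
from typing import List, Sequence


def fisher_merge(
    params: Sequence[Sequence[float]],
    fishers: Sequence[Sequence[float]],
    eps: float = 1e-12,
) -> List[float]:
    """Diagonal-Fisher-weighted average of flattened parameter vectors.

    params[i][j]:  j-th parameter of model i
    fishers[i][j]: non-negative diagonal Fisher estimate for that parameter
    eps:           small constant that keeps the denominator away from zero
    """
    if not params or len(params) != len(fishers):
        raise ValueError("need at least one model and one Fisher vector per model")
    merged = []
    for j in range(len(params[0])):
        # Each model pulls parameter j toward its own value in proportion to
        # how important (high-Fisher) that parameter is for it.
        num = sum(f[j] * p[j] for p, f in zip(params, fishers))
        den = sum(f[j] for f in fishers) + eps
        merged.append(num / den)
    return merged


# Two toy models with two parameters each.
theta = [[1.0, 0.0], [3.0, 2.0]]
fisher = [[1.0, 1.0], [1.0, 3.0]]
merged = fisher_merge(theta, fisher)
# merged is approximately [2.0, 1.5]
```

With equal Fisher values for every model, the rule reduces to plain parameter averaging; the second parameter above leans toward model 2 because its Fisher weight is three times larger.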

Dynamic Model Merging Made Slim

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

ADR achieves theoretically zero forgetting in class-incremental graph learning by combining backpropagation-based adaptation with ridge-regression-based, layer-wise merging of GNN linear transformations.
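One generic way to pose a ridge-regression merge of linear layers (an assumption here, not ADR's published procedure) is to look for a single weight matrix W that reproduces each model's outputs on that model's own inputs: minimise Σ_i ||X_i W − X_i W_i||² + λ||W||², whose closed form is W = (Σ_i X_iᵀX_i + λI)⁻¹ Σ_i X_iᵀX_i W_i. The sketch below implements that closed form in plain Python; `ridge_merge`, `lam`, and the toy matrices are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    inner = len(b)
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; increase the ridge term lam")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_merge(xs: Sequence[Matrix], ws: Sequence[Matrix], lam: float = 1e-3) -> Matrix:
    """Merge linear maps y = x @ W_i into a single W.

    xs[i]: inputs seen by model i, shape (n_i, d_in)
    ws[i]: weight matrix of model i, shape (d_in, d_out)
    Returns W = (sum_i X_i^T X_i + lam * I)^-1 (sum_i X_i^T X_i W_i).
    """
    d_in, d_out = len(ws[0]), len(ws[0][0])
    lhs = [[lam if i == j else 0.0 for j in range(d_in)] for i in range(d_in)]
    rhs = [[0.0] * d_out for _ in range(d_in)]
    for x, w in zip(xs, ws):
        gram = matmul(transpose(x), x)  # X_i^T X_i, shape (d_in, d_in)
        gram_w = matmul(gram, w)        # X_i^T X_i W_i, shape (d_in, d_out)
        for i in range(d_in):
            for j in range(d_in):
                lhs[i][j] += gram[i][j]
            for j in range(d_out):
                rhs[i][j] += gram_w[i][j]
    return solve(lhs, rhs)


# Toy case: model 1 only ever sees feature 0, model 2 only feature 1.
x1, w1 = [[1.0, 0.0]], [[5.0], [7.0]]
x2, w2 = [[0.0, 1.0]], [[-1.0], [3.0]]
merged = ridge_merge([x1, x2], [w1, w2], lam=0.0)
# merged == [[5.0], [3.0]]: each row comes from the model that "owns" that input.
```

Unlike plain averaging, which would give [[2.0], [5.0]] here, the input statistics decide which model's weights govern each input direction; a positive `lam` keeps the system solvable when some input direction is never observed.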

ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

cs.CL · 2026-03-03 · unverdicted · novelty 6.0

ACE-Merging estimates task input covariances from parameter differences to enable closed-form data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.

Model Merging Scaling Laws in Large Language Models

cs.AI · 2025-09-29 · unverdicted · novelty 6.0

Empirical scaling laws for LLM merging show a size-dependent floor and a 1/k-like tail in cross-entropy loss that hold across architectures and merging methods.
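Reading k as the number of merged models and the tail as exactly 1/k (both assumptions; the paper's own parameterisation may differ), a per-model-size curve loss(k) ≈ floor + a/k is linear in 1/k, so ordinary least squares on (1/k, loss) recovers the floor as the intercept and a as the slope. The sketch below uses synthetic numbers only, not results from the paper; `fit_floor_plus_inverse_k` is a hypothetical name.

```python
from typing import Sequence, Tuple


def fit_floor_plus_inverse_k(ks: Sequence[int], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of loss(k) = floor + a / k; returns (floor, a)."""
    if len(ks) != len(losses) or len(ks) < 2:
        raise ValueError("need at least two (k, loss) pairs")
    xs = [1.0 / k for k in ks]  # the model is linear in x = 1/k
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct k values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    a = sxy / sxx                  # slope: strength of the 1/k tail
    floor = mean_y - a * mean_x    # intercept: loss as k grows without bound
    return floor, a


# Synthetic losses generated from floor=2.0, a=0.5 (made-up values).
ks = list(range(1, 9))
losses = [2.0 + 0.5 / k for k in ks]
floor, a = fit_floor_plus_inverse_k(ks, losses)
# floor is approximately 2.0 and a approximately 0.5
```

Fitting one such curve per model size would give one floor per size, which is one way to make the "size-dependent floor" concrete.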

ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services

cs.LG · 2026-05-23 · unverdicted · novelty 5.0

ReLoRA reduces time-to-readiness for LoRA adapters on updated LLMs by up to 8.9x through adaptive Bayesian initialization and scheduled regularization while improving accuracy by up to 4.6%.

Differentially Private Model Merging

cs.LG · 2026-04-22 · unverdicted · novelty 5.0

Post-processing via random selection or linear combination of differentially private models allows meeting arbitrary target privacy parameters without additional training.
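The two post-processing operations named above can be illustrated on flattened parameter vectors. This sketch shows only the mechanics, a convex combination of parameters and a random choice of one whole model; it does not compute or certify any privacy parameters, which is what the paper addresses, and the names `linear_combination` and `random_selection` are hypothetical.

```python
import random
from typing import List, Sequence


def linear_combination(models: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Convex combination of flattened parameter vectors."""
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1 for a convex combination")
    n = len(models[0])
    return [sum(w * m[j] for m, w in zip(models, weights)) for j in range(n)]


def random_selection(
    models: Sequence[Sequence[float]],
    probs: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Return one whole model, chosen at random with the given probabilities."""
    idx = rng.choices(range(len(models)), weights=probs, k=1)[0]
    return list(models[idx])


theta_a = [1.0, 2.0]
theta_b = [3.0, 6.0]
mixed = linear_combination([theta_a, theta_b], [0.25, 0.75])  # [2.5, 5.0]
picked = random_selection([theta_a, theta_b], [0.5, 0.5], random.Random(0))
```

Both operations touch only the already-trained models' parameters, with no further access to training data, which is what makes them post-processing.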

MAny: Merge Anything for Multimodal Continual Instruction Tuning

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

MAny addresses dual-forgetting in multimodal continual instruction tuning via CPM and LPM merging strategies, delivering up to 8.57% accuracy gains on UCIT benchmarks without additional training.

citing papers explorer

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling cs.LG · 2026-05-14 · unverdicted · none · ref 236
Generalizing the Geometry of Model Merging Through Frechet Averages cs.LG · 2026-04-29 · unverdicted · none · ref 4 · 2 links
Dynamic Model Merging Made Slim cs.LG · 2026-05-17 · unverdicted · none · ref 43
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training cs.RO · 2026-04-25 · unverdicted · none · ref 21
Analytic Drift Resister for Non-Exemplar Continual Graph Learning cs.LG · 2026-04-03 · unverdicted · none · ref 34
ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation cs.CL · 2026-03-03 · unverdicted · none · ref 7
Model Merging Scaling Laws in Large Language Models cs.AI · 2025-09-29 · unverdicted · none · ref 11
ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services cs.LG · 2026-05-23 · unverdicted · none · ref 42
Differentially Private Model Merging cs.LG · 2026-04-22 · unverdicted · none · ref 8
MAny: Merge Anything for Multimodal Continual Instruction Tuning cs.LG · 2026-04-15 · unverdicted · none · ref 5
