ResMerge: Residual-based Spectral Merging of Large Language Models

Haiyun Guo; Haokai Ma; Hongyan An; Jinqiao Wang; Junfeng Fang; Weizhen Wang; Yandu Sun; Yuheng Jia; Zhiyan Hou

arxiv: 2606.02252 · v1 · pith:WOVNTNBAnew · submitted 2026-06-01 · 💻 cs.CL

Pith reviewed 2026-06-28 15:01 UTC · model grok-4.3

classification 💻 cs.CL

keywords model merging · spectral merging · reinforcement learning · task vectors · large language models · expert fusion · residual components

The pith

For RL task vectors, residual components provide a stable merging basis while leading heads require careful gated reintroduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard assumptions in spectral merging fail for reinforcement learning experts because their task vectors have concentrated but conflicting leading singular directions and more stable dispersed residuals. ResMerge addresses this by first creating a reliability-weighted residual consensus on the Frobenius sphere and then applying lightweight corrections from the heads only where experts show positive agreement. Experiments demonstrate that this results in merged models that retain more of the original expert capabilities compared to task-vector arithmetic and other spectral methods.

Core claim

Decomposing RL task vectors via SVD reveals that both the leading spectral head and the residual independently recover substantial behavior knowledge, yet the head tends toward sharp cross-expert conflicts while the residual offers a dispersed and stable aggregation basis; ResMerge therefore constructs a stable residual backbone via Spherical Residual Consensus Adaptation and reintroduces leading-head information through a Lightweight Head Correction module gated by positive cross-expert agreement.
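
To make the head/residual split concrete, here is a minimal NumPy sketch of the decomposition the core claim describes. It assumes a task matrix is a fine-tuned weight matrix minus the corresponding base weight matrix, and that the head is the top-k part of its SVD (k = 1 in the figure captions). The function name `split_head_residual` and the random stand-in matrix are illustrative and are not taken from the paper or its repository.

```python
import numpy as np

def split_head_residual(delta: np.ndarray, k: int = 1):
    """Split a task matrix into its top-k spectral head and the remaining residual."""
    # Thin SVD: delta = U @ diag(s) @ Vt, singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    # Head: the k leading singular directions, scaled by their singular values.
    head = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Residual: everything the head does not explain.
    residual = delta - head
    return head, residual

# Synthetic stand-in for one layer's task matrix (assumption: random data, not a real expert).
rng = np.random.default_rng(0)
delta = rng.standard_normal((8, 6))
head, residual = split_head_residual(delta, k=1)
```

By construction the two parts sum back to the task matrix and are orthogonal under the Frobenius inner product, so each can be evaluated on its own, as in the Head-only and Residual-only settings of Figure 1.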

What carries the argument

Spherical Residual Consensus Adaptation for building the residual backbone and Lightweight Head Correction for gated head reintroduction.
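
The text available here does not give the exact update rules, so the following is only a sketch of what a reliability-weighted consensus on the Frobenius sphere could look like. The weighting rule (non-negative cosine to the unweighted mean direction) and the final rescaling (weighted mean of the original norms) are assumptions made for illustration, as is the function name `residual_consensus`.

```python
import numpy as np

def residual_consensus(residuals, eps=1e-12):
    """Illustrative reliability-weighted consensus of residual matrices.

    Assumptions (not from the paper): reliability is the non-negative Frobenius
    cosine to the unweighted mean direction, and the output magnitude is the
    reliability-weighted mean of the input norms.
    """
    norms = np.array([np.linalg.norm(r) for r in residuals])   # Frobenius norms
    units = [r / (n + eps) for r, n in zip(residuals, norms)]  # points on the unit sphere
    mean_dir = sum(units) / len(units)
    mean_dir = mean_dir / (np.linalg.norm(mean_dir) + eps)
    # Reliability weights: how well each expert aligns with the mean direction.
    weights = np.array([max(float(np.sum(u * mean_dir)), 0.0) for u in units])
    weights = weights / (weights.sum() + eps)
    # Weighted mean direction, projected back onto the unit sphere.
    consensus = sum(w * u for w, u in zip(weights, units))
    consensus = consensus / (np.linalg.norm(consensus) + eps)
    scale = float(np.dot(weights, norms))
    return scale * consensus, weights

# Synthetic residuals: a shared component plus expert-specific noise (assumption).
rng = np.random.default_rng(0)
shared = rng.standard_normal((6, 5))
residuals = [shared + 0.3 * rng.standard_normal((6, 5)) for _ in range(3)]
backbone, weights = residual_consensus(residuals)
```

Working with unit-norm directions keeps a single large-norm expert from dominating the consensus; whether the paper handles magnitudes this way is not stated in the text above.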

If this is right

Merged models better preserve capabilities of individual RL experts.
Reduced impact of cross-expert conflicts in leading spectral directions.
Improved performance across multiple capability domains and expert groups.
Task vectors can be merged without assuming leading directions contain the main signal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar spectral properties might appear in other fine-tuning regimes, suggesting broader use of residual merging.
The method could be combined with other merging strategies to handle larger sets of experts.
Testing on models with different sizes or training objectives would check the generality of the head-residual distinction.

Load-bearing premise

The residual components of RL task vectors remain more stable for aggregation across different experts than their leading singular heads.
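
One way a reader might probe this premise on their own experts is to compare pairwise Frobenius cosines of the rank-1 heads with those of the residuals. The sketch below runs on random stand-in matrices, so its printed numbers say nothing about real RL experts; the helper names `rank1_split` and `pairwise_cosines` are hypothetical, and the paper may use a different similarity or conflict measure.

```python
import itertools
import numpy as np

def rank1_split(delta):
    """Return the rank-1 spectral head of a matrix and the residual left after removing it."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    head = s[0] * np.outer(U[:, 0], Vt[0])
    return head, delta - head

def pairwise_cosines(mats):
    """Frobenius cosine similarity for every unordered pair of matrices."""
    out = []
    for a, b in itertools.combinations(mats, 2):
        out.append(float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))))
    return np.array(out)

# Synthetic stand-ins for four experts' task matrices at one layer (assumption).
rng = np.random.default_rng(0)
deltas = [rng.standard_normal((8, 6)) for _ in range(4)]
heads, residuals = zip(*(rank1_split(d) for d in deltas))

head_cos = pairwise_cosines(heads)
res_cos = pairwise_cosines(residuals)
print("heads:     mean cosine %.3f, negative pairs %d" % (head_cos.mean(), int((head_cos < 0).sum())))
print("residuals: mean cosine %.3f, negative pairs %d" % (res_cos.mean(), int((res_cos < 0).sum())))
```

On real task vectors, strongly negative head cosines alongside consistently non-negative residual cosines would be in line with the premise; the opposite pattern would count against it.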

What would settle it

A test showing that direct merging of leading heads without residuals or corrections achieves higher capability preservation than ResMerge on the same RL expert sets.

Figures

Figures reproduced from arXiv: 2606.02252 by Haiyun Guo, Haokai Ma, Hongyan An, Jinqiao Wang, Junfeng Fang, Weizhen Wang, Yandu Sun, Yuheng Jia, Zhiyan Hou.

**Figure 1.** Component-level recovery under RL and SFT post-training. After applying singular value decomposition (SVD) to each task vector, Head-only retains the rank-1 head formed by the top singular direction, while Residual-only retains the remaining spectral residual after removing this head. Both components are more recoverable in RL task vectors than in SFT task vectors, supporting component-wise treatment of…

**Figure 2.** Overview of ResMerge. Given RL expert task vectors, we decompose each update into a rank-1 spectral head and a residual component. The residual components are merged into a stable SRC-A backbone, while the rank-1 heads are added back only as reliability-gated lightweight corrections.

4.1 Layer-wise Spectral Decomposition. For each mergeable layer, we apply SVD to the task matrix ∆i using the notation in…

**Figure 3.** Effect of head rank k and residual-relative head budget ρ on the Qwen2.5-7B-Base expert group.

5.4 Spectral Structure Analysis. We compare the singular-value structure of task matrices from RL and SFT post-training. For each mergeable matrix-shaped layer, we compute the SVD and systematically analyze leading-energy concentration, layer-wise spectral concentration, and the resulting change of stable/effectiv…

**Figure 4.** Spectral comparison between RL and SFT task vectors. RL task vectors show stronger concentration in leading singular directions across groups and layers. Removing the leading rank-1 component increases stable and effective ranks, revealing a more dispersed residual component, especially in RL experts.

[Plot residue from the source page: panel (a) "Consensus vs. agreement", series cR and SH, axis "Metric value"; a second panel with series RetR and RetH, axis label truncated at "Retention rat…".]

**Figure 5.** Geometric consistency analysis of spectral components. We compare residual spherical coherence, head directional agreement, and mean retention to show that residual components provide a more reliable aggregation backbone while rank-1 heads require reliability-gated lightweight correction.

5.5 Geometric Consistency Analysis of Spectral Components. The spectral analysis above reveals structural differences…

**Figure 6.** Additional component-level recovery results.

**Figure 7.** Pairwise similarity of rank-1 heads and residual components across experts. Rank-1 heads show higher…

Original abstract

Model merging offers a training-free way to combine multiple post-trained expert models, but merging experts obtained through reinforcement learning (RL) remains challenging. Existing spectral merging methods often assume that leading singular directions contain the main task signal, while lower-energy residual components can be compressed, selected, or attenuated to reduce interference. We find that this assumption does not hold for RL task vectors: after decomposing each task vector into a leading spectral head and a residual component, both parts can independently recover substantial behavior knowledge, while exhibiting different merging properties. The head is highly concentrated and informative but more prone to sharp cross-expert conflicts, whereas the residual component is more dispersed and provides a more stable basis for aggregation. Based on this observation, we propose ResMerge, a residual-based spectral merging framework for RL experts. ResMerge first constructs a stable residual backbone with Spherical Residual Consensus Adaptation, which estimates a reliability-weighted consensus direction on the Frobenius sphere. It then reintroduces leading-head information through a Lightweight Head Correction module gated by positive cross-expert agreement. Experiments across multiple RL expert groups and capability domains show that ResMerge better preserves expert capabilities than representative task-vector and spectral merging baselines. The implementation of ResMerge is publicly available at https://github.com/sunyd0303-cpu/ResMerge-release.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ResMerge flips the usual spectral merge assumption for RL experts by building on the residual instead of the leading head, but the whole thing rests on an observation whose details are not shown here.

The main takeaway is that this paper offers a concrete alternative for merging RL fine-tuned models: decompose each task vector via SVD, treat the residual as the stable backbone via Spherical Residual Consensus Adaptation on the Frobenius sphere, then add back head information only through a gated Lightweight Head Correction when cross-expert agreement is positive. That split and the two modules are the actual new pieces relative to earlier task-vector and spectral work.

It does a reasonable job of spelling out why RL vectors might need different handling than the supervised cases that prior methods targeted. The public code link is useful for anyone who wants to check the implementation directly.

The soft spot is exactly the one the stress-test flags. The design is justified by the claim that the leading head recovers behavior but conflicts sharply while the residual is dispersed and stable, yet the abstract gives no plots, recovery percentages, or cross-expert conflict metrics to show how strong or general that pattern is. Without those numbers or the ablation controls, it is difficult to tell whether the reported gains come from the residual-first logic or from other choices in the pipeline. The experiments are described only at the level of “better preserves capabilities,” with no effect sizes or dataset details visible.

This is for people already working on training-free merging of specialized LLMs. A reader who needs a new baseline or wants to test residual-based ideas on their own RL experts could get value from trying the modules. It is coherent enough on its own terms to deserve a serious referee, mainly so the core observation and the quantitative results can be checked properly.

Recommendation: send it to review but require the SVD analysis and full ablations in the first round.

Referee Report

2 major / 2 minor

Summary. The paper claims that standard spectral merging assumptions fail for RL task vectors in LLMs: after SVD decomposition, the leading spectral head recovers substantial behavior but is prone to sharp cross-expert conflicts, while the residual component is more dispersed and stable for aggregation. Based on this, ResMerge constructs a residual backbone via Spherical Residual Consensus Adaptation (reliability-weighted consensus on the Frobenius sphere) and reintroduces head information via a Lightweight Head Correction module gated by positive cross-expert agreement. Experiments across multiple RL expert groups and capability domains reportedly show better preservation of expert capabilities than task-vector and spectral baselines, with code released publicly.

Significance. If the SVD-based observation on head/residual properties holds and generalizes, the work would challenge existing spectral merging practices and supply a practical, training-free method for combining RL experts. The public GitHub release is a clear strength supporting reproducibility.

major comments (2)

[Observation / motivation section] The design of Spherical Residual Consensus Adaptation and Lightweight Head Correction is motivated directly by the claim that SVD of RL task vectors yields a conflict-prone leading head and a stable residual (both independently recovering substantial behavior). This premise is load-bearing; the manuscript must supply quantitative support such as per-component recovery accuracies, cross-expert conflict metrics, or dispersion statistics to justify why prior spectral assumptions are falsified here.
[Experiments section] The superiority claim ('Experiments across multiple RL expert groups and capability domains show that ResMerge better preserves expert capabilities than representative task-vector and spectral merging baselines') is central yet unsupported by any reported numbers, model sizes, dataset descriptions, ablation results, or error bars in the available text. Without these, attribution of gains specifically to the residual-based approach versus implementation details cannot be assessed.

minor comments (2)

[Abstract] Define all acronyms at first use (RL, SVD, etc.) and ensure consistent notation for 'task vector' versus 'RL task vector'.
[Abstract] The GitHub link is a positive; confirm it contains the exact code and seeds used for the reported experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for stronger quantitative grounding in both the motivation and experimental sections. We will revise the manuscript to incorporate the requested analyses and details while preserving the core contributions.

Referee: [Observation / motivation section] The design of Spherical Residual Consensus Adaptation and Lightweight Head Correction is motivated directly by the claim that SVD of RL task vectors yields a conflict-prone leading head and a stable residual (both independently recovering substantial behavior). This premise is load-bearing; the manuscript must supply quantitative support such as per-component recovery accuracies, cross-expert conflict metrics, or dispersion statistics to justify why prior spectral assumptions are falsified here.

Authors: We agree that the SVD-based observation requires explicit quantitative backing to justify departing from prior spectral merging assumptions. In the revision we will add a dedicated analysis subsection reporting: (i) per-component recovery accuracies on held-out evaluation tasks for the leading head versus residual separately, (ii) cross-expert conflict metrics (e.g., pairwise prediction disagreement rates), and (iii) dispersion statistics (singular-value spread and Frobenius-norm variance across experts). These additions will directly support the design choices of Spherical Residual Consensus Adaptation and the gated head correction.
revision: yes

Referee: [Experiments section] The superiority claim ('Experiments across multiple RL expert groups and capability domains show that ResMerge better preserves expert capabilities than representative task-vector and spectral merging baselines') is central yet unsupported by any reported numbers, model sizes, dataset descriptions, ablation results, or error bars in the available text. Without these, attribution of gains specifically to the residual-based approach versus implementation details cannot be assessed.

Authors: We acknowledge that the submitted manuscript text does not contain the numerical results, model specifications, or ablations needed to substantiate the superiority claim. The revision will expand the experiments section with: full tables of performance metrics across all RL expert groups and domains, model sizes and architectures, dataset descriptions, ablation studies isolating each ResMerge component, and error bars from repeated runs. This will enable readers to evaluate the contribution of the residual backbone versus implementation choices.
revision: yes
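
The dispersion statistics promised in the first response could be computed in several ways. The Figure 4 caption mentions stable and effective ranks, so the sketch below uses two common definitions, which may differ from the paper's: stable rank as the squared Frobenius norm over the squared spectral norm, and effective rank as the exponential of the entropy of the normalized singular values. The synthetic spectrum is deliberately concentrated in its first direction; that is an assumption chosen to illustrate the effect, not measured data.

```python
import numpy as np

def stable_rank(s):
    """Stable rank from a descending singular-value vector: sum(s^2) / max(s)^2."""
    return float(np.sum(s ** 2) / s[0] ** 2)

def effective_rank(s, eps=1e-12):
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries before taking logs
    return float(np.exp(-np.sum(p * np.log(p))))

# Build a matrix with a known, head-dominated spectrum (synthetic assumption).
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((8, 5)))  # orthonormal columns
Q2, _ = np.linalg.qr(rng.standard_normal((6, 5)))
spectrum = np.array([10.0, 1.0, 1.0, 1.0, 1.0])
delta = (Q1 * spectrum) @ Q2.T

s_full = np.linalg.svd(delta, compute_uv=False)
s_res = s_full[1:]  # singular values of the residual after removing the rank-1 head

print("stable rank:    full %.2f, residual %.2f" % (stable_rank(s_full), stable_rank(s_res)))
print("effective rank: full %.2f, residual %.2f" % (effective_rank(s_full), effective_rank(s_res)))
```

Removing the head does not raise these ranks for every spectrum (a flat spectrum loses rank instead), so the increase is an empirical property that has to be reported per layer and per expert rather than assumed.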

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents an empirical observation on SVD properties of RL task vectors (leading head vs. residual) as a premise that motivates the definition of new algorithmic modules (Spherical Residual Consensus Adaptation and Lightweight Head Correction). No equations, predictions, or first-principles results reduce to their own inputs by construction; the method is defined by independent procedural steps rather than fitted parameters or self-referential renaming. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing justification in the abstract or described structure. The central claims rest on experimental comparisons rather than tautological constructions, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on standard linear-algebra decomposition plus two newly introduced algorithmic modules whose parameters and weighting rules are not detailed in the abstract.

axioms (1)

standard math · Task vectors admit an SVD separating a leading spectral head from a residual component
Invoked to motivate the head/residual split and their differing merging properties.
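
Written out for a single layer, the axiom amounts to the following identity. The symbols H_i (head) and R_i (residual) are labels introduced here for illustration; the paper's own notation is not visible in this text beyond the task matrix ∆i.

```latex
\Delta_i = U_i \Sigma_i V_i^{\top}
         = \underbrace{\sigma_{i,1}\, u_{i,1} v_{i,1}^{\top}}_{H_i \ (\text{rank-1 head})}
         + \underbrace{\sum_{j \ge 2} \sigma_{i,j}\, u_{i,j} v_{i,j}^{\top}}_{R_i \ (\text{residual})},
\qquad
\langle H_i, R_i \rangle_F = 0,
\qquad
\|\Delta_i\|_F^2 = \sigma_{i,1}^2 + \|R_i\|_F^2 .
```

The orthogonality and energy identities follow from the orthonormality of the singular vectors, so this part is standard linear algebra; what is empirical is how much behavior each of the two terms recovers.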

invented entities (2)

Spherical Residual Consensus Adaptation · no independent evidence
purpose: Estimates a reliability-weighted consensus direction on the Frobenius sphere to form the residual backbone
New module introduced to stabilize aggregation of dispersed residuals.

Lightweight Head Correction · no independent evidence
purpose: Reintroduces leading-head information gated by positive cross-expert agreement
New module introduced to selectively restore concentrated head information.
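
Since the gating rule is not spelled out in the text above, the following is a hedged sketch of one way a head correction "gated by positive cross-expert agreement" might be realized. The agreement score (mean Frobenius cosine with the other experts' heads), the clipping at zero, and the use of ρ as a cap on the correction norm relative to the backbone norm are all assumptions; only the existence of a residual-relative head budget ρ comes from the Figure 3 caption. The function name `gated_head_correction` is hypothetical.

```python
import numpy as np

def gated_head_correction(heads, backbone, rho=0.1, eps=1e-12):
    """Illustrative gate: add heads back only in proportion to positive agreement."""
    n = len(heads)
    if n < 2:
        raise ValueError("cross-expert agreement needs at least two heads")
    units = [h / (np.linalg.norm(h) + eps) for h in heads]
    # Agreement of each head: mean Frobenius cosine with the other experts' heads.
    agreement = np.array([
        np.mean([np.sum(units[i] * units[j]) for j in range(n) if j != i])
        for i in range(n)
    ])
    gates = np.clip(agreement, 0.0, None)  # non-positive agreement closes the gate
    correction = sum(g * h for g, h in zip(gates, heads))
    # Assumed budget: cap the correction norm at rho times the backbone norm.
    budget = rho * np.linalg.norm(backbone)
    norm = np.linalg.norm(correction)
    if norm > budget:
        correction = correction * (budget / (norm + eps))
    return backbone + correction, gates

# Synthetic example: three heads that agree and one that points the opposite way.
rng = np.random.default_rng(0)
base_head = np.outer(rng.standard_normal(6), rng.standard_normal(5))
heads = [base_head, 0.8 * base_head, 1.2 * base_head, -base_head]
backbone = rng.standard_normal((6, 5))
merged, gates = gated_head_correction(heads, backbone, rho=0.1)
```

In this toy setup the opposing head receives a zero gate, and the combined correction stays within the ρ budget, so the backbone remains the dominant part of the merged update.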

pith-pipeline@v0.9.1-grok · 5785 in / 1285 out tokens · 33479 ms · 2026-06-28T15:01:06.820480+00:00 · methodology

