RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging

Dang Huu-Tien; Le-Minh Nguyen; Takeshi Suzuki; The-Hai Nguyen

arxiv: 2508.03121 · v3 · submitted 2025-08-05 · 💻 cs.LG

RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging

The-Hai Nguyen , Dang Huu-Tien , Takeshi Suzuki , Le-Minh Nguyen This is my paper

Pith reviewed 2026-05-18 23:59 UTC · model grok-4.3

classification 💻 cs.LG

keywords model mergingregression meanlayer dependenciesmulti-model fusiongeneralizationdistribution shiftsclosed-form solution

0 comments

The pith

RegMean++ adds explicit intra-layer and cross-layer dependency terms to the per-layer regression objective used for model merging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RegMean++ as a direct extension of RegMean, which solves model merging by finding closed-form weights for each linear layer that minimize prediction differences between merged and original models. RegMean treats every layer in isolation. RegMean++ augments the objective with additional terms that capture how activations and information move inside a single layer and from one layer to the next. The resulting merged model is claimed to reproduce the combined behavior of the source models more faithfully. Experiments across in-domain, out-of-domain, sequential, large-scale, and distribution-shift settings show consistent gains over the original RegMean baseline and competitive results against other merging methods.

Core claim

RegMean++ extends RegMean's independent per-layer linear regression by inserting explicit intra-layer and cross-layer dependency terms into the objective; the augmented problem still admits a closed-form solution and produces merged weights whose overall predictions align more closely with those of the source models.

What carries the argument

The augmented regression objective that couples each layer's weight solution to both intra-layer feature correlations and cross-layer propagation effects.

If this is right

Merged models achieve higher accuracy on both in-domain and out-of-domain test sets.
Performance remains stable when models are merged sequentially rather than all at once.
The method scales to large tasks while preserving robustness under input distribution shifts.
Results stay competitive with more complex merging techniques that do not rely on closed-form regression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Layer-wise merging methods may systematically underperform when they ignore how early-layer changes affect later-layer inputs.
Similar dependency modeling could be inserted into other closed-form or optimization-based merging procedures.
The approach highlights a possible route to diagnose which inter-layer interactions matter most for successful model combination.

Load-bearing premise

That adding the dependency terms improves the merged model's overall behavior without creating new optimization difficulties or harming generalization.

What would settle it

An experiment in which RegMean++ yields equal or lower accuracy than plain RegMean on a standard set of merging tasks with held-out validation data.

read the original abstract

Regression Mean (RegMean), an approach that formulates model merging as a linear regression problem, aims to find the optimal weights for each linear layer in the merged model by minimizing the discrepancy in predictions between the merged and candidate models. RegMean provides a precise closed-form solution for the merging problem; therefore, it offers explainability and computational efficiency. However, RegMean merges each linear layer independently, overlooking how the features and information in earlier layers propagate through deeper layers and influence the final predictions of the merged model. Here, we introduce RegMean++, a simple yet effective alternative to RegMean, that explicitly incorporates both intra-layer and cross-layer dependencies between merged models' layers into RegMean's objective. By accounting for these dependencies, RegMean++ better captures the behaviors of the merged model. Extensive experiments demonstrate that RegMean++ consistently outperforms RegMean across diverse settings, including in-domain (ID) and out-of-domain (OOD) generalization, sequential merging, large-scale tasks, and robustness under several types of distribution shifts. Furthermore, RegMean++ achieves competitive performance across diverse settings compared to various advanced model merging methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RegMean++ adds cross-layer dependency terms to the RegMean objective but the abstract supplies no numbers or solver details, leaving the practical payoff unclear.

read the letter

RegMean++ extends RegMean by folding intra-layer and cross-layer dependency terms into the regression objective instead of merging layers independently. This directly targets the propagation issue the original method ignores. The change keeps the overall regression framing while trying to capture more of how features move through the network, which is a reasonable structural adjustment. The experiments mentioned span ID and OOD generalization, sequential merging, large-scale tasks, and robustness under distribution shifts, which are the right settings to check whether the addition helps in practice. The paper does a clean job of stating the limitation in the prior work and proposing a fix that stays within the same explainable, closed-form spirit. The citation pattern is straightforward and builds on the cited RegMean result without obvious self-reinforcement loops. The soft spots sit mainly in verification. The abstract claims consistent outperformance yet gives no quantitative results, error bars, or the actual augmented loss equation. Without those, it is difficult to judge whether coupling layers preserves a stable closed-form solution or introduces iterative optimization whose convergence and sensitivity need checking. The stress-test concern about possible loss of uniqueness or extra degrees of freedom driving the gains is worth a direct answer in the full text. If the joint problem remains tractable and the improvements survive ablation on the new terms, the method is useful; if not, the gains may be artifactual. This work is aimed at researchers already using regression-style merging who want a lightweight way to account for layer interactions. A reader familiar with RegMean would get immediate value from testing the idea. I would send it for peer review because the claim is testable with standard merging benchmarks and the subfield can use incremental, well-documented improvements when the math and results are shown clearly.

Referee Report

2 major / 2 minor

Summary. The paper introduces RegMean++, which augments the per-layer linear regression objective of RegMean with explicit intra-layer and cross-layer dependency terms to account for feature propagation in merged models. It claims this structural change yields merged models that better match the candidate models' overall behavior and reports consistent empirical outperformance over RegMean across in-domain/out-of-domain generalization, sequential merging, large-scale tasks, and robustness to distribution shifts, while remaining competitive with other merging methods.

Significance. If the gains are reproducible and attributable to the dependency modeling rather than extra degrees of freedom, the work would meaningfully advance model merging by relaxing the independent-layer assumption that limits RegMean. The emphasis on generalization and robustness under shifts is relevant for practical use of merged models.

major comments (2)

[§3] §3 (Method): The augmented objective couples parameters across layers via the new dependency terms, yet the manuscript provides neither a closed-form solution nor a convexity/uniqueness analysis for the resulting problem. This is load-bearing for the central claim that RegMean++ 'better captures the behaviors of the merged model' without introducing new optimization difficulties.
[§5] §5 (Experiments): The reported outperformance is presented without error bars, statistical significance tests, or ablations isolating the contribution of the cross-layer terms versus implicit regularization. This weakens the ability to confirm that observed gains stem from correctly modeling feature propagation rather than other factors.

minor comments (2)

[§3] Notation for the intra- and cross-layer terms should be introduced more explicitly with a small illustrative diagram to aid readability.
[Abstract] The abstract states 'extensive experiments' but the main text should include a table summarizing all compared methods and settings for quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of the method's theoretical grounding and the need for stronger empirical validation. We address each major comment below and will revise the manuscript accordingly to improve rigor and clarity.

read point-by-point responses

Referee: [§3] §3 (Method): The augmented objective couples parameters across layers via the new dependency terms, yet the manuscript provides neither a closed-form solution nor a convexity/uniqueness analysis for the resulting problem. This is load-bearing for the central claim that RegMean++ 'better captures the behaviors of the merged model' without introducing new optimization difficulties.

Authors: We agree that explicitly addressing the optimization properties of the coupled objective is important for supporting our central claim. The augmented objective is a convex quadratic form (as the intra- and cross-layer terms are constructed from linear feature maps and positive semi-definite Gram matrices), and the unique minimizer can be recovered in closed form by solving the resulting block-structured linear system. In the revision we will add a dedicated subsection deriving this closed-form solution, proving convexity, and stating the uniqueness condition under the same full-rank assumptions used in the original RegMean analysis. This shows that no fundamentally new optimization difficulties are introduced. revision: yes
Referee: [§5] §5 (Experiments): The reported outperformance is presented without error bars, statistical significance tests, or ablations isolating the contribution of the cross-layer terms versus implicit regularization. This weakens the ability to confirm that observed gains stem from correctly modeling feature propagation rather than other factors.

Authors: We concur that the current experimental presentation would benefit from additional statistical controls and targeted ablations. In the revised manuscript we will (i) report mean and standard deviation across at least five independent runs with different random seeds for all main tables, (ii) include paired t-test p-values to establish statistical significance of the reported gains over RegMean, and (iii) add an ablation that systematically disables the cross-layer dependency terms while keeping the intra-layer terms and regularization strength fixed. These changes will help isolate the contribution of modeling feature propagation. revision: yes

Circularity Check

0 steps flagged

No circularity: RegMean++ adds explicit dependency terms as a structural change to the objective

full rationale

The paper's derivation starts from RegMean's per-layer linear regression formulation and augments it with intra-layer and cross-layer dependency terms to account for feature propagation. This modification is presented as a direct modeling choice rather than a fit to data or a self-referential definition. No equations, predictions, or results in the abstract reduce by construction to the original RegMean inputs; the claimed improvements in generalization and robustness are positioned as consequences of the new objective structure. The derivation chain is self-contained against external benchmarks and does not rely on self-citation loops or uniqueness theorems imported from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method appears to rest on the standard linear regression formulation plus the new dependency modeling choice.

pith-pipeline@v0.9.0 · 5737 in / 1068 out tokens · 49814 ms · 2026-05-18T23:59:46.723522+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

RegMean++ computes X(l,j)i based on the activations produced by the previous merge layer f(l−1)M … while RegMean relies on … f(l−1)i
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

closed-form solution … W(l)M = [∑ bG(l)i ]⁻¹ ∑ bG(l)i W(l)i

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FeatCal: Feature Calibration for Post-Merging Models
cs.LG 2026-05 conditional novelty 7.0

FeatCal reduces feature drift in merged models via layer-wise closed-form calibration on a small dataset, outperforming prior post-merging methods on CLIP and GLUE benchmarks with high sample efficiency.
ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation
cs.CL 2026-03 unverdicted novelty 6.0

ACE-Merging estimates task input covariances from parameter differences to enable closed-form data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.