RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging
Pith reviewed 2026-05-18 23:59 UTC · model grok-4.3
The pith
RegMean++ adds explicit intra-layer and cross-layer dependency terms to the per-layer regression objective used for model merging.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RegMean++ extends RegMean's independent per-layer linear regression by inserting explicit intra-layer and cross-layer dependency terms into the objective; the augmented problem still admits a closed-form solution and produces merged weights whose overall predictions align more closely with those of the source models.
What carries the argument
The augmented regression objective that couples each layer's weight solution to both intra-layer feature correlations and cross-layer propagation effects.
If this is right
- Merged models achieve higher accuracy on both in-domain and out-of-domain test sets.
- Performance remains stable when models are merged sequentially rather than all at once.
- The method scales to large tasks while preserving robustness under input distribution shifts.
- Results stay competitive with more complex merging techniques that do not rely on closed-form regression.
Where Pith is reading between the lines
- Layer-wise merging methods may systematically underperform when they ignore how early-layer changes affect later-layer inputs.
- Similar dependency modeling could be inserted into other closed-form or optimization-based merging procedures.
- The approach highlights a possible route to diagnose which inter-layer interactions matter most for successful model combination.
Load-bearing premise
That adding the dependency terms improves the merged model's overall behavior without creating new optimization difficulties or harming generalization.
What would settle it
An experiment in which RegMean++ yields equal or lower accuracy than plain RegMean on a standard set of merging tasks with held-out validation data.
Original abstract
Regression Mean (RegMean), an approach that formulates model merging as a linear regression problem, aims to find the optimal weights for each linear layer in the merged model by minimizing the discrepancy in predictions between the merged and candidate models. RegMean provides a precise closed-form solution for the merging problem; therefore, it offers explainability and computational efficiency. However, RegMean merges each linear layer independently, overlooking how the features and information in earlier layers propagate through deeper layers and influence the final predictions of the merged model. Here, we introduce RegMean++, a simple yet effective alternative to RegMean that explicitly incorporates both intra-layer and cross-layer dependencies between merged models' layers into RegMean's objective. By accounting for these dependencies, RegMean++ better captures the behaviors of the merged model. Extensive experiments demonstrate that RegMean++ consistently outperforms RegMean across diverse settings, including in-domain (ID) and out-of-domain (OOD) generalization, sequential merging, large-scale tasks, and robustness under several types of distribution shifts. Furthermore, RegMean++ achieves competitive performance across diverse settings compared to various advanced model merging methods.
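The closed form mentioned in the abstract is the minimizer of a per-layer least-squares problem. As a worked sketch in our own notation (assuming the standard RegMean setup, not quoting the paper): for layer $l$ and $K$ candidate models, let $X^{(l)}_i$ stack the layer's input activations on model $i$'s data and let $W^{(l)}_i$ be that model's weights. Setting the gradient to zero gives the merged weights, provided $\sum_i G^{(l)}_i$ is invertible.

```latex
\min_{W}\ \sum_{i=1}^{K} \bigl\lVert X^{(l)}_i W - X^{(l)}_i W^{(l)}_i \bigr\rVert_F^2
\;\Longrightarrow\;
2\sum_{i=1}^{K} G^{(l)}_i \bigl(W - W^{(l)}_i\bigr) = 0,
\qquad G^{(l)}_i = X^{(l)\top}_i X^{(l)}_i

W^{(l)}_M = \Bigl[\sum_{i=1}^{K} G^{(l)}_i\Bigr]^{-1} \sum_{i=1}^{K} G^{(l)}_i W^{(l)}_i
```

RegMean solves this for every layer independently. RegMean++ keeps a closed form of the same shape but, according to the passage quoted in the Lean section of this page, builds $X^{(l)}_i$ from the already-merged earlier layers.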
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RegMean++, which augments the per-layer linear regression objective of RegMean with explicit intra-layer and cross-layer dependency terms to account for feature propagation in merged models. It claims this structural change yields merged models that better match the candidate models' overall behavior and reports consistent empirical outperformance over RegMean across in-domain/out-of-domain generalization, sequential merging, large-scale tasks, and robustness to distribution shifts, while remaining competitive with other merging methods.
Significance. If the gains are reproducible and attributable to the dependency modeling rather than extra degrees of freedom, the work would meaningfully advance model merging by relaxing the independent-layer assumption that limits RegMean. The emphasis on generalization and robustness under shifts is relevant for practical use of merged models.
major comments (2)
- [§3] §3 (Method): The augmented objective couples parameters across layers via the new dependency terms, yet the manuscript provides neither a closed-form solution nor a convexity/uniqueness analysis for the resulting problem. This is load-bearing for the central claim that RegMean++ 'better captures the behaviors of the merged model' without introducing new optimization difficulties.
- [§5] §5 (Experiments): The reported outperformance is presented without error bars, statistical significance tests, or ablations isolating the contribution of the cross-layer terms versus implicit regularization. This weakens the ability to confirm that observed gains stem from correctly modeling feature propagation rather than other factors.
minor comments (2)
- [§3] Notation for the intra- and cross-layer terms should be introduced more explicitly with a small illustrative diagram to aid readability.
- [Abstract] The abstract states 'extensive experiments' but the main text should include a table summarizing all compared methods and settings for quick reference.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of the method's theoretical grounding and the need for stronger empirical validation. We address each major comment below and will revise the manuscript accordingly to improve rigor and clarity.
Point-by-point responses
- Referee: [§3] §3 (Method): The augmented objective couples parameters across layers via the new dependency terms, yet the manuscript provides neither a closed-form solution nor a convexity/uniqueness analysis for the resulting problem. This is load-bearing for the central claim that RegMean++ 'better captures the behaviors of the merged model' without introducing new optimization difficulties.
Authors: We agree that explicitly addressing the optimization properties of the coupled objective is important for supporting our central claim. The augmented objective is a convex quadratic form (as the intra- and cross-layer terms are constructed from linear feature maps and positive semi-definite Gram matrices), and the unique minimizer can be recovered in closed form by solving the resulting block-structured linear system. In the revision we will add a dedicated subsection deriving this closed-form solution, proving convexity, and stating the uniqueness condition under the same full-rank assumptions used in the original RegMean analysis. This shows that no fundamentally new optimization difficulties are introduced.
Revision: yes.
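The uniqueness argument in this response rests on positive definiteness. For the plain per-layer objective it can be checked numerically: the Hessian with respect to each output column of $W$ is $2\sum_i X_i^\top X_i$, so the minimizer is unique exactly when this matrix is positive definite, i.e. when the stacked activations have full column rank. The sketch below assumes NumPy; the function names are hypothetical, and it does not reproduce the coupled block-structured system the authors promise to derive.

```python
import numpy as np


def hessian_of_regmean_objective(activations):
    """Hessian of sum_i ||X_i w - X_i w_i||^2 for one output column w.

    activations: list of arrays X_i with shape (n_i, d_in).
    The objective is quadratic, so the Hessian is the constant matrix
    2 * sum_i X_i^T X_i; it does not depend on the candidate weights w_i.
    """
    d_in = activations[0].shape[1]
    H = np.zeros((d_in, d_in))
    for X in activations:
        H += 2.0 * X.T @ X
    return H


def is_strictly_convex(activations, rel_tol=1e-10):
    """True if the Hessian is positive definite, i.e. the minimizer is unique."""
    H = hessian_of_regmean_objective(activations)
    eigvals = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
    return bool(eigvals.min() > rel_tol * max(eigvals.max(), 1.0))
```

If some input feature is zero for every candidate's data, the corresponding eigenvalue is zero and the check returns False: the merged weights for that input row are then not determined by the objective.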
- Referee: [§5] §5 (Experiments): The reported outperformance is presented without error bars, statistical significance tests, or ablations isolating the contribution of the cross-layer terms versus implicit regularization. This weakens the ability to confirm that observed gains stem from correctly modeling feature propagation rather than other factors.
Authors: We concur that the current experimental presentation would benefit from additional statistical controls and targeted ablations. In the revised manuscript we will (i) report mean and standard deviation across at least five independent runs with different random seeds for all main tables, (ii) include paired t-test p-values to establish statistical significance of the reported gains over RegMean, and (iii) add an ablation that systematically disables the cross-layer dependency terms while keeping the intra-layer terms and regularization strength fixed. These changes will help isolate the contribution of modeling feature propagation.
Revision: yes.
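One way to produce the statistics promised in points (i) and (ii) is sketched below, assuming NumPy and SciPy (`scipy.stats.ttest_rel` performs a two-sided paired t-test). The function name is hypothetical, the inputs are per-seed accuracies paired by seed, and no numbers from the paper are implied.

```python
import numpy as np
from scipy import stats


def summarize_and_compare(acc_new, acc_base):
    """Mean/std per method and a paired t-test over matched seeds.

    acc_new, acc_base: sequences of accuracies where index k of each
    sequence comes from the same random seed (the pairing the test needs).
    """
    a = np.asarray(acc_new, dtype=float)
    b = np.asarray(acc_base, dtype=float)
    if a.shape != b.shape or a.ndim != 1 or a.size < 2:
        raise ValueError("need two equal-length 1-D sequences with >= 2 runs")
    result = stats.ttest_rel(a, b)  # two-sided test on per-seed differences
    return {
        "mean_new": float(a.mean()), "std_new": float(a.std(ddof=1)),
        "mean_base": float(b.mean()), "std_base": float(b.std(ddof=1)),
        "t": float(result.statistic), "p": float(result.pvalue),
    }
```

Pairing matters: the test operates on per-seed differences, so the same seed must index both sequences. Sample standard deviations use `ddof=1`, which is the usual choice for a handful of runs.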
Circularity Check
No circularity: RegMean++ adds explicit dependency terms as a structural change to the objective
The paper's derivation starts from RegMean's per-layer linear regression formulation and augments it with intra-layer and cross-layer dependency terms to account for feature propagation. This modification is presented as a direct modeling choice rather than a fit to data or a self-referential definition. No equations, predictions, or results in the abstract reduce by construction to the original RegMean inputs; the claimed improvements in generalization and robustness are positioned as consequences of the new objective structure. The derivation chain is self-contained against external benchmarks and does not rely on self-citation loops or uniqueness theorems imported from prior author work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
RegMean++ computes $X^{(l,j)}_i$ based on the activations produced by the previous merge layer $f^{(l-1)}_M$ … while RegMean relies on … $f^{(l-1)}_i$
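This passage is the crux of the cross-layer dependency: RegMean builds layer $l$'s regression inputs from each candidate's own previous layer, whereas RegMean++ uses the already-merged previous layer. A toy sketch of that difference follows, under stated assumptions: a bias-free two-layer ReLU network, plain Gram matrices $G = X^\top X$ (no $\hat{G}$ modification), the sublayer index $j$ ignored, and hypothetical function names. It illustrates the idea and is not the authors' implementation.

```python
import numpy as np


def regmean_merge(grams, weights):
    """Plain RegMean closed form: (sum_i G_i)^{-1} sum_i G_i W_i."""
    G_sum = sum(grams)
    rhs = sum(G @ W for G, W in zip(grams, weights))
    return np.linalg.solve(G_sum, rhs)


def merge_two_layer(models, datasets, use_merged_inputs):
    """Merge bias-free two-layer ReLU nets y = relu(x @ W1) @ W2 layer by layer.

    models:   list of (W1, W2) pairs, one per candidate model.
    datasets: list of input arrays X_i with shape (n_i, d_in), one per candidate.
    use_merged_inputs=False: RegMean, layer-2 inputs come from each candidate's
        own first layer, relu(X_i @ W1_i).
    use_merged_inputs=True:  RegMean++-style, layer-2 inputs come from the
        already-merged first layer, relu(X_i @ W1_M).
    """
    # Layer 1: the inputs are the raw data in both variants.
    grams1 = [X.T @ X for X in datasets]
    W1_M = regmean_merge(grams1, [W1 for W1, _ in models])

    # Layer 2: the only difference is which first layer produces the features.
    feats = []
    for (W1_i, _), X in zip(models, datasets):
        W1_src = W1_M if use_merged_inputs else W1_i
        feats.append(np.maximum(X @ W1_src, 0.0))
    grams2 = [H.T @ H for H in feats]
    W2_M = regmean_merge(grams2, [W2 for _, W2 in models])
    return W1_M, W2_M
```

Layer 1 comes out identical in both variants because its inputs are the raw data; only layers downstream of a merged layer differ, which is where feature propagation enters.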
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
closed-form solution … $W^{(l)}_M = \bigl[\sum_i \hat{G}^{(l)}_i\bigr]^{-1} \sum_i \hat{G}^{(l)}_i W^{(l)}_i$
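The quoted formula uses a modified Gram matrix $\hat{G}^{(l)}_i$. Assuming it follows the original RegMean recipe, in which the off-diagonal entries of $G$ are scaled by a factor $\alpha \in [0, 1]$ while the diagonal is kept, the closed form can be sketched as below (NumPy; `shrink_gram` and `merge_layer` are hypothetical names, and if the paper defines $\hat{G}$ differently only `shrink_gram` needs to change).

```python
import numpy as np


def shrink_gram(G, alpha):
    """Scale the off-diagonal entries of a Gram matrix by alpha, keep the diagonal.

    Assumed definition of G-hat: alpha = 1 recovers G, alpha = 0 keeps only diag(G).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    D = np.diag(np.diag(G))
    return alpha * G + (1.0 - alpha) * D


def merge_layer(grams, weights, alpha=1.0):
    """W_M = [sum_i Ghat_i]^{-1} sum_i Ghat_i W_i for one linear layer."""
    G_hats = [shrink_gram(G, alpha) for G in grams]
    lhs = np.sum(G_hats, axis=0)
    rhs = np.sum([Gh @ W for Gh, W in zip(G_hats, weights)], axis=0)
    return np.linalg.solve(lhs, rhs)  # avoids forming the inverse explicitly
```

With $\alpha = 1$ this is the plain closed form; with $\alpha = 0$ each input row of the merged weights becomes a weighted average of the candidates' rows, weighted by the diagonal of each $G_i$. When all Gram matrices are equal, the result is the simple average for any $\alpha$.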
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- FeatCal: Feature Calibration for Post-Merging Models
FeatCal reduces feature drift in merged models via layer-wise closed-form calibration on a small dataset, outperforming prior post-merging methods on CLIP and GLUE benchmarks with high sample efficiency.
- ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation
ACE-Merging estimates task input covariances from parameter differences to enable closed-form data-free merging that reduces interference and outperforms prior baselines on vision and language tasks.