pith. sign in

arxiv: 2510.09174 · v3 · pith:VMNUPXBJnew · submitted 2025-10-10 · 💻 cs.LG

Robustness and Regularization in Hierarchical Re-Basin

Pith reviewed 2026-05-21 20:29 UTC · model grok-4.3

classification 💻 cs.LG
keywords Re-Basinmodel merginghierarchical mergingadversarial robustnessperturbation robustnessneural network ensemblesregularization
0
0 comments X

The pith

Hierarchical Re-Basin merges models while building resistance to adversarial attacks and input perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hierarchical scheme for merging trained neural networks with the Re-Basin method and shows it beats the standard flat MergeMany algorithm. The merged models gain resistance to adversarial examples and small perturbations, and this resistance grows stronger when more base models participate in the hierarchy. The same procedure produces a larger drop in clean-data performance than earlier Re-Basin work reported.

Core claim

Re-Basin applied through a hierarchical merging procedure induces adversarial and perturbation robustness into the resulting models, with the robustness effect becoming stronger the more models participate in the hierarchy. The hierarchical algorithm also delivers better merged-model performance than the flat MergeMany baseline, although the accuracy cost on standard tasks is larger than previously observed.

What carries the argument

The hierarchical merging scheme, which applies Re-Basin recursively to successive groups of models in a tree structure instead of merging all models at once.

If this is right

  • Merged models gain resistance to adversarial perturbations that scales with the number of base models.
  • The hierarchical scheme produces stronger overall merged performance than flat merging.
  • Clean-data accuracy falls more than earlier Re-Basin reports indicated.
  • Robustness benefits appear consistently across the tested merging depths.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The hierarchy may act as a built-in regularizer that trades some accuracy for robustness.
  • Similar robustness patterns could be tested by applying hierarchy to other merging algorithms.
  • Deployment settings that value robustness over peak accuracy might benefit from deeper hierarchies.

Load-bearing premise

The observed robustness gains and performance drop arise from the hierarchical merging procedure itself rather than from the particular models, training runs, or evaluation protocols used.

What would settle it

Run the same set of base models through both flat Re-Basin and the hierarchical version, then measure whether the hierarchical version still shows higher adversarial and perturbation accuracy.

Figures

Figures reproduced from arXiv: 2510.09174 by Arne Raulf, Benedikt Franke, Florian Heinrich, Markus Lange.

Figure 2
Figure 2. Figure 2: Our proposed hierarchical Merg￾ing Scheme, exemplified for merging eight models. While [5] provides the MergeMany algorithm to apply Git Re-Basin to more than 2 models, we found the al￾gorithm to have an important theo￾retical weakness: In each round of the algorithm, one of the n input models Θi is permuted towards the mean Θ of ¯ the other n − 1 models with Θ = ¯ 1 n P j∈{1,...n}\i Θj [5]. However, the m… view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of test set accuracies on CIFAR-10 by merging algorithm [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Mean Accuracy of different Re-Basin stages over attack strength [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Impact of Different Re-Basin Stages on Weight Norm and Lipschitz [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
read the original abstract

This paper takes a closer look at Git Re-Basin, an interesting new approach to merge trained models. We propose a hierarchical model merging scheme that significantly outperforms the standard MergeMany algorithm. With our new algorithm, we find that Re-Basin induces adversarial and perturbation robustness into the merged models, with the effect becoming stronger the more models participate in the hierarchical merging scheme. However, in our experiments Re-Basin induces a much bigger performance drop than reported by the original authors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a hierarchical variant of the Git Re-Basin model-merging procedure that is claimed to outperform the standard MergeMany baseline. It reports that Re-Basin merging induces adversarial and perturbation robustness, with the magnitude of this robustness increasing as more models participate in the hierarchy, while also documenting a substantially larger clean-data performance drop than was reported in the original Re-Basin work.

Significance. If the robustness scaling is shown to be causally attributable to the hierarchical alignment-and-merge steps rather than to model-selection or training artifacts, the result would be of moderate interest to the model-merging community: it would identify a new, training-free regularization pathway whose strength can be tuned by hierarchy depth. The larger performance drop observation is also potentially useful for understanding the robustness-accuracy trade-off in merging, provided it is placed in context with prior baselines.

major comments (2)
  1. [§4 Experiments] §4 Experiments (and associated tables): the central claim that robustness strengthens with hierarchy depth requires an ablation in which the exact set of base models and training seeds is held fixed while only the merging depth is varied. No such controlled comparison is described; without it the scaling effect cannot be confidently attributed to the hierarchical Re-Basin procedure itself rather than to incidental differences in the participating models.
  2. [§4.2] §4.2 and Table 3: the reported adversarial and perturbation robustness numbers are presented without error bars across independent training runs or statistical significance tests. Given that the abstract already notes a larger performance drop than prior work, the absence of these controls makes it impossible to judge whether the robustness gains are reliable or simply correlated with the larger accuracy degradation.
minor comments (2)
  1. The abstract states empirical findings without any reference to experimental setup, number of models, datasets, or evaluation protocols; a one-sentence summary of the experimental regime would improve readability.
  2. [§3] Notation for the hierarchical merging levels is introduced without a clear diagram or pseudocode; a small figure illustrating the tree structure would clarify the algorithm.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the strength of our claims. We address each major point below and indicate the revisions we will incorporate.

read point-by-point responses
  1. Referee: [§4 Experiments] §4 Experiments (and associated tables): the central claim that robustness strengthens with hierarchy depth requires an ablation in which the exact set of base models and training seeds is held fixed while only the merging depth is varied. No such controlled comparison is described; without it the scaling effect cannot be confidently attributed to the hierarchical Re-Basin procedure itself rather than to incidental differences in the participating models.

    Authors: We agree that the current experimental design does not fully isolate the effect of hierarchy depth from the number of participating models. In our reported results, deeper hierarchies incorporate additional models by construction, which could introduce confounding factors. We will add a controlled ablation in the revised manuscript that fixes the exact set of base models and training seeds while varying only the merging depth (e.g., by constructing hierarchies of different depths from the same pool of models). This will allow a direct attribution of any robustness scaling to the hierarchical alignment-and-merge steps. revision: yes

  2. Referee: [§4.2] §4.2 and Table 3: the reported adversarial and perturbation robustness numbers are presented without error bars across independent training runs or statistical significance tests. Given that the abstract already notes a larger performance drop than prior work, the absence of these controls makes it impossible to judge whether the robustness gains are reliable or simply correlated with the larger accuracy degradation.

    Authors: We acknowledge that the absence of error bars and statistical tests limits the ability to assess reliability, particularly in light of the larger clean accuracy drop we report. We will rerun the primary experiments across multiple independent training seeds, report standard deviations or confidence intervals for both clean accuracy and robustness metrics, and include statistical significance tests (e.g., paired t-tests) comparing hierarchical Re-Basin against baselines. These additions will be incorporated into the revised §4.2 and Table 3. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on experiments, not derivations

full rationale

The paper introduces a hierarchical variant of Re-Basin merging and reports experimental observations of induced robustness that scales with the number of models. No equations, parameter-fitting steps, or mathematical derivations are described that could reduce a claimed prediction to an input by construction. Claims are supported by direct comparisons to MergeMany and ablation-style scaling experiments, which remain externally falsifiable. This is a standard empirical ML paper whose central results do not rely on self-referential definitions or load-bearing self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all such elements remain unknown.

pith-pipeline@v0.9.0 · 5598 in / 1072 out tokens · 36002 ms · 2026-05-21T20:29:50.437867+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    Linear mode connectivity and the lottery ticket hypothesis

    Jonathan Frankle et al. Linear mode connectivity and the lottery ticket hypothesis. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, volume 119 ofProceedings of Machine Learning Research, pages 3259–3269. PMLR, 2020

  2. [2]

    The lottery ticket hypothesis: Finding sparse, trainable neural networks

    Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In7th International Conference on Learning Representations, ICLR 2019. OpenReview.net, 2019

  3. [3]

    Linear mode connectivity in multitask and continual learning

    Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Dilan G¨ or¨ ur, Razvan Pascanu, and Hassan Ghasemzadeh. Linear mode connectivity in multitask and continual learning. In9th International Conference on Learning Representations. OpenReview.net, 2021

  4. [4]

    arXiv preprint arXiv:2210.06671 , year =

    Aditya Kumar Akash et al. Wasserstein barycenter-based model fusion and linear mode connectivity of neural networks.arXiv preprint arXiv:2210.06671, 2022

  5. [5]

    Ainsworth et al

    Samuel K. Ainsworth et al. Git re-basin: Merging models modulo permutation sym- metries. InThe Eleventh International Conference on Learning Representations, ICLR

  6. [6]

    OpenReview.net, 2023

  7. [7]

    The role of permutation invariance in linear mode connectivity of neural networks

    Rahim Entezari et al. The role of permutation invariance in linear mode connectivity of neural networks. InThe Tenth International Conference on Learning Representations, ICLR 2022, 2022

  8. [8]

    Exploring mode connectivity for pre-trained language models

    Yujia Qin et al. Exploring mode connectivity for pre-trained language models. InPro- ceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, pages 6726–6746. Association for Computational Linguistics, 2022

  9. [9]

    Going beyond linear mode connectivity: The layerwise linear feature connectivity

    Zhanpeng Zhou et al. Going beyond linear mode connectivity: The layerwise linear feature connectivity. InAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, 2023

  10. [10]

    Federated learning with matched averaging

    Hongyi Wang et al. Federated learning with matched averaging. In8th International Conference on Learning Representations, ICLR 2020. OpenReview.net, 2020

  11. [11]

    Property inference attacks on fully connected neural networks us- ing permutation invariant representations

    Karan Ganju et al. Property inference attacks on fully connected neural networks us- ing permutation invariant representations. InProceedings of the 2018 ACM SIGSAC conference on computer and communications security, pages 619–633, 2018

  12. [12]

    Deepfool: A simple and accurate method to fool deep neural networks

    Seyed-Mohsen Moosavi-Dezfooli et al. Deepfool: A simple and accurate method to fool deep neural networks. In2016 IEEE Conference on Computer Vision and Pattern Recog- nition (CVPR). IEEE, June 2016

  13. [13]

    Huang et al

    Sandy H. Huang et al. Adversarial attacks on neural network policies. In5th Interna- tional Conference on Learning Representations, ICLR 2017, Workshop Track Proceed- ings. OpenReview.net, 2017

  14. [14]

    The effects of adding noise during backpropagation training on a general- ization performance.Neural Computation, 8(3):643–674, 1996

    Guozhong An. The effects of adding noise during backpropagation training on a general- ization performance.Neural Computation, 8(3):643–674, 1996

  15. [15]

    Feature selection, l1 vs

    Andrew Y Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In Proceedings of the twenty-first international conference on Machine learning, page 78, 2004

  16. [16]

    Analytical bounds on the local lipschitz constants of relu networks.IEEE Transactions on Neural Networks and Learning Systems, 2023

    Trevor Avant and Kristi A Morgansen. Analytical bounds on the local lipschitz constants of relu networks.IEEE Transactions on Neural Networks and Learning Systems, 2023