Robustness and Regularization in Hierarchical Re-Basin
Pith reviewed 2026-05-21 20:29 UTC · model grok-4.3
The pith
Hierarchical Re-Basin merges models while building resistance to adversarial attacks and input perturbations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Re-Basin applied through a hierarchical merging procedure induces adversarial and perturbation robustness into the resulting models, with the robustness effect becoming stronger the more models participate in the hierarchy. The hierarchical algorithm also delivers better merged-model performance than the flat MergeMany baseline, although the accuracy cost on standard tasks is larger than previously observed.
What carries the argument
The hierarchical merging scheme, which applies Re-Basin recursively to successive groups of models in a tree structure instead of merging all models at once.
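The tree-structured procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: models are represented as flat lists of floats, `average_merge` is a hypothetical stand-in for a flat group merge (it only averages and omits the permutation-alignment step that Re-Basin performs), and `group_size` is an assumed parameter that the source does not specify.

```python
from typing import Callable, List, Sequence

Weights = List[float]  # toy stand-in for a model's flattened parameters


def average_merge(models: Sequence[Weights]) -> Weights:
    """Hypothetical placeholder for a flat group merge.

    A real Re-Basin merge would first permute hidden units so the models
    are aligned; that step is omitted here and we only average.
    """
    n = len(models)
    return [sum(column) / n for column in zip(*models)]


def merge_hierarchical(
    models: Sequence[Weights],
    group_size: int = 2,
    merge_flat: Callable[[Sequence[Weights]], Weights] = average_merge,
) -> Weights:
    """Merge models level by level in a tree instead of all at once."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = list(models)
    if not level:
        raise ValueError("need at least one model")
    # Each pass merges consecutive groups, producing the next tree level.
    while len(level) > 1:
        level = [
            merge_flat(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
    return level[0]
```

With the averaging placeholder, a balanced tree gives the same result as a flat mean, while an unbalanced one weights the leftover model more heavily (three models in groups of two yield weights 1/4, 1/4, 1/2). In a real run, any further difference between flat and hierarchical merging would have to come from the alignment step that this sketch leaves out.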
If this is right
- Merged models gain resistance to adversarial perturbations that scales with the number of base models.
- The hierarchical scheme produces stronger overall merged performance than flat merging.
- Clean-data accuracy falls more than earlier Re-Basin reports indicated.
- Robustness benefits appear consistently across the tested merging depths.
Where Pith is reading between the lines
- The hierarchy may act as a built-in regularizer that trades some accuracy for robustness.
- Similar robustness patterns could be tested by applying hierarchy to other merging algorithms.
- Deployment settings that value robustness over peak accuracy might benefit from deeper hierarchies.
Load-bearing premise
The observed robustness gains and performance drop arise from the hierarchical merging procedure itself rather than from the particular models, training runs, or evaluation protocols used.
What would settle it
Run the same set of base models through both flat Re-Basin and the hierarchical version, then measure whether the hierarchical version still shows higher adversarial and perturbation accuracy.
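One way to set up the perturbation half of that comparison is sketched below. It assumes additive Gaussian input noise as the perturbation, which the source does not specify, and it leaves out adversarial attacks entirely. The names `accuracy_under_noise` and `threshold_classifier` are hypothetical; the classifier is a toy that only makes the sketch runnable.

```python
import random
from typing import Callable, List, Sequence


def threshold_classifier(x: List[float]) -> int:
    """Toy model used only for illustration: predicts 1 if the sum is positive."""
    return int(sum(x) > 0)


def accuracy_under_noise(
    predict: Callable[[List[float]], int],
    inputs: Sequence[Sequence[float]],
    labels: Sequence[int],
    sigma: float,
    seed: int = 0,
) -> float:
    """Accuracy when each input feature gets Gaussian noise with std sigma.

    The same seed gives the same noise, so two merged models can be
    evaluated on identical perturbed inputs.
    """
    if len(inputs) != len(labels) or not labels:
        raise ValueError("inputs and labels must be non-empty and equal length")
    rng = random.Random(seed)
    correct = 0
    for x, y in zip(inputs, labels):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        if predict(noisy) == y:
            correct += 1
    return correct / len(labels)
```

Calling the function once per merged model with the same `seed` and a sweep of `sigma` values would give matched clean (`sigma=0.0`) and perturbed accuracies for the flat and hierarchical variants.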
Original abstract
This paper takes a closer look at Git Re-Basin, an interesting new approach to merge trained models. We propose a hierarchical model merging scheme that significantly outperforms the standard MergeMany algorithm. With our new algorithm, we find that Re-Basin induces adversarial and perturbation robustness into the merged models, with the effect becoming stronger the more models participate in the hierarchical merging scheme. However, in our experiments Re-Basin induces a much bigger performance drop than reported by the original authors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hierarchical variant of the Git Re-Basin model-merging procedure that is claimed to outperform the standard MergeMany baseline. It reports that Re-Basin merging induces adversarial and perturbation robustness, with the magnitude of this robustness increasing as more models participate in the hierarchy, while also documenting a substantially larger clean-data performance drop than was reported in the original Re-Basin work.
Significance. If the robustness scaling is shown to be causally attributable to the hierarchical alignment-and-merge steps rather than to model-selection or training artifacts, the result would be of moderate interest to the model-merging community: it would identify a new, training-free regularization pathway whose strength can be tuned by hierarchy depth. The larger performance drop observation is also potentially useful for understanding the robustness-accuracy trade-off in merging, provided it is placed in context with prior baselines.
major comments (2)
- [§4 Experiments] §4 Experiments (and associated tables): the central claim that robustness strengthens with hierarchy depth requires an ablation in which the exact set of base models and training seeds is held fixed while only the merging depth is varied. No such controlled comparison is described; without it the scaling effect cannot be confidently attributed to the hierarchical Re-Basin procedure itself rather than to incidental differences in the participating models.
- [§4.2] §4.2 and Table 3: the reported adversarial and perturbation robustness numbers are presented without error bars across independent training runs or statistical significance tests. Given that the abstract already notes a larger performance drop than prior work, the absence of these controls makes it impossible to judge whether the robustness gains are reliable or simply correlated with the larger accuracy degradation.
minor comments (2)
- The abstract states empirical findings without any reference to experimental setup, number of models, datasets, or evaluation protocols; a one-sentence summary of the experimental regime would improve readability.
- [§3] Notation for the hierarchical merging levels is introduced without a clear diagram or pseudocode; a small figure illustrating the tree structure would clarify the algorithm.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the strength of our claims. We address each major point below and indicate the revisions we will incorporate.
- Referee: [§4 Experiments] §4 Experiments (and associated tables): the central claim that robustness strengthens with hierarchy depth requires an ablation in which the exact set of base models and training seeds is held fixed while only the merging depth is varied. No such controlled comparison is described; without it the scaling effect cannot be confidently attributed to the hierarchical Re-Basin procedure itself rather than to incidental differences in the participating models.
  Authors: We agree that the current experimental design does not fully isolate the effect of hierarchy depth from the number of participating models. In our reported results, deeper hierarchies incorporate additional models by construction, which could introduce confounding factors. We will add a controlled ablation in the revised manuscript that fixes the exact set of base models and training seeds while varying only the merging depth (e.g., by constructing hierarchies of different depths from the same pool of models). This will allow a direct attribution of any robustness scaling to the hierarchical alignment-and-merge steps.
  Revision: yes
- Referee: [§4.2] §4.2 and Table 3: the reported adversarial and perturbation robustness numbers are presented without error bars across independent training runs or statistical significance tests. Given that the abstract already notes a larger performance drop than prior work, the absence of these controls makes it impossible to judge whether the robustness gains are reliable or simply correlated with the larger accuracy degradation.
  Authors: We acknowledge that the absence of error bars and statistical tests limits the ability to assess reliability, particularly in light of the larger clean accuracy drop we report. We will rerun the primary experiments across multiple independent training seeds, report standard deviations or confidence intervals for both clean accuracy and robustness metrics, and include statistical significance tests (e.g., paired t-tests) comparing hierarchical Re-Basin against baselines. These additions will be incorporated into the revised §4.2 and Table 3.
  Revision: yes
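The paired comparison mentioned in the rebuttal could look roughly like the sketch below. It computes only the paired t statistic from per-seed scores of two methods; the function name `paired_t_statistic` is hypothetical, and turning the statistic into a p-value would need a t-distribution table or a statistics library, which is left out here.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Paired t statistic for per-seed scores of two methods.

    scores_a[i] and scores_b[i] must come from the same seed (or the same
    pool of base models), so each difference isolates the method effect.
    The statistic has len(scores_a) - 1 degrees of freedom.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Pairing by seed matters here because run-to-run variation in the base models would otherwise be mixed into the comparison between hierarchical and flat merging.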
Circularity Check
No circularity: empirical claims rest on experiments, not derivations
The paper introduces a hierarchical variant of Re-Basin merging and reports experimental observations of induced robustness that scales with the number of models. No equations, parameter-fitting steps, or mathematical derivations are described that could reduce a claimed prediction to an input by construction. Claims are supported by direct comparisons to MergeMany and ablation-style scaling experiments, which remain externally falsifiable. This is a standard empirical ML paper whose central results do not rely on self-referential definitions or load-bearing self-citations.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We propose a hierarchical model merging scheme that significantly outperforms the standard MergeMany algorithm... Re-Basin induces adversarial and perturbation robustness... stronger the more models participate"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: J_uniquely_calibrated_via_higher_derivative (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Re-Basin seems to act as a sort of regularization, positively impacting adversarial and perturbation robustness"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.