pith. machine review for the scientific record.

arxiv: 2605.12843 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.AI

Recognition: unknown

Bayesian Model Merging

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords model merging · Bayesian regression · anchor prior · bi-level optimization · task vectors · Gram matrix · plug-and-play · multi-task fusion

The pith

Bayesian Model Merging fuses task-specific models into one via inner Bayesian regression under anchor priors and outer Bayesian optimization of per-module hyperparameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Bayesian Model Merging to combine multiple expert models into a single network without joint retraining or full data access. It treats merging as a bi-level problem: the inner level solves an activation-based Bayesian regression that incorporates a strong anchor model prior for a closed-form solution, while the outer level runs Bayesian optimization to find module-specific hyperparameters on a small validation set. A further alignment between activation statistics and task vectors supports a data-free variant that estimates the needed Gram matrix without auxiliary samples. On benchmarks the resulting merged model outperforms prior plug-and-play baselines and, for eight tasks on ViT-L/14, reaches 95.1 accuracy against an expert average of 95.8.

Core claim

Bayesian Model Merging formulates model merging as a bi-level optimization in which the inner level performs activation-based Bayesian regression under a prior induced by an anchor model to obtain a closed-form merged weight solution, the outer level applies Bayesian optimization to search module-specific hyperparameters on a modest validation set, and an observed alignment between activation statistics and task vectors permits a data-free Gram-matrix estimator that removes the need for auxiliary data.
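To fix ideas, here is a minimal sketch of the closed-form inner solution this claim describes, read as standard multi-output Bayesian linear regression with the anchor weights as the prior mean. The symbols below (X_t for task-t module inputs, G_t for their Gram matrix, W_t for the expert weights, W_anchor for the anchor, λ for the module-wise prior strength the outer loop tunes) are illustrative; the paper's exact noise model and weighting may differ.

$$ W^{*} \;=\; \Big(\sum_{t} W_t\, G_t + \lambda\, W_{\mathrm{anchor}}\Big)\Big(\sum_{t} G_t + \lambda I\Big)^{-1}, \qquad G_t = X_t X_t^{\top} $$

This is the minimizer of $\sum_t \|(W - W_t)\,X_t\|_F^2 + \lambda\,\|W - W_{\mathrm{anchor}}\|_F^2$, i.e. the posterior mean (MAP estimate) of a Gaussian model whose prior is centered at the anchor; larger λ pulls the merged module toward the anchor, smaller λ toward the activation-weighted average of the experts.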

What carries the argument

Bi-level optimization with inner activation-based Bayesian regression under an anchor-model prior that yields a closed-form merged-weight solution.

Load-bearing premise

The statistical alignment between activation patterns and task vectors is tight enough to produce an accurate data-free Gram matrix, and the anchor prior yields a merged solution that generalizes without further post-hoc tuning.
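A concrete, if simplified, reading of that premise: if the alignment holds, the Gram matrix each module's regression needs can be built from the expert's task vector (its weight difference from the pretrained model) rather than from data. The sketch below is a hypothetical illustration under that assumption; the function name, the U^T U construction, and the trace normalization are not taken from the paper, whose actual estimator may differ.

import numpy as np

def data_free_gram(expert_weight: np.ndarray, pretrained_weight: np.ndarray) -> np.ndarray:
    """Hypothetical weight-induced Gram surrogate for one module.

    Assumption (not the paper's stated estimator): if activation statistics align
    with task vectors, the input second moment E[x x^T] can be approximated up to
    scale by the Gram matrix of the task vector U = W_expert - W_pretrained.
    """
    task_vector = expert_weight - pretrained_weight   # U^(t): what fine-tuning changed
    gram = task_vector.T @ task_vector                # (d_in, d_in) surrogate for E[x x^T]
    # Trace-normalize so surrogates are comparable across modules of different scale.
    return gram / (np.trace(gram) + 1e-12)

If such a surrogate is accurate, the closed-form merge above can be evaluated with no auxiliary samples, which is exactly what the data-free variant needs.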

What would settle it

It would count against the central claim if, on the ViT-L/14 eight-task benchmark, the data-free BMM variant fell more than two points below the expert average of 95.8 while the data-dependent version also underperformed the strongest baseline.

Figures

Figures reproduced from arXiv: 2605.12843 by Kaiyang Li, Qing Su, Shaobo Han, Shihao Ji.

Figure 1. Probabilistic formulation of BMM. The framework can adopt different observation sources: empirical activations (data-assisted) or expert weight-induced surrogates (data-free). The inner MAP estimate is solved in closed form, while the outer loop optimizes λ by BO.

Figure 3. Ablation study of BMM on 20-task merging (ViT-B/32). (Left) Test accuracy as a function of the validation set fraction used for Bayesian Optimization (BO). (Right) Test accuracy vs. the number of BO search trials (K). All curves report mean ± std across 5 seeds. Solid and circle-dashed lines represent data-assisted and data-free BMM. Blue/green colors indicate ISO-CTS/TSV anchors. The horizontal gray dot-d…

Figure 4. Pareto frontiers of sampling-based BMM vs. MAP-perturbed BMM on the ViT-B/32 benchmark.

Figure 5. Radar charts: ViT-B/32 per-task breakdowns.

Figure 6. Radar charts: ViT-B/32 per-task breakdowns.

Figure 7. Radar charts: ViT-L/14 per-task breakdowns.

Figure 8. Radar charts: ViT-L/14 per-task breakdowns.

Figure 9. Radar charts: Llama per-task breakdowns.

Figure 10. Radar charts: Llama per-task breakdowns.
Original abstract

Model merging aims to combine multiple task-specific expert models into a single model without joint retraining, offering a practical alternative to multi-task learning when data access or computational budget is limited. Existing methods, however, face two key limitations: (1) they overlook the valuable inductive bias of strong anchor models and estimate the merged weights from scratch, and (2) they rely on a shared hyperparameter setting across different modules of the network, lacking a global optimization strategy. This paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework, where the inner level formulates the model merging as an activation-based Bayesian regression under a strong prior induced by an anchor model, yielding an efficient closed-form solution; and the outer level leverages a Bayesian optimization procedure to search module-specific hyperparameters globally based on a small validation set. Furthermore, we reveal a key alignment between activation statistics and task vectors, enabling us to derive a data-free variant of BMM that estimates the Gram matrix for regression without any auxiliary data. Across extensive benchmarks, including up to 20-task merging in vision and 5-task merging in language, BMM consistently outperforms all plug-and-play anchor baselines (e.g., TA, WUDI-Merging, and TSV). In particular, on the ViT-L/14 benchmark for 8-task merging, a single merged model reaches 95.1, closely matching the average performance of eight task-specific experts (95.8).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework for model merging. The inner level casts merging as activation-based Bayesian regression under an anchor-model prior, yielding a closed-form solution; the outer level performs Bayesian optimization over module-specific hyperparameters on a small validation set. A data-free variant is derived from a claimed alignment between activation statistics and task vectors that permits Gram-matrix estimation without auxiliary data. Experiments across vision (up to 20-task) and language (5-task) benchmarks report consistent outperformance over anchor baselines (TA, WUDI-Merging, TSV), with a highlighted result of 95.1 accuracy on 8-task ViT-L/14 merging versus 95.8 for the average of eight task-specific experts.

Significance. If the alignment assumption holds and the reported gains prove robust, BMM supplies a principled Bayesian treatment of model merging that exploits strong anchor priors and global hyperparameter search, addressing two stated limitations of prior plug-and-play methods. The closed-form inner solution and data-free option could be valuable in data-limited or privacy-sensitive regimes.

major comments (2)
  1. [Section 5 (Experiments)] The central empirical claim (e.g., 95.1 on ViT-L/14 8-task merging) is presented without error bars, exact train/validation splits, number of random seeds, or ablation studies isolating the contribution of the anchor prior versus the outer optimization; this absence directly weakens confidence in the outperformance numbers cited in the abstract and Section 5.
  2. [Section 3.3 (Data-free BMM)] The data-free variant rests on an unquantified alignment between activation statistics and task vectors that is asserted to enable accurate Gram-matrix recovery (Section 3.3); no correlation coefficients, layer-wise error bounds, or sensitivity analysis are supplied, leaving the approximation error of the closed-form regression solution uncharacterized.
minor comments (2)
  1. [Section 3.1] Notation for the Gram matrix and anchor prior could be introduced earlier with an explicit equation reference to improve readability of the inner-level derivation.
  2. [Abstract] The abstract states 'up to 20-task merging in vision' but does not list the precise task counts or model sizes used in each table; adding a summary table of benchmark configurations would aid comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the empirical and theoretical support in the manuscript. We address each major comment below and will incorporate the requested details and analyses in the revised version.

Point-by-point responses
  1. Referee: [Section 5 (Experiments)] The central empirical claim (e.g., 95.1 on ViT-L/14 8-task merging) is presented without error bars, exact train/validation splits, number of random seeds, or ablation studies isolating the contribution of the anchor prior versus the outer optimization; this absence directly weakens confidence in the outperformance numbers cited in the abstract and Section 5.

    Authors: We agree that additional statistical rigor and ablations are needed to support the central claims. In the revision we will report error bars over at least five random seeds, specify the exact train/validation splits and data partitioning procedure, and add ablation studies that separately quantify the contribution of the anchor-model prior versus the outer-level Bayesian hyperparameter optimization. These changes will directly address the concern about confidence in the reported numbers. revision: yes

  2. Referee: [Section 3.3 (Data-free BMM)] The data-free variant rests on an unquantified alignment between activation statistics and task vectors that is asserted to enable accurate Gram-matrix recovery (Section 3.3); no correlation coefficients, layer-wise error bounds, or sensitivity analysis are supplied, leaving the approximation error of the closed-form regression solution uncharacterized.

    Authors: We acknowledge that the alignment assumption underlying the data-free variant is currently stated without quantitative support. We will add layer-wise Pearson correlation coefficients between activation statistics and task vectors, explicit error bounds on the recovered Gram matrices, and a sensitivity analysis showing how approximation error propagates to the closed-form regression solution. These additions will characterize the reliability of the data-free variant. revision: yes
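One shape the promised diagnostic could take, as a minimal sketch: per module, flatten the empirical activation Gram matrix and the weight-induced surrogate and report their Pearson correlation across layers. The helper below is illustrative only; the choice to compare Gram matrices (rather than some other activation statistic) and all names are assumptions, not the authors' stated procedure.

import numpy as np

def pearson_alignment(activations: np.ndarray, task_vector: np.ndarray) -> float:
    """Pearson correlation between an empirical Gram matrix and a weight-induced surrogate.

    activations: (n_samples, d_in) inputs to one module, collected on held-out data.
    task_vector: (d_out, d_in) difference between expert and pretrained weights.
    """
    empirical_gram = activations.T @ activations / len(activations)  # estimate of E[x x^T]
    surrogate_gram = task_vector.T @ task_vector                     # weight-induced proxy
    return float(np.corrcoef(empirical_gram.ravel(), surrogate_gram.ravel())[0, 1])

# Usage sketch (hypothetical container): report the correlation module by module.
# for name, (acts, tau) in per_module_statistics.items():
#     print(name, pearson_alignment(acts, tau))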

Circularity Check

0 steps flagged

Standard Bayesian regression plus held-out hyperparameter search; no reduction to fitted benchmark quantities

full rationale

The inner-level closed-form solution is the standard posterior mean of Bayesian linear regression under an anchor-induced prior; the outer level performs Bayesian optimization over module-specific hyperparameters on a small held-out validation set. Neither step is shown by the paper's equations to be algebraically identical to the final reported test metrics. The data-free Gram-matrix construction rests on an observed alignment between activation statistics and task vectors, presented as an enabling derivation rather than a parameter fitted directly to the 8-task ViT-L/14 benchmark scores. No self-citation chain or self-definitional loop is load-bearing for the central claim.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard Bayesian regression assumptions plus one domain-specific alignment observation; no new physical entities are introduced and the only free parameters are the module-wise hyperparameters optimized on validation data.

free parameters (1)
  • module-specific hyperparameters
    Tuned via outer Bayesian optimization on a small validation set; values are not reported in the abstract.
axioms (2)
  • domain assumption Activation statistics align with task vectors
    Invoked to derive the data-free Gram-matrix estimator.
  • standard math Bayesian regression under anchor-model prior admits efficient closed-form solution
    Forms the inner-level merging step.
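For intuition about how the single class of free parameters is used, a minimal sketch of the bi-level loop: module-wise λ values are scored by merging with the closed-form inner step and evaluating on a small validation set. Plain log-uniform random search stands in for the paper's Bayesian optimization, and the data-structure layout (dicts keyed by task and module) is an assumption for illustration.

import numpy as np

def merge_with_lambdas(lambdas, experts, anchor, grams):
    """Inner step: closed-form merged weights per module for a given module-wise lambda.

    Assumed form: W = (sum_t W_t G_t + lam * W_anchor) @ inv(sum_t G_t + lam * I).
    experts[t][name] and grams[t][name] are per-task, per-module weights and Gram matrices.
    """
    merged = {}
    for name, lam in lambdas.items():
        G_sum = sum(grams[t][name] for t in experts)
        WG_sum = sum(experts[t][name] @ grams[t][name] for t in experts)
        d = G_sum.shape[0]
        merged[name] = (WG_sum + lam * anchor[name]) @ np.linalg.inv(G_sum + lam * np.eye(d))
    return merged

def outer_search(module_names, evaluate, n_trials=50, lo=1e-3, hi=1e3, seed=0):
    """Outer step: search module-specific lambdas against a small validation set.

    evaluate(lambdas) -> float returns validation accuracy of the merged model.
    Random search is a simplified stand-in for the Bayesian optimization used in the paper.
    """
    rng = np.random.default_rng(seed)
    best_score, best_lambdas = -np.inf, None
    for _ in range(n_trials):
        lambdas = {m: float(np.exp(rng.uniform(np.log(lo), np.log(hi)))) for m in module_names}
        score = evaluate(lambdas)
        if score > best_score:
            best_score, best_lambdas = score, lambdas
    return best_lambdas, best_score

The per-module search space is what makes the hyperparameter count scale with network depth, which is why the validation-set fraction and the number of search trials (the quantities ablated in Figure 3) matter.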

pith-pipeline@v0.9.0 · 5559 in / 1443 out tokens · 50949 ms · 2026-05-14T20:38:04.112438+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 10 internal anchors

  1. [1]

    An Overview of Multi-Task Learning in Deep Neural Networks

    Sebastian Ruder. An overview of multi-task learning in deep neural networks.arXiv preprint arXiv:1706.05098, 2017

  2. [2]

    Merging models with Fisher-weighted averaging

    Michael S. Matena and Colin A. Raffel. Merging models with fisher-weighted averaging. Advances in Neural Information Processing Systems, 35:17703–17716, 2022

  3. [3]

    Editing models with task arithmetic

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. InThe Eleventh International Conference on Learning Representations, 2023

  4. [4]

    TIES-Merging: Resolving interference when merging models

    Prateek Yadav, Derek Tam, Leshem Choshen, Colin A. Raffel, and Mohit Bansal. TIES-Merging: Resolving interference when merging models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

  5. [5]

    The hugging face hub

    Hugging Face. The Hugging Face Hub. https://huggingface.co, 2026. Accessed: 2026-05-04

  6. [6]

    Dataless knowledge fusion by merging weights of language models

    Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, and Pengxiang Cheng. Dataless knowledge fusion by merging weights of language models. InThe Eleventh International Conference on Learning Representations, 2023

  7. [7]

    Whoever started the interference should end it: Guiding data-free model merging via task vectors

    Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, and Chun Yuan. Whoever started the interference should end it: Guiding data-free model merging via task vectors. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 10121–10143. PMLR, 2025

  8. [8]

    Task singular vectors: Reducing task interference in model merging

    Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, and Emanuele Rodolà. Task singular vectors: Reducing task interference in model merging. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18695–18705, 2025

  9. [9]

    No task left behind: Isotropic model merging with common and task-specific subspaces

    Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, and Joost van de Weijer. No task left behind: Isotropic model merging with common and task-specific subspaces. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 43177–43199. PMLR, 2025

  10. [10]

    Language models are super mario: Absorbing abilities from homologous models as a free lunch

    Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, and Yongbin Li. Language models are super mario: Absorbing abilities from homologous models as a free lunch. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 57755–57775. PMLR, 2024

  11. [11]

    Parameter competition balancing for model merging

    Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He, and Min Zhang. Parameter competition balancing for model merging. Advances in Neural Information Processing Systems, 37, 2024

  12. [12]

    Localize-and-stitch: Efficient model merging via sparse task arithmetic

    Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, and Han Zhao. Localize-and-stitch: Efficient model merging via sparse task arithmetic. Transactions on Machine Learning Research, 2025. Accepted to TMLR

  13. [13]

    Modeling multi-task model merging as adaptive projective gradient descent

    Yongxian Wei, Anke Tang, Li Shen, Zixuan Hu, Chun Yuan, and Xiaochun Cao. Modeling multi-task model merging as adaptive projective gradient descent. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 66178–66193. PMLR, 2025

  14. [14]

    Prevalence of neural collapse during the terminal phase of deep learning training

    Vardan Papyan, X. Y. Han, and David L. Donoho. Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40):24652–24663, 2020

  15. [15]

    Mechanism for feature learning in neural networks and kernel machines

    Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, and Mikhail Belkin. Mechanism for feature learning in neural networks and kernel machines. Science, 383(6690):1461–1467, 2024

  16. [16]

    Average gradient outer product as a mechanism for deep neural collapse

    Daniel Beaglehole, Peter Súkeník, Marco Mondelli, and Mikhail Belkin. Average gradient outer product as a mechanism for deep neural collapse. InAdvances in Neural Information Processing Systems, 2024

  17. [17]

    Formation of representations in neural networks

    Liu Ziyin, Isaac Chuang, Tomer Galanti, and Tomaso Poggio. Formation of representations in neural networks. InThe Thirteenth International Conference on Learning Representations (ICLR 2025), 2025. Spotlight

  18. [18]

    Understanding and improving transfer learning of deep models via neural collapse

    Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, and Qing Qu. Understanding and improving transfer learning of deep models via neural collapse. arXiv preprint arXiv:2212.12206, 2022

  19. [19]

    Unleashing the power of neural collapse for transferability estimation

    Yuhe Ding, Bo Jiang, Lijun Sheng, Aihua Zheng, and Jian Liang. Unleashing the power of neural collapse for transferability estimation. arXiv preprint arXiv:2310.05754, 2023

  20. [20]

    The impact of geometric complexity on neural collapse in transfer learning

    Michael Munn, Benoit Dherin, and Javier Gonzalvo. The impact of geometric complexity on neural collapse in transfer learning. InAdvances in Neural Information Processing Systems, 2024

  21. [21]

    Bayesian optimization

    Peter I Frazier. Bayesian optimization. InRecent advances in optimization and modeling of contemporary problems, pages 255–278. Informs, 2018

  22. [22]

    MergeBench: A benchmark for merging domain-specialized LLMs

    Yifei He, Siqi Zeng, Yuzheng Hu, Rui Yang, Tong Zhang, and Han Zhao. MergeBench: A benchmark for merging domain-specialized LLMs. arXiv preprint arXiv:2505.10833, 2025

  23. [23]

    Explicit inductive bias for transfer learning with convolutional networks

    Xuhong Li, Yves Grandvalet, and Franck Davoine. Explicit inductive bias for transfer learning with convolutional networks. InProceedings of the 35th International Conference on Machine Learning, pages 2825–2834. PMLR, 2018

  24. [24]

    The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects

    Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, and Jinwen Ma. The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 7654–7663. PMLR, 2019

  25. [25]

    The alignment property of SGD noise and how it helps select flat minima: A stability analysis

    Jingfeng Wu, Difan Wang, and Weijie J. Su. The alignment property of SGD noise and how it helps select flat minima: A stability analysis. In Advances in Neural Information Processing Systems, volume 35, pages 4680–4693, 2022

  26. [26]

    Optimizing neural networks with Kronecker-factored approximate curvature

    James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2408–2417. PMLR, 2015

  27. [27]

    On calibration of modern neural networks

    Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In International Conference on Machine Learning, 2017

  28. [28]

    Being bayesian, even just a bit, fixes overconfidence in relu networks

    Agustinus Kristiadi, Matthias Hein, and Philipp Hennig. Being bayesian, even just a bit, fixes overconfidence in relu networks. InProceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 5436–5446. PMLR, 2020

  29. [29]

    3d object representations for fine-grained categorization

    Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. InICCV Workshops, 2013

  30. [30]

    Describing textures in the wild

    Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. InCVPR, 2014

  31. [31]

    EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification

    Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019

  32. [32]

    The german traffic sign recognition benchmark: A multi-class classification competition

    Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. The german traffic sign recognition benchmark: A multi-class classification competition. InIJCNN, 2011

  33. [33]

    Gradient-based learning applied to document recognition

    Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998

  34. [34]

    Remote sensing image scene classification: Benchmark and state of the art

    Gong Cheng, Junwei Han, and Xiaoqiang Lu. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 2017

  35. [35]

    SUN database: Exploring a large collection of scene categories

    Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Exploring a large collection of scene categories. In IJCV, 2016

  36. [36]

    Reading digits in natural images with unsupervised feature learning

    Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshops, 2011

  37. [37]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

  38. [38]

    An analysis of single-layer networks in unsupervised feature learning

    Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011

  39. [39]

    Automated flower classification over a large number of classes

    Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. InICVGIP, 2008

  40. [40]

    Cats and dogs

    Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Cats and dogs. In CVPR, 2012

  41. [41]

    Rotation equivariant CNNs for digital pathology

    Bastiaan S. Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant CNNs for digital pathology. In MICCAI, 2018

  42. [42]

    Challenges in Representation Learning: A report on three machine learning contests

    Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, et al. Challenges in representation learning: A report on three machine learning contests. arXiv preprint arXiv:1307.0414, 2013

  43. [43]

    Emnist: Extending mnist to handwritten letters

    Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik. Emnist: Extending mnist to handwritten letters. InIJCNN, 2017

  44. [44]

    Food-101: Mining discriminative components with random forests

    Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101: Mining discriminative components with random forests. InECCV, 2014

  45. [45]

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017

  46. [46]

    Recursive deep models for semantic compositionality over a sentiment treebank

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 2013

  47. [47]

    Deep Learning for Classical Japanese Literature

    Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature.arXiv preprint arXiv:1812.01718, 2018

  48. [48]

    Tulu 3: Pushing Frontiers in Open Language Model Post-Training

    Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, et al. Tülu 3: Pushing frontiers in open language model post-training.arXiv preprint arXiv:2411.15124, 2024

  49. [49]

    Instruction-Following Evaluation for Large Language Models

    Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models.arXiv preprint arXiv:2311.07911, 2023

  50. [50]

    DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving

    Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, and Junxian He. DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving. Advances in Neural Information Processing Systems, 37:7821–7846, 2024

  51. [51]

    Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions

    Jia Li, Edward Beeching, Lewis Tunstall, Ben Lipkin, Roman Soletskyi, Shengyi Huang, Kashif Rasul, Longhui Yu, Albert Q Jiang, Ziju Shen, et al. Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions. Hugging Face repository, 2024

  52. [52]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

  53. [53]

    Aya dataset: An open-access collection for multilingual instruction tuning

    Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, et al. Aya dataset: An open-access collection for multilingual instruction tuning. arXiv preprint arXiv:2402.06619, 2024

  54. [54]

    Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback

    Viet Lai, Chien Nguyen, Nghia Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan Rossi, and Thien Nguyen. Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 318–327, 2023

  55. [55]

    Magicoder: Empowering code generation with oss-instruct

    Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. Magicoder: Empowering code generation with oss-instruct. arXiv preprint arXiv:2312.02120, 2023

  56. [56]

    Program Synthesis with Large Language Models

    Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et al. Program synthesis with large language models.arXiv preprint arXiv:2108.07732, 2021

  57. [57]

    Is your code generated by chatGPT really correct? rigorous evaluation of large language models for code generation

    Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatGPT really correct? rigorous evaluation of large language models for code generation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

  58. [58]

    WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs

    Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, and Nouha Dziri. WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs. arXiv preprint arXiv:2406.18495, 2024

  59. [59]

    WildTeaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models

    Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, et al. WildTeaming at scale: From in-the-wild jailbreaks to (adversarially) safer language models. Advances in Neural Information Processing Systems, 37:47094–47165, 2024

  60. [60]

    HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

    Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, et al. Harmbench: A standardized evaluation framework for automated red teaming and robust refusal.arXiv preprint arXiv:2402.04249, 2024

  61. [61]

    XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

    Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, and Dirk Hovy. Xstest: A test suite for identifying exaggerated safety behaviours in large language models.arXiv preprint arXiv:2308.01263, 2023

  62. [62]

    Matrix Computations

    Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, 4th edition, 2013

  63. [63]

    Gaussian Processes for Machine Learning

    Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006

  64. [64]

    Let $x$ denote an input activation to module $W^{(t)}$ and $y = U^{(t)} x = (W^{(t)} - W_{\mathrm{pre}})\, x$ the corresponding residual output; using standard Stochastic Gradient Descent (SGD) with L2 regularization, define $g_y = -\nabla_y \ell$ as the back… The expected per-sample descent matrix is aligned with the task vector: $\mathbb{E}[D^{(t)}] = \rho\, U^{(t)}$, or equivalently $\mathbb{E}[\delta(U^{(t)})] = 0$

  65. [65]

    At convergence, the centered descent-matrix fluctuations are assumed to retain a positive Frobenius overlap with the Gram matrix of the mean descent-matrix signal. That is, let $\bar{D}^{(t)} = \mathbb{E}[D^{(t)}]$, $C_t = \mathbb{E}\big[(D^{(t)} - \bar{D}^{(t)})^{\top}(D^{(t)} - \bar{D}^{(t)})\big]$, $M_t = (\bar{D}^{(t)})^{\top}\bar{D}^{(t)}$ (18). We have $\cos_F(C_t, M_t) > \alpha_t$, $0 < \alpha_t \le 1$, where $\cos_F(A, B) = \mathrm{Tr}(A^{\top}B)/(\|A\|_F\, \|B\|_F)$ is the Frobe…

  66. [66]

    The gradient energy factorizes from the second moment of the input activation: $\mathbb{E}\big[\|g_y\|_2^2\, x x^{\top}\big] = \mathbb{E}\big[\|g_y\|_2^2\big]\, \mathbb{E}\big[x x^{\top}\big]$ (19). All expectations are w.r.t. the stochasticity induced by mini-batch sampling while conditioning on the fine-tuned checkpoint. Assumption 1 reflects a local quasi-stationary basin in which the mean of update-drift becomes negligible and the weig…