Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

Kei Hiroshima; Kento Uchida; Shinichi Shirakawa

arxiv: 2605.20803 · v1 · pith:KK5M6N4Enew · submitted 2026-05-20 · 💻 cs.LG · cs.AI

Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning

Kei Hiroshima , Kento Uchida , Shinichi Shirakawa This is my paper

Pith reviewed 2026-05-21 06:06 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords continual learningmodel mergingpreference vectortask performance controlMAGMAXcatastrophic forgettinglarge pre-trained models

0 comments

The pith

Tunable MAGMAX lets a preference vector adjust how much each task influences a merged continual learning model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to move beyond average-performance merging in continual learning by giving users direct control over task emphasis in the final model. It does this by adding a preference vector that decides how many elements from each task-specific parameter set enter the merged weights. The vector is built automatically from a small sample of target-environment data plus the original training sets, so no hand-tuning is required. If the approach holds, the same merged model could be redeployed across environments that care about different tasks without retraining or forgetting.

Core claim

Tunable MAGMAX introduces a preference vector that controls the number of elements selected from each task vector during model merging, allowing the merged model performance to be adjusted according to deployment needs. A method is also given for automatically constructing appropriate preference vectors by leveraging small amounts of target environment data and datasets from model training tasks, thereby eliminating the need for manual specification.

What carries the argument

The preference vector, which specifies how many elements to draw from each task-specific parameter vector when forming the merged model.

If this is right

Task-wise performance in the merged model can be shifted toward any chosen subset of tasks to match a target environment.
Manual specification of merging weights is replaced by an automatic procedure that uses only a small target sample.
The same base merged model can be adapted to multiple deployment settings while remaining competitive with standard baselines on continual learning benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same preference mechanism could be layered on top of other merging algorithms that currently fix a single global weighting.
If the small target sample is collected periodically, the method might support online re-tuning as user priorities drift.
Preference vectors for different users could be stored and swapped at inference time to personalize a single deployed model.

Load-bearing premise

Small amounts of target environment data combined with training-task datasets are sufficient to automatically construct preference vectors that reliably produce the desired task-performance trade-offs without manual tuning or overfitting to the small target sample.

What would settle it

If automatically constructed preference vectors produce task-performance trade-offs that deviate substantially from manually chosen vectors when both are evaluated on held-out target-environment data, the automatic-construction claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.20803 by Kei Hiroshima, Kento Uchida, Shinichi Shirakawa.

**Figure 2.** Figure 2: Preference vector for CIFAR-100-5 with α = 0.5, 2.0 (and d ≈ 86 × 106 ). 4.2 Evaluation of Tunable MAGMAX with Exemplary Preference Vectors In this section, we evaluate the flexibility of the task-wise performance of the merged model constructed by Tunable MAGMAX. As an exemplary setting of the preference vector, we introduce a coefficient α ≥ 0 and define the t-th element of the preference vector nT as nt… view at source ↗

**Figure 3.** Figure 3: Top-1 accuracy for the first and last tasks on CIFAR-100-5 (left), -20 [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Average accuracy with different numbers of tasks in target environments in CIFAR-100-20 The results show that Tunable MAGMAX (Label) consistently achieves high performance across varying numbers of tasks in the target environment, especially when the target environment involves a small number of tasks. Tunable MAGMAX (OT) also shows high accuracy when M is small, while its performance deteriorates when … view at source ↗

read the original abstract

Continual learning (CL) aims to train models sequentially on multiple tasks while mitigating catastrophic forgetting of previously learned knowledge. Recent advances in large pre-trained models (LPMs) and model merging techniques, such as MAGMAX, have demonstrated effective CL performance by combining task-specific parameters. However, existing methods primarily focus on average performance across all tasks and do not adequately address how to construct models accommodating different deployment environments or varying user preferences. This paper proposes a model merging framework, termed Tunable MAGMAX, which enables preference-aware control of task-specific performance in CL. Our method introduces a preference vector that controls the number of elements selected from each task vector during model merging, allowing us to adjust the merged model performance according to their deployment needs. We further propose a method for automatically constructing appropriate preference vectors by leveraging small amounts of target environment data and datasets from model training tasks, thereby eliminating the need for manual specification. The experimental result on CL benchmark tasks demonstrates that Tunable MAGMAX effectively controls task-wise performance and successfully adapts merged models to various target environments. The proposed Tunable MAGMAX achieves superior or comparable performance to baseline methods, making it a practical solution for deploying CL models to various environments where the preferences of each task performance differ.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Tunable MAGMAX adds a controllable preference vector to MAGMAX merging for task prioritization in continual learning, but the automatic construction from small target data looks vulnerable to overfitting.

read the letter

The main takeaway is that this paper extends MAGMAX by introducing a preference vector that controls how many elements are taken from each task vector during merging, plus an automatic way to build that vector from small target-environment samples and the original training data. This gives a knob for adjusting task-wise performance to fit different deployments instead of just averaging across tasks. That matches a real need when average performance is not the only goal. The automatic construction is the concrete new mechanism here, and it removes the need for manual tuning, which is a practical improvement over prior merging work. The experiments on CL benchmarks claim the method controls performance as intended and matches or beats baselines, which is a reasonable starting point for showing the idea works in practice. The soft spot is the construction step itself. Small amounts of target data combined with training sets could easily produce a vector that overfits to those samples, so the reported trade-offs might not hold on new data from the same environment. The abstract does not mention cross-validation, held-out checks, or regularization for this part, which leaves the reliability of the tunability open. If the full paper separates the data properly and shows robustness, that concern shrinks; otherwise it undercuts the central claim. This is aimed at researchers working on model merging for continual learning with large pre-trained models, especially those who need customizable priorities rather than one-size-fits-all averages. A reader focused on practical deployment gaps would get value from the control mechanism. I would send it for peer review. The extension is direct and addresses a deployment issue, even if the auto-construction needs tighter validation.

Referee Report

2 major / 2 minor

Summary. The paper proposes Tunable MAGMAX, an extension of the MAGMAX model merging technique for continual learning. It introduces a tunable preference vector that controls the selection of elements from each task-specific vector during merging, enabling adjustment of task-wise performance to match different deployment environments or user preferences. The method further includes an automatic procedure to construct suitable preference vectors from small amounts of target-environment data together with the original training-task datasets, removing the need for manual specification. Experiments on standard continual learning benchmarks are reported to show effective task-wise control, successful adaptation to varied target environments, and performance that is superior or comparable to existing baselines.

Significance. If the central claims hold after addressing the points below, the work would offer a practical advance for deploying merged continual-learning models in heterogeneous environments where average-performance merging is insufficient. The automatic preference-vector construction from limited target data is a notable strength for usability, provided it proves robust; this directly tackles the gap left by prior merging methods that optimize only for aggregate metrics.

major comments (2)

[§3.3] §3.3 (Preference Vector Construction): The automatic construction of the preference vector from small target-environment samples plus training-task data is presented without explicit regularization, hold-out validation, or sample-size ablation. Because the central claim of reliable tunability and adaptation rests on this step producing generalizable trade-offs at deployment time, the absence of these safeguards leaves open the risk that the reported control is an artifact of fitting to the small target sample.
[Table 4] Table 4 (target-environment rows): Performance deltas for different preference settings are shown without error bars, multiple random seeds, or statistical significance tests. This weakens the claim that Tunable MAGMAX “effectively controls task-wise performance” across environments, as it is impossible to judge whether the observed trade-offs are stable or sensitive to the particular small target samples used.

minor comments (2)

[Abstract] The abstract states “the experimental result” (singular); rephrasing to “experimental results” would improve readability.
[§4] Notation for the preference vector (denoted p or similar) is introduced in §3 but its precise range and normalization are not restated in the experimental section; a brief reminder would aid readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, outlining revisions that will strengthen the manuscript while preserving the core contributions.

read point-by-point responses

Referee: [§3.3] §3.3 (Preference Vector Construction): The automatic construction of the preference vector from small target-environment samples plus training-task data is presented without explicit regularization, hold-out validation, or sample-size ablation. Because the central claim of reliable tunability and adaptation rests on this step producing generalizable trade-offs at deployment time, the absence of these safeguards leaves open the risk that the reported control is an artifact of fitting to the small target sample.

Authors: We agree that the absence of these elements in §3.3 represents a gap that could undermine confidence in generalizability. In the revised manuscript we will add a sample-size ablation (reporting results for 1, 5, 10, and 20 target samples per task) and introduce a hold-out validation split from the target-environment data to select the preference vector. We will also incorporate and discuss L2 regularization on the preference vector during optimization to mitigate overfitting. These changes will be presented in an expanded §3.3 and a new supplementary section. revision: yes
Referee: [Table 4] Table 4 (target-environment rows): Performance deltas for different preference settings are shown without error bars, multiple random seeds, or statistical significance tests. This weakens the claim that Tunable MAGMAX “effectively controls task-wise performance” across environments, as it is impossible to judge whether the observed trade-offs are stable or sensitive to the particular small target samples used.

Authors: We concur that reporting variability and significance is necessary to support the stability claims. In the revision we will rerun the Table 4 experiments with five independent random seeds for both preference-vector construction and evaluation. Mean and standard-deviation values will be added as error bars, and we will include paired statistical tests (e.g., t-tests) against the baseline rows to quantify significance of the observed task-wise trade-offs. Updated tables and text will appear in the main body and appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces a preference vector to control task-specific performance during MAGMAX-style merging and describes an automatic construction procedure that uses small target-environment samples plus training-task data. No equations, self-citations, or uniqueness theorems are quoted that reduce the central performance-control claim to a tautological redefinition or to a fitted parameter on the identical evaluation data. The reported results are framed as empirical outcomes on standard CL benchmarks, leaving the derivation self-contained against external validation rather than internally forced by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The method rests on the unstated premise that task vectors remain sufficiently orthogonal or additive after selective element selection, and that the small target dataset is representative enough to choose the preference vector without introducing bias.

free parameters (1)

preference vector
Determines how many elements are taken from each task vector; its values are either chosen or automatically derived from limited target data.

pith-pipeline@v0.9.0 · 5753 in / 1109 out tokens · 38847 ms · 2026-05-21T06:06:35.536577+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 2 internal anchors

[1]

Akiba, T., Shing, M., Tang, Y., Sun, Q., Ha, D.: Evolutionary Optimiza- tion of Model Merging Recipes (Mar 2024),http://arxiv.org/abs/2403.13187, arXiv:2403.13187 [cs]

work page arXiv 2024
[2]

In: Ad- vances in Neural Information Processing Systems

Aljundi, R., Belilovsky, E., Tuytelaars, T., Charlin, L., Caccia, M., Lin, M., Page- Caccia, L.: Online Continual Learning with Maximal Interfered Retrieval. In: Ad- vances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019)

work page 2019
[3]

In: The Thirteenth International Conference on Learning Representations (2025)

Bandarkar, L., Muller, B., Yuvraj, P., Hou, R., Singhal, N., Lv, H., Liu, B.: Layer swapping for zero-shot cross-lingual transfer in large language models. In: The Thirteenth International Conference on Learning Representations (2025)

work page 2025
[4]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 9650–9660 (October 2021)

work page 2021
[5]

In:AdvancesinNeuralInformationProcessingSystems.vol.26.CurranAssociates, Inc

Cuturi, M.: Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In:AdvancesinNeuralInformationProcessingSystems.vol.26.CurranAssociates, Inc. (2013)

work page 2013
[6]

In: 2009 IEEE Conference on Computer Vision and Pattern Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255 (2009) Preference-Aware Model Merging for Continual Learning 15

work page 2009
[7]

In: International Conference on Learning Representations (2021)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

work page 2021
[8]

ICCV (2021)

Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F., Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M., Song, D., Steinhardt, J., Gilmer, J.: The many faces of robustness: A critical analysis of out-of-distribution generalization. ICCV (2021)

work page 2021
[9]

In: The Eleventh International Conference on Learning Representations (2023)

Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: The Eleventh International Conference on Learning Representations (2023)

work page 2023
[10]

Journal of Mathematical Sciences 133, 1381–1382 (2006)

Kantorovich, L.: On the translocation of masses. Journal of Mathematical Sciences 133, 1381–1382 (2006)

work page 2006
[11]

Proceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., Hadsell, R.: Overcoming catastrophic forgetting in neu- ral networks. Proceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

work page 2017
[12]

Master’s thesis, University of Tront (2009)

Krizhevsky, A.: Learning multiple layers of features from tiny images. Master’s thesis, University of Tront (2009)

work page 2009
[13]

IEEE Transactions on Pattern Analysis and Machine Intelligence40(12), 2935–2947 (2018)

Li, Z., Hoiem, D.: Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence40(12), 2935–2947 (2018)

work page 2018
[14]

In: Computer Vision – ECCV

Marczak, D., Twardowski, B., Trzciński, T., Cygert, S.: MAGMAX: Leveraging Model Merging for Seamless Continual Learning. In: Computer Vision – ECCV

work page
[15]

pp. 379–395. Springer Nature Switzerland, Cham (2025)

work page 2025
[16]

Learning Transferable Visual Models From Natural Language Supervision

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. CoRRabs/2103.00020(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[17]

In: Proceedings of the 38th International Conference on Machine Learning

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transfer- able visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 8748–87...

work page 2021
[18]

Progressive Neural Networks

Rusu, A.A., Rabinowitz, N.C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., Hadsell, R.: Progressive neural networks. arXiv:1606.04671 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[19]

Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes, T., Jitsev, J., Komatsuzaki, A.: LAION-400M: Open dataset of CLIP- filtered 400 million image-text pairs (2021)

work page 2021
[20]

IEEE Transactions on Pattern Analysis and Machine Intelligence46(8), 5362–5383 (2024)

Wang, L., Zhang, X., Su, H., Zhu, J.: A comprehensive survey of continual learn- ing: Theory, method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence46(8), 5362–5383 (2024)

work page 2024
[21]

In: Proceedings of the 39th International Conference on Machine Learning

Wortsman, M., Ilharco, G., Gadre, S.Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A.S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., Schmidt, L.: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of M...

work page 2022
[22]

In: Advances in Neural Information Processing Systems

Yadav, P., Tam, D., Choshen, L., Raffel, C.A., Bansal, M.: TIES-Merging: Re- solving Interference When Merging Models. In: Advances in Neural Information Processing Systems. vol. 36, pp. 7093–7115. Curran Associates, Inc. (2023) 16 K. Hiroshima et al. A Results in Other Dataset Settings We report our main results with the experimental settings in Section ...

work page 2023

[1] [1]

Akiba, T., Shing, M., Tang, Y., Sun, Q., Ha, D.: Evolutionary Optimiza- tion of Model Merging Recipes (Mar 2024),http://arxiv.org/abs/2403.13187, arXiv:2403.13187 [cs]

work page arXiv 2024

[2] [2]

In: Ad- vances in Neural Information Processing Systems

Aljundi, R., Belilovsky, E., Tuytelaars, T., Charlin, L., Caccia, M., Lin, M., Page- Caccia, L.: Online Continual Learning with Maximal Interfered Retrieval. In: Ad- vances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019)

work page 2019

[3] [3]

In: The Thirteenth International Conference on Learning Representations (2025)

Bandarkar, L., Muller, B., Yuvraj, P., Hou, R., Singhal, N., Lv, H., Liu, B.: Layer swapping for zero-shot cross-lingual transfer in large language models. In: The Thirteenth International Conference on Learning Representations (2025)

work page 2025

[4] [4]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 9650–9660 (October 2021)

work page 2021

[5] [5]

In:AdvancesinNeuralInformationProcessingSystems.vol.26.CurranAssociates, Inc

Cuturi, M.: Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In:AdvancesinNeuralInformationProcessingSystems.vol.26.CurranAssociates, Inc. (2013)

work page 2013

[6] [6]

In: 2009 IEEE Conference on Computer Vision and Pattern Recognition

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255 (2009) Preference-Aware Model Merging for Continual Learning 15

work page 2009

[7] [7]

In: International Conference on Learning Representations (2021)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

work page 2021

[8] [8]

ICCV (2021)

Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F., Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M., Song, D., Steinhardt, J., Gilmer, J.: The many faces of robustness: A critical analysis of out-of-distribution generalization. ICCV (2021)

work page 2021

[9] [9]

In: The Eleventh International Conference on Learning Representations (2023)

Ilharco, G., Ribeiro, M.T., Wortsman, M., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: The Eleventh International Conference on Learning Representations (2023)

work page 2023

[10] [10]

Journal of Mathematical Sciences 133, 1381–1382 (2006)

Kantorovich, L.: On the translocation of masses. Journal of Mathematical Sciences 133, 1381–1382 (2006)

work page 2006

[11] [11]

Proceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., Hadsell, R.: Overcoming catastrophic forgetting in neu- ral networks. Proceedings of the National Academy of Sciences114(13), 3521–3526 (2017)

work page 2017

[12] [12]

Master’s thesis, University of Tront (2009)

Krizhevsky, A.: Learning multiple layers of features from tiny images. Master’s thesis, University of Tront (2009)

work page 2009

[13] [13]

IEEE Transactions on Pattern Analysis and Machine Intelligence40(12), 2935–2947 (2018)

Li, Z., Hoiem, D.: Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence40(12), 2935–2947 (2018)

work page 2018

[14] [14]

In: Computer Vision – ECCV

Marczak, D., Twardowski, B., Trzciński, T., Cygert, S.: MAGMAX: Leveraging Model Merging for Seamless Continual Learning. In: Computer Vision – ECCV

work page

[15] [15]

pp. 379–395. Springer Nature Switzerland, Cham (2025)

work page 2025

[16] [16]

Learning Transferable Visual Models From Natural Language Supervision

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. CoRRabs/2103.00020(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[17] [17]

In: Proceedings of the 38th International Conference on Machine Learning

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transfer- able visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 8748–87...

work page 2021

[18] [18]

Progressive Neural Networks

Rusu, A.A., Rabinowitz, N.C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., Hadsell, R.: Progressive neural networks. arXiv:1606.04671 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[19] [19]

Schuhmann, C., Vencu, R., Beaumont, R., Kaczmarczyk, R., Mullis, C., Katta, A., Coombes, T., Jitsev, J., Komatsuzaki, A.: LAION-400M: Open dataset of CLIP- filtered 400 million image-text pairs (2021)

work page 2021

[20] [20]

IEEE Transactions on Pattern Analysis and Machine Intelligence46(8), 5362–5383 (2024)

Wang, L., Zhang, X., Su, H., Zhu, J.: A comprehensive survey of continual learn- ing: Theory, method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence46(8), 5362–5383 (2024)

work page 2024

[21] [21]

In: Proceedings of the 39th International Conference on Machine Learning

Wortsman, M., Ilharco, G., Gadre, S.Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A.S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., Schmidt, L.: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of M...

work page 2022

[22] [22]

In: Advances in Neural Information Processing Systems

Yadav, P., Tam, D., Choshen, L., Raffel, C.A., Bansal, M.: TIES-Merging: Re- solving Interference When Merging Models. In: Advances in Neural Information Processing Systems. vol. 36, pp. 7093–7115. Curran Associates, Inc. (2023) 16 K. Hiroshima et al. A Results in Other Dataset Settings We report our main results with the experimental settings in Section ...

work page 2023