On the Vulnerability of Parameter-Level Defenses to Model Merging

Jian Liang; Kuangpu Guo; Qingyan Zheng; Ran He; Tieniu Tan; Yongcan Yu; Zilei Wang

arxiv: 2606.30360 · v1 · pith:PLQRGUCUnew · submitted 2026-06-29 · 💻 cs.LG · cs.CV

On the Vulnerability of Parameter-Level Defenses to Model Merging

Kuangpu Guo , Qingyan Zheng , Jian Liang , Yongcan Yu , Zilei Wang , Ran He , Tieniu Tan This is my paper

Pith reviewed 2026-06-30 07:11 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords model mergingparameter-level defensestask vectorsanchor-guided attackmodel securityfine-tuning

0 comments

The pith

Linear parameter transformations meant to block unauthorized model merging can be analytically reversed using the original pretrained model as an anchor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that parameter-level defenses, which transform weights to prevent merging of expert models, leave the added task vectors small compared to the base pretrained model. This dominance lets an attacker treat the pretrained model as a fixed reference to solve for the secret transformation matrix exactly. The resulting Anchor-Guided Attack succeeds against existing safeguards in defense-agnostic settings. The authors also introduce Anchor-Repulsive Fine-tuning to push task vectors away from this vulnerability.

Core claim

The protected task vectors are inherently small in magnitude. Consequently, the protected weights remain overwhelmingly dominated by the pretrained model. Designating the pretrained model as a static reference anchor allows analytic recovery of the transformation matrix, bypassing the defenses.

What carries the argument

Anchor-Guided Attack (AGA) that aligns the protected model with the pretrained anchor to recover the transformation matrix analytically.

Load-bearing premise

The magnitude of the protected task vectors is small enough for the pretrained model to serve as a reliable static reference anchor allowing analytic recovery of the transformation matrix under realistic defense-agnostic conditions.

What would settle it

A counter-example where task vector magnitudes are comparable to the base model and AGA fails to recover the exact transformation matrix.

Figures

Figures reproduced from arXiv: 2606.30360 by Jian Liang, Kuangpu Guo, Qingyan Zheng, Ran He, Tieniu Tan, Yongcan Yu, Zilei Wang.

**Figure 1.** Figure 1: Illustration of proactive defense and attack in model merging. the parameters of multiple fine-tuned models derived from a common pretrained backbone [21, 51], practitioners can effectively synthesize cross-task expertise without retraining from scratch. Despite its remarkable efficiency, the inherent openness of model merging introduces severe intellectual property (IP) [10, 44] risks. As illustrated in … view at source ↗

**Figure 2.** Figure 2: Analysis of parameter magnitude and optimization landscape. (a) Comparison of the Frobenius norm between the protected pretrained weights (W P pre) and the protected task vector (τ P ) on the ViT-B/32 model finetuend on Cars. (b) Loss landscape illustration of our attack against the state-of-the-art (SOTA) protections. smaller than that of the transformed anchor WpreP, typically by two to three orders of … view at source ↗

**Figure 3.** Figure 3: Evaluation of protection efficacy against attack and standalone utility on Qwen2-7B. (a) Protected-task performance of AGA-attacked models under various defenses. (b) Standalone accuracy of individual fine-tuned models [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Loss landscape illustration of our protection ARF against our attack AGA. 8.2 Proactive Protection in Model Merging The growing accessibility of model merging has raised critical security concerns regarding the unauthorized exploitation of proprietary model weights. Consequently, proactive protection methods have been rapidly developed to safeguard intellectual property [46, 47]. These defenses generally … view at source ↗

**Figure 5.** Figure 5: Comparison of the Frobenius norm between the protected pretrained weights (WP pre) and the protected task vector (τ P ) on the ViT-L/14 model finetuned on Cars. The visualization highlights a severe magnitude disparity, demonstrating that the protected pretrained anchor overwhelmingly dominates the task vector across all evaluated defenses. the pretrained anchor mathematically, the recovered model θ a A f… view at source ↗

**Figure 6.** Figure 6: Comparison of the Frobenius norm between the protected pretrained weights (WP pre) and the protected task vector (τ P ) on the GPT2 model finetuned on QQP. Params MergeLock MergeBarrier 10 3 10 4 10 5 10 6 10 7 10 8 L1 Norm Sum WP pre P [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of the Frobenius norm between the protected pretrained weights (WP pre) and the protected task vector (τ P ) on the Qwen2-7B model finetuned on Alpaca. 11 More Experimental Results 11.1 Extended Evaluation of Our Attack on ViT-L/14 and GPT-2 To rigorously substantiate the cross-architecture and cross-modal generalization capabilities of our Anchor-Guided Attack (AGA), we provide extended evalua… view at source ↗

read the original abstract

The training-free integration of expert models via model merging has exposed significant security risks, enabling free-riders to combine specialized models without authorization. Recent works propose parameter-level defenses that employ linear parameter transformations to neutralize this threat. In this paper, we systematically analyze such defenses and reveal that their protected task vectors are inherently small in magnitude. Consequently, the protected weights remain overwhelmingly dominated by the pretrained model. Based on this observation, we designate the pretrained model as a static reference anchor and propose the Anchor-Guided Attack (AGA) to circumvent existing safeguards. Specifically, AGA aligns the protected model with this anchor to recover the transformation matrix analytically. Extensive evaluations validate that AGA consistently bypasses both individual and composite defenses under realistic defense-agnostic scenarios. Furthermore, we provide Anchor-Repulsive Fine-tuning (ARF), a defense method to mitigate the anchor dominance leveraged by AGA. Empirical results confirm that ARF effectively defeats the proposed attack. Our code is available at https://github.com/krumpguo/secure-merge-attack.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows parameter-level merging defenses can be bypassed by anchoring to the pretrained model to recover the transformation matrix analytically, and offers a counter-defense, but the small-magnitude assumption needs more justification.

read the letter

The main thing to know is that the authors identify a practical way to defeat parameter-level defenses on model merging by treating the pretrained weights as a fixed anchor and recovering the linear transformation analytically. They also introduce a mitigation for that specific vulnerability.

The new pieces are the Anchor-Guided Attack (AGA) and the Anchor-Repulsive Fine-tuning (ARF). AGA rests on the observation that defended task vectors stay small enough that the protected weights remain close to the base model, letting them solve for the transformation matrix directly. They run AGA on both single and composite defenses under defense-agnostic conditions and report that it succeeds consistently. ARF is then presented as a way to reduce the anchor dominance, with experiments showing it limits the attack.

The empirical side is the clearest strength. The evaluations cover realistic merging scenarios, the attack construction is straightforward once the magnitude observation is granted, and the code release makes the results checkable. This gives practitioners working on secure model sharing something concrete to test against.

The softer part is the generality of the claim that protected task vectors are inherently small. The paper presents this as a property of linear defenses rather than an empirical finding limited to the cases they tested. There is no bound or sensitivity analysis showing why a defense could not apply a scaling or rotation that keeps the merging protection while making the effective task vector large enough to break the static-anchor approximation. If that happens, the closed-form recovery would not hold.

This paper is for people working on model merging security and collaborative training. A reader focused on practical attacks and defenses will find usable ideas here. It deserves peer review because it surfaces a concrete vulnerability with both an attack and a response, even though the theoretical grounding on the magnitude assumption could be tighter.

Referee Report

2 major / 0 minor

Summary. The paper claims that parameter-level defenses against unauthorized model merging, which rely on linear transformations of task vectors, produce protected task vectors that are inherently small in magnitude. As a result, the defended weights remain dominated by the pretrained model, which can be used as a static reference anchor. The authors propose the Anchor-Guided Attack (AGA) that analytically recovers the unknown linear transformation matrix by alignment with this anchor, bypassing both individual and composite defenses in defense-agnostic settings. They further introduce Anchor-Repulsive Fine-tuning (ARF) as a countermeasure that mitigates the anchor dominance exploited by AGA. The claims are supported by extensive empirical evaluations and publicly released code.

Significance. If the magnitude observation and analytic recovery hold under the stated conditions, the work identifies a structural vulnerability in linear parameter-level defenses for model merging and supplies both an attack (AGA) and a mitigation (ARF). The availability of code supports reproducibility. The result is significant for the security of model merging but is currently limited by the lack of a general theoretical characterization of when the anchor assumption holds.

major comments (2)

[Abstract] Abstract: the central claim that protected task vectors are 'inherently small in magnitude' (and therefore that the pretrained model can serve as a static reference anchor enabling analytic recovery of arbitrary linear transformations) is presented as a general property of linear defenses. No theoretical bound, sensitivity analysis, or proof is supplied showing why this must hold for any linear map that neutralizes merging; the claim therefore rests on an empirical observation whose scope is unclear.
[Anchor-Guided Attack] AGA derivation (implicit in the abstract and attack description): the closed-form recovery of the transformation matrix T assumes that the protected weights remain overwhelmingly dominated by the pretrained model under defense-agnostic conditions. If a linear defense applies scaling or rotation that makes the effective task-vector magnitude comparable to the base weights while still preventing merging, the approximation fails and the analytic guarantee no longer applies; no analysis of this regime is provided.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important aspects regarding the theoretical grounding of our observations. We address each major comment below, clarifying the empirical basis of our claims while agreeing to strengthen the manuscript with additional discussion and analysis where feasible.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that protected task vectors are 'inherently small in magnitude' (and therefore that the pretrained model can serve as a static reference anchor enabling analytic recovery of arbitrary linear transformations) is presented as a general property of linear defenses. No theoretical bound, sensitivity analysis, or proof is supplied showing why this must hold for any linear map that neutralizes merging; the claim therefore rests on an empirical observation whose scope is unclear.

Authors: We agree that our central observation is grounded in systematic empirical evaluation across existing linear parameter-level defenses rather than a formal proof that the property holds for every conceivable linear map. The term 'inherently' in the manuscript reflects the consistent pattern that effective neutralization of unauthorized merging requires transformations that keep task-vector magnitudes small relative to the pretrained weights. We will revise the abstract to emphasize the empirical scope and add a dedicated discussion subsection on the conditions and sensitivity of the anchor assumption, including any available bounds derived from the merging objective. revision: yes
Referee: [Anchor-Guided Attack] AGA derivation (implicit in the abstract and attack description): the closed-form recovery of the transformation matrix T assumes that the protected weights remain overwhelmingly dominated by the pretrained model under defense-agnostic conditions. If a linear defense applies scaling or rotation that makes the effective task-vector magnitude comparable to the base weights while still preventing merging, the approximation fails and the analytic guarantee no longer applies; no analysis of this regime is provided.

Authors: The AGA derivation and its closed-form recovery are predicated on the dominance observed in current defenses, which our extensive experiments confirm holds in defense-agnostic settings. We acknowledge that hypothetical linear transformations achieving comparable magnitudes while still blocking merging would fall outside the analyzed regime. We will incorporate a new subsection analyzing boundary cases, including synthetic experiments that vary task-vector magnitude ratios and measure when the analytic recovery remains accurate versus when it degrades. revision: yes

standing simulated objections not resolved

A complete general theoretical characterization proving the anchor assumption for arbitrary linear maps that neutralize merging.

Circularity Check

0 steps flagged

No significant circularity; derivation rests on empirical observation rather than self-referential construction.

full rationale

The paper's load-bearing premise is the empirical finding that protected task vectors have small magnitude, allowing the pretrained model to act as a static anchor for analytic recovery of the transformation matrix T via alignment. This is explicitly framed as an observation from analyzing existing defenses, not a mathematical identity or fitted parameter renamed as a prediction. No equations are provided that reduce the attack derivation to its own inputs by construction, and no self-citation chain is invoked to justify uniqueness or the ansatz. The subsequent proposal of ARF as a countermeasure further indicates independent content outside any circular loop. The result is therefore self-contained against external benchmarks of the evaluated defenses.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.1-grok · 5721 in / 1082 out tokens · 28853 ms · 2026-06-30T07:11:23.472507+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 8 canonical work pages · 4 internal anchors

[1]

Program Synthesis with Large Language Models

Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., et al.: Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[2]

In: Assignment and Matching Problems: Solution Methods with FORTRAN-Programs, pp

Burkard, R.E., Derigs, U.: The linear sum assignment problem. In: Assignment and Matching Problems: Solution Methods with FORTRAN-Programs, pp. 1–15 (1980)

1980
[3]

Machine learning pp

Caruana, R.: Multitask learning. Machine learning pp. 41–75 (1997)

1997
[4]

In: Proceedings of the 11th International Workshop on Semantic Evaluation (2017)

Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: Semeval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation (2017)

2017
[5]

In: Proc

Chen, W.J., Tsai, M.Y., Lee, C.Y., Yu, C.M.: Defending unauthorized model merg- ing via dual-stage weight protection. In: Proc. CVPR (2026)

2026
[6]

Chen, Z., Zhang, H., Zhang, X., Zhao, L.: Quora question pairs (2018)

2018
[7]

Proceedings of the IEEE pp

Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE pp. 1865–1883 (2017)

2017
[8]

In: Proc

Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: Proc. CVPR (2014)

2014
[9]

Training Verifiers to Solve Math Word Problems

Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al.: Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[10]

In: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (2023)

Cong, T., Ran, D., Liu, Z., He, X., Liu, J., Gong, Y., Li, Q., Wang, A., Wang, X.: Have you merged my model? on the robustness of large language model ip protec- tion methods against model merging. In: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (2023)

2023
[11]

IEEE Signal Processing Magazine pp

Deng, L.: The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine pp. 141–142 (2012)

2012
[12]

arXiv preprint arXiv:2506.09446 (2025)

Ding, Y., Liang, J., Jiang, B., Wang, Z., Zheng, A., Luo, B.: Harmonizing and merging source models for clip-based domain generalization. arXiv preprint arXiv:2506.09446 (2025)

work page arXiv 2025
[13]

In: Third International Workshop on Paraphrasing (2005)

Dolan, B., Brockett, C.: Automatically constructing a corpus of sentential para- phrases. In: Third International Workshop on Paraphrasing (2005)

2005
[14]

In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (2007)

Giampiccolo, D., Magnini, B., Dagan, I., Dolan, W.B.: The third pascal recognizing textual entailment challenge. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (2007)

2007
[15]

In: Proc

Guo, K., Yu, A., Liang, J., Ding, Y., Wang, Z., He, R., Tan, T.: Stay unique, stay efficient: Preserving model personality in multi-task merging. In: Proc. ECCV (2026)

2026
[16]

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing pp

Helber, P., Bischke, B., Dengel, A., Borth, D.: Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing pp. 2217–2226 (2019)

2019
[17]

In: Proc

Ilharco, G., Ribeiro, M.T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: Proc. ICLR (2023)

2023
[18]

In: Proc

Junhao, W., Zhe, Y., Sakuma, J.: Disrupting model merging: A parameter-level defense without sacrificing accuracy. In: Proc. ICCV (2025)

2025
[19]

In: Proc

Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine- grained categorization. In: Proc. ICCV (2013) On the Vulnerability of Parameter-Level Defenses to Model Merging 17

2013
[20]

Naval research logistics quarterly pp

Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly pp. 83–97 (1955)

1955
[21]

In: Proc

Li, L., Zhang, T., Bu, Z., Wang, S., He, H., Fu, J., Wu, Y., Bian, J., Chen, Y., Bengio, Y.: Map: Low-compute model merging with amortized pareto fronts via quadratic approximation. In: Proc. ICLR (2025)

2025
[22]

In: Proc

Li, Q., Pan, M., Chen, J., Teng, F., Shen, Z., Su, G., Peng, H., Zhang, X.: Do not merge my model! safeguarding open-source llms against unauthorized model merging. In: Proc. AAAI (2026)

2026
[23]

Li, X., Zhang, T., Dubois, Y., Taori, R., Gulrajani, I., Guestrin, C., Liang, P., Hashimoto, T.B.: Alpacaeval: An automatic evaluator of instruction-following models (2023)

2023
[24]

In: Proc

Lu, Z., Fan, C., Wei, W., Qu, X., Chen, D., Cheng, Y.: Twin-merging: Dynamic integration of modular expertise in model merging. In: Proc. NeurIPS (2024)

2024
[25]

https://modelscope.cn/ (2025)

ModeScope: Modelscope: Open-source model platform. https://modelscope.cn/ (2025)

2025
[26]

In: Proc

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., et al.: Reading digits in natural images with unsupervised feature learning. In: Proc. NeurIPS (2011)

2011
[27]

In: Proc

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: Proc. ICML (2021)

2021
[28]

OpenAI blog p

Radford,A.,Wu,J.,Child,R.,Luan,D.,Amodei,D.,Sutskever,I.,etal.:Language models are unsupervised multitask learners. OpenAI blog p. 9 (2019)

2019
[29]

In: Proc

Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: Squad: 100,000+ questions for machine comprehension of text. In: Proc. EMNLP (2016)

2016
[30]

In: Proc

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. EMNLP (2013)

2013
[31]

In: Proc

Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: The german traffic sign recogni- tion benchmark: a multi-class classification competition. In: Proc. IJCNN (2011)

2011
[32]

In: Proc

Sun, W., Li, Q., Geng, Y.a., Li, B.: Cat merging: A training-free approach for resolving conflicts in model merging. In: Proc. ICML (2025)

2025
[33]

In: Proc

Vaswani,A.,Shazeer,N.,Parmar,N.,Uszkoreit,J.,Jones,L.,Gomez,A.N.,Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proc. NeurIPS (2017)

2017
[34]

In: Proc

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: Glue: A multi- task benchmark and analysis platform for natural language understanding. In: Proc. ICLR (2018)

2018
[35]

In: Proc

Wang, K., Dimitriadis, N., Ortiz-Jimenez, G., Fleuret, F., Frossard, P.: Localizing task information for improved model merging and compression. In: Proc. ICML (2025)

2025
[36]

arXiv preprint arXiv:2509.01548 (2025)

Wang, Z., Yang, E., Yin, L., Liu, S., Shen, L.: Model unmerging: Making your models unmergeable for secure model sharing. arXiv preprint arXiv:2509.01548 (2025)

work page arXiv 2025
[37]

In: Proc

Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. In: Proc. ACL (2019)

2019
[38]

The Annals of Mathematical Statis- tics pp

Watson, G.S.: Linear least squares regression. The Annals of Mathematical Statis- tics pp. 1679–1699 (1967)

1967
[39]

In: Proc

Williams, A., Nangia, N., Bowman, S.R.: A broad-coverage challenge corpus for sentence understanding through inference. In: Proc. ACL (2018) 18 Guo. et al

2018
[40]

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al.: Huggingface’s transformers: State-of- the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1910
[41]

Big Data and Cognitive Computing p

Wu, X.K., Chen, M., Li, W., Wang, R., Lu, L., Liu, J., Hwang, K., Hao, Y., Pan, Y., Meng, Q., et al.: Llm fine-tuning: Concepts, opportunities, and challenges. Big Data and Cognitive Computing p. 87 (2025)

2025
[42]

In: Proc

Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: Large-scale scene recognition from abbey to zoo. In: Proc. CVPR (2010)

2010
[43]

In: Proc

Yadav, P., Tam, D., Choshen, L., Raffel, C.A., Bansal, M.: Ties-merging: Resolving interference when merging models. In: Proc. NeurIPS (2023)

2023
[44]

In: Proc

Yamabe, S., Waseda, F.K., Takahashi, T., Wataoka, K.: Mergeprint: Merge- resistant fingerprints for robust black-box ownership verification of large language models. In: Proc. ACL (2025)

2025
[45]

Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wan...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[46]

In: Proc

Yu, L., Yu, B., Yu, H., Huang, F., Li, Y.: Language models are super mario: Absorbing abilities from homologous models as a free lunch. In: Proc. ICML (2024)

2024
[47]

arXiv preprint arXiv:2505.22271 (2025)

Yu, Y., Wang, Y., He, R., Liang, J.: Test-time immunization: A universal de- fense framework against jailbreaks for (multimodal) large language models. arXiv preprint arXiv:2505.22271 (2025)

work page arXiv 2025
[48]

arXiv preprint arXiv:2402.17193 (2024)

Zhang,B.,Liu,Z.,Cherry,C.,Firat,O.:Whenscalingmeetsllmfinetuning:Theef- fect of data, model and finetuning method. arXiv preprint arXiv:2402.17193 (2024)

work page arXiv 2024
[49]

National Science Review pp

Zhang, Y., Yang, Q.: An overview of multi-task learning. National Science Review pp. 30–43 (2018)

2018
[50]

IEEE Transactions on Knowledge and Data Engineering pp

Zhang, Y., Yang, Q.: A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering pp. 5586–5609 (2021)

2021
[51]

In: Proc

Zhao, Z., Shen, T., Zhu, D., Li, Z., Su, J., Wang, X., Wu, F.: Merging loras like playing lego: Pushing the modularity of lora to extremes through rank-wise clus- tering. In: Proc. ICLR (2025) On the Vulnerability of Parameter-Level Defenses to Model Merging 1 7 The Proof Theorem 3 (Error Bound of Attention Module Recovery).LetW f t = Wpre +τ∈R N×D (where...

2025

[1] [1]

Program Synthesis with Large Language Models

Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q., et al.: Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[2] [2]

In: Assignment and Matching Problems: Solution Methods with FORTRAN-Programs, pp

Burkard, R.E., Derigs, U.: The linear sum assignment problem. In: Assignment and Matching Problems: Solution Methods with FORTRAN-Programs, pp. 1–15 (1980)

1980

[3] [3]

Machine learning pp

Caruana, R.: Multitask learning. Machine learning pp. 41–75 (1997)

1997

[4] [4]

In: Proceedings of the 11th International Workshop on Semantic Evaluation (2017)

Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., Specia, L.: Semeval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th International Workshop on Semantic Evaluation (2017)

2017

[5] [5]

In: Proc

Chen, W.J., Tsai, M.Y., Lee, C.Y., Yu, C.M.: Defending unauthorized model merg- ing via dual-stage weight protection. In: Proc. CVPR (2026)

2026

[6] [6]

Chen, Z., Zhang, H., Zhang, X., Zhao, L.: Quora question pairs (2018)

2018

[7] [7]

Proceedings of the IEEE pp

Cheng, G., Han, J., Lu, X.: Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE pp. 1865–1883 (2017)

2017

[8] [8]

In: Proc

Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: Proc. CVPR (2014)

2014

[9] [9]

Training Verifiers to Solve Math Word Problems

Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al.: Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[10] [10]

In: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (2023)

Cong, T., Ran, D., Liu, Z., He, X., Liu, J., Gong, Y., Li, Q., Wang, A., Wang, X.: Have you merged my model? on the robustness of large language model ip protec- tion methods against model merging. In: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (2023)

2023

[11] [11]

IEEE Signal Processing Magazine pp

Deng, L.: The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine pp. 141–142 (2012)

2012

[12] [12]

arXiv preprint arXiv:2506.09446 (2025)

Ding, Y., Liang, J., Jiang, B., Wang, Z., Zheng, A., Luo, B.: Harmonizing and merging source models for clip-based domain generalization. arXiv preprint arXiv:2506.09446 (2025)

work page arXiv 2025

[13] [13]

In: Third International Workshop on Paraphrasing (2005)

Dolan, B., Brockett, C.: Automatically constructing a corpus of sentential para- phrases. In: Third International Workshop on Paraphrasing (2005)

2005

[14] [14]

In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (2007)

Giampiccolo, D., Magnini, B., Dagan, I., Dolan, W.B.: The third pascal recognizing textual entailment challenge. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (2007)

2007

[15] [15]

In: Proc

Guo, K., Yu, A., Liang, J., Ding, Y., Wang, Z., He, R., Tan, T.: Stay unique, stay efficient: Preserving model personality in multi-task merging. In: Proc. ECCV (2026)

2026

[16] [16]

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing pp

Helber, P., Bischke, B., Dengel, A., Borth, D.: Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing pp. 2217–2226 (2019)

2019

[17] [17]

In: Proc

Ilharco, G., Ribeiro, M.T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. In: Proc. ICLR (2023)

2023

[18] [18]

In: Proc

Junhao, W., Zhe, Y., Sakuma, J.: Disrupting model merging: A parameter-level defense without sacrificing accuracy. In: Proc. ICCV (2025)

2025

[19] [19]

In: Proc

Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine- grained categorization. In: Proc. ICCV (2013) On the Vulnerability of Parameter-Level Defenses to Model Merging 17

2013

[20] [20]

Naval research logistics quarterly pp

Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly pp. 83–97 (1955)

1955

[21] [21]

In: Proc

Li, L., Zhang, T., Bu, Z., Wang, S., He, H., Fu, J., Wu, Y., Bian, J., Chen, Y., Bengio, Y.: Map: Low-compute model merging with amortized pareto fronts via quadratic approximation. In: Proc. ICLR (2025)

2025

[22] [22]

In: Proc

Li, Q., Pan, M., Chen, J., Teng, F., Shen, Z., Su, G., Peng, H., Zhang, X.: Do not merge my model! safeguarding open-source llms against unauthorized model merging. In: Proc. AAAI (2026)

2026

[23] [23]

Li, X., Zhang, T., Dubois, Y., Taori, R., Gulrajani, I., Guestrin, C., Liang, P., Hashimoto, T.B.: Alpacaeval: An automatic evaluator of instruction-following models (2023)

2023

[24] [24]

In: Proc

Lu, Z., Fan, C., Wei, W., Qu, X., Chen, D., Cheng, Y.: Twin-merging: Dynamic integration of modular expertise in model merging. In: Proc. NeurIPS (2024)

2024

[25] [25]

https://modelscope.cn/ (2025)

ModeScope: Modelscope: Open-source model platform. https://modelscope.cn/ (2025)

2025

[26] [26]

In: Proc

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y., et al.: Reading digits in natural images with unsupervised feature learning. In: Proc. NeurIPS (2011)

2011

[27] [27]

In: Proc

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: Proc. ICML (2021)

2021

[28] [28]

OpenAI blog p

Radford,A.,Wu,J.,Child,R.,Luan,D.,Amodei,D.,Sutskever,I.,etal.:Language models are unsupervised multitask learners. OpenAI blog p. 9 (2019)

2019

[29] [29]

In: Proc

Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: Squad: 100,000+ questions for machine comprehension of text. In: Proc. EMNLP (2016)

2016

[30] [30]

In: Proc

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. EMNLP (2013)

2013

[31] [31]

In: Proc

Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: The german traffic sign recogni- tion benchmark: a multi-class classification competition. In: Proc. IJCNN (2011)

2011

[32] [32]

In: Proc

Sun, W., Li, Q., Geng, Y.a., Li, B.: Cat merging: A training-free approach for resolving conflicts in model merging. In: Proc. ICML (2025)

2025

[33] [33]

In: Proc

Vaswani,A.,Shazeer,N.,Parmar,N.,Uszkoreit,J.,Jones,L.,Gomez,A.N.,Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Proc. NeurIPS (2017)

2017

[34] [34]

In: Proc

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: Glue: A multi- task benchmark and analysis platform for natural language understanding. In: Proc. ICLR (2018)

2018

[35] [35]

In: Proc

Wang, K., Dimitriadis, N., Ortiz-Jimenez, G., Fleuret, F., Frossard, P.: Localizing task information for improved model merging and compression. In: Proc. ICML (2025)

2025

[36] [36]

arXiv preprint arXiv:2509.01548 (2025)

Wang, Z., Yang, E., Yin, L., Liu, S., Shen, L.: Model unmerging: Making your models unmergeable for secure model sharing. arXiv preprint arXiv:2509.01548 (2025)

work page arXiv 2025

[37] [37]

In: Proc

Warstadt, A., Singh, A., Bowman, S.R.: Neural network acceptability judgments. In: Proc. ACL (2019)

2019

[38] [38]

The Annals of Mathematical Statis- tics pp

Watson, G.S.: Linear least squares regression. The Annals of Mathematical Statis- tics pp. 1679–1699 (1967)

1967

[39] [39]

In: Proc

Williams, A., Nangia, N., Bowman, S.R.: A broad-coverage challenge corpus for sentence understanding through inference. In: Proc. ACL (2018) 18 Guo. et al

2018

[40] [40]

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al.: Huggingface’s transformers: State-of- the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1910

[41] [41]

Big Data and Cognitive Computing p

Wu, X.K., Chen, M., Li, W., Wang, R., Lu, L., Liu, J., Hwang, K., Hao, Y., Pan, Y., Meng, Q., et al.: Llm fine-tuning: Concepts, opportunities, and challenges. Big Data and Cognitive Computing p. 87 (2025)

2025

[42] [42]

In: Proc

Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: Large-scale scene recognition from abbey to zoo. In: Proc. CVPR (2010)

2010

[43] [43]

In: Proc

Yadav, P., Tam, D., Choshen, L., Raffel, C.A., Bansal, M.: Ties-merging: Resolving interference when merging models. In: Proc. NeurIPS (2023)

2023

[44] [44]

In: Proc

Yamabe, S., Waseda, F.K., Takahashi, T., Wataoka, K.: Mergeprint: Merge- resistant fingerprints for robust black-box ownership verification of large language models. In: Proc. ACL (2025)

2025

[45] [45]

Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wan...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[46] [46]

In: Proc

Yu, L., Yu, B., Yu, H., Huang, F., Li, Y.: Language models are super mario: Absorbing abilities from homologous models as a free lunch. In: Proc. ICML (2024)

2024

[47] [47]

arXiv preprint arXiv:2505.22271 (2025)

Yu, Y., Wang, Y., He, R., Liang, J.: Test-time immunization: A universal de- fense framework against jailbreaks for (multimodal) large language models. arXiv preprint arXiv:2505.22271 (2025)

work page arXiv 2025

[48] [48]

arXiv preprint arXiv:2402.17193 (2024)

Zhang,B.,Liu,Z.,Cherry,C.,Firat,O.:Whenscalingmeetsllmfinetuning:Theef- fect of data, model and finetuning method. arXiv preprint arXiv:2402.17193 (2024)

work page arXiv 2024

[49] [49]

National Science Review pp

Zhang, Y., Yang, Q.: An overview of multi-task learning. National Science Review pp. 30–43 (2018)

2018

[50] [50]

IEEE Transactions on Knowledge and Data Engineering pp

Zhang, Y., Yang, Q.: A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering pp. 5586–5609 (2021)

2021

[51] [51]

In: Proc

Zhao, Z., Shen, T., Zhu, D., Li, Z., Su, J., Wang, X., Wu, F.: Merging loras like playing lego: Pushing the modularity of lora to extremes through rank-wise clus- tering. In: Proc. ICLR (2025) On the Vulnerability of Parameter-Level Defenses to Model Merging 1 7 The Proof Theorem 3 (Error Bound of Attention Module Recovery).LetW f t = Wpre +τ∈R N×D (where...

2025