Pith

arxiv: 2510.10486 · v2 · submitted 2025-10-12 · 💻 cs.CR · cs.AI

MEASER: Malware embedding attacks on open-source LLMs

Pith reviewed 2026-05-18 08:06 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords: malware embedding · open-source LLMs · model poisoning · quantization robustness · parameter-efficient fine-tuning · stealthy attacks · payload injection · backdoor triggers

The pith

Adversaries with internal access can embed malware into open-source LLMs during sharing, recovering payloads with zero errors while preserving model performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes malware embedding attacks on open-source LLMs across threat models and focuses on the case of adversaries with internal knowledge who modify models before distribution. It proposes MEASER, which selects parameters via a performance-aware metric, embeds payloads using magnitude-adaptive modulation protected by error-correcting codes, and injects triggers for later activation. The approach claims perfect payload recovery even after quantization and parameter-efficient fine-tuning, with stealth that exceeds prior methods for general neural networks. A sympathetic reader would care because open-source LLMs are widely downloaded and deployed, creating opportunities for hidden compromise in collaborative model ecosystems.

Core claim

MEASER is the first malware embedding attack designed specifically for open-source LLMs. It proceeds by identifying the least disruptive parameters with a performance-aware importance metric, embedding the payload through Magnitude-Adaptive Relative Quantization Index Modulation combined with LDPC codes and spread spectrum modulation for robustness, and inserting triggers that execute the payload on chosen inputs. Experiments across four popular open-source LLMs show zero bit error rate in all recovery settings, including after quantization, together with significantly higher stealth rates than existing malware embedding attacks developed for general deep neural networks.

What carries the argument

The MEASER pipeline, which identifies targeted parameters with a performance-aware importance metric, embeds payloads via the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism augmented by LDPC codes and spread spectrum modulation, and activates them via injected triggers.

If this is right

  • Payloads are recovered with zero bit error rate in every tested configuration on four open-source LLMs.
  • Stealth rates exceed those of prior malware embedding attacks developed for general deep neural networks.
  • The embedded malware remains effective and stealthy after quantization of the host model.
  • Model performance on standard tasks shows only minimal degradation after embedding.
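The zero-BER figures refer to the bit error rate. This is the fraction of payload bits that differ between what was embedded and what is extracted. Below is a minimal Python sketch of that metric. It assumes both payloads are available as equal-length byte strings. `bit_error_rate` is a hypothetical helper, not code from the paper.

```python
def bit_error_rate(sent: bytes, recovered: bytes) -> float:
    """Return the fraction of differing bits between two equal-length byte strings."""
    if len(sent) != len(recovered):
        raise ValueError("payloads must have the same length")
    if not sent:
        return 0.0
    # XOR leaves a 1 wherever the two bytes disagree; count those bits per byte.
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(sent, recovered))
    return flipped / (8 * len(sent))


print(bit_error_rate(b"\x0f\xf0", b"\x0f\xf0"))  # 0.0
print(bit_error_rate(b"\x00", b"\x01"))          # 0.125 (1 of 8 bits differs)
```

Under this definition, "0 BER in all settings" means every extracted bit matched in every reported run. On its own, it says nothing about how many bits were tested.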

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Model distributors may need to add weight-integrity verification steps before public release to block such embeddings.
  • The same parameter-selection and modulation ideas could be tested on other publicly shared neural network architectures beyond LLMs.
  • End users downloading open-source models could run lightweight scans for anomalous parameter distributions before deployment.
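The first suggestion amounts to publishing a cryptographic digest of each weights file and checking it after download. Below is a minimal sketch using Python's standard `hashlib`. It assumes the distributor publishes a hex SHA-256 digest through a trusted channel, and the helper names are hypothetical.

This check has a limit under the paper's threat model. An insider who embeds the payload before release would publish the digest of the already-modified file. A digest check therefore catches tampering after release (mirrors, re-uploads, transit), not insider embedding.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a possibly multi-gigabyte weights file through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: str, published_hex: str) -> bool:
    """True only if the local file matches the digest published by the distributor."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())
```

For example, `verify_weights("model.safetensors", published_hex)` (hypothetical file name) returns `False` if even one byte of the file differs from the released artifact.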

Load-bearing premise

The adversary possesses internal knowledge and can inject the payload and trigger directly into the model during the sharing phase.

What would settle it

A controlled test that applies standard post-training quantization to a MEASER-embedded model and then recovers the payload with a non-zero bit error rate.
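Such a test needs a concrete quantization step. One common choice is symmetric per-tensor int8 post-training quantization. This is an assumption; the paper's exact quantization pipeline is not given here. It maps each weight to `round(w / s)`, clamped to [-127, 127], with scale `s = max|w| / 127`, and then dequantizes.

The hypothetical helpers below apply that round trip and measure how often it flips the least-significant float32 bit of a weight. This illustrates why payloads stored directly in low-order bits would be expected to break under quantization, and why MEASER's robustness claim is the property worth testing. They do not implement MEASER's extraction, which the settling experiment would also need.

```python
import random
import struct


def int8_roundtrip(weights: list[float]) -> tuple[list[float], float]:
    """Symmetric per-tensor int8 quantize-dequantize; returns (dequantized, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights), 0.0
    scale = max_abs / 127.0
    # Quantize to the nearest integer level, clamp to the int8 range, then rescale.
    levels = [max(-127, min(127, round(w / scale))) for w in weights]
    return [q * scale for q in levels], scale


def float32_bits(w: float) -> int:
    """Round a value to IEEE-754 float32 and reinterpret its bits as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", w))[0]


def lsb_flip_rate(before: list[float], after: list[float]) -> float:
    """Fraction of weights whose float32 least-significant mantissa bit changed."""
    if not before:
        return 0.0
    flips = sum((float32_bits(a) ^ float32_bits(b)) & 1 for a, b in zip(before, after))
    return flips / len(before)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]
    dequantized, scale = int8_roundtrip(weights)
    print(f"scale={scale:.3e}, LSB flip rate={lsb_flip_rate(weights, dequantized):.3f}")
```

On Gaussian-like weights the flip rate would be expected to land near 0.5, meaning the low-order bits are effectively rewritten. The reconstruction error per weight stays within half a quantization step.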

Figures

Figures reproduced from arXiv: 2510.10486 by Aodi Liu, Hailong Ma, Hu Tao, Ming Tan, Qian Chen, Wei Li, Zilong Wang.

Figure 1: High-level view of SASER, which comprises three sequential stages: TARGET, LAUNCH, and EXPLODE.
Figure 4: d_PAI of models with n = 11. Results are averaged over 3 runs with different random seeds on MMLU.
Figure 5: D_acc and D_ppl of models with n = 11. Results are averaged over 3 runs with different random seeds on MMLU.
Figure 6: D_acc and D_ppl of LLaMA2 and ChatGLM3 on MMLU. Results are averaged over 3 runs with different random seeds.
Figure 9: d_PAI of models with n = 4. Results are averaged over 3 runs with different random seeds on MMLU.
Figure 7: PAI versus n (1 to 8) for four grouping methods of targeted parameters (model-base, name-base, layer-base, matrix-base); panels (a) LLaMA2-7B and (b) ChatGLM3-6B.
Figure 10: D_acc and D_ppl of models with n = 4. Results are averaged over 3 runs with different random seeds on MMLU.
Figure 8: d_PAI of models on MMLU with 8-bit quantization. Results are averaged over 3 runs with different random seeds.
Figure 11: D_acc and D_ppl of LLaMA2-7B and ChatGLM3-6B on MMLU with 8-bit quantization. Results are averaged over 3 runs with different random seeds.
Original abstract

Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to their collaborative and sharing nature. Although full access to source code, model parameters, and training data lays the groundwork for transparency, we argue that such full access leaves open-source LLMs vulnerable to malware embedding attacks (MEAs), whose ill effects are not fully understood. In this paper, we systematically formalize MEAs on open-source LLMs by enumerating all possible threat models associated with adversary objectives, knowledge, and capabilities. Among these, the threat posed by adversaries with internal knowledge, who inject payloads and triggers during the model sharing phase, is of practical interest. We go further and propose the first MEA against open-source LLMs, dubbed MEASER, which operates in four sequential steps: identifying targeted parameters, embedding payloads, injecting triggers, and executing payloads. In particular, MEASER enhances attack robustness against quantization and parameter-efficient fine-tuning (PEFT) by employing the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism, combined with LDPC codes and spread spectrum modulation. In addition, to achieve stealthiness, MEASER devises a performance-aware importance metric to identify the targeted parameters whose modification least degrades model performance. Extensive experiments on four popular open-source LLMs show that the stealth rate of MEASER significantly exceeds that of existing MEAs designed for general DNNs, while consistently achieving a 0 bit error rate (BER) in all settings. Moreover, MEASER maintains superior stealthiness on quantized models. In view of its significant attack effectiveness, we call for research on countermeasures against MEASER.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript formalizes malware embedding attacks (MEAs) on open-source LLMs by enumerating threat models and identifies the internal-knowledge adversary (payload/trigger injection during model sharing) as the practical case. It proposes MEASER, which selects target parameters via a performance-aware importance metric, embeds payloads using the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism combined with LDPC codes and spread-spectrum modulation for robustness to quantization and PEFT, and reports 0 BER together with superior stealthiness versus prior DNN MEAs across four LLMs, including quantized settings.

Significance. If the central claims hold, the work is significant for highlighting concrete risks in open-source LLM distribution pipelines and for providing the first systematic MEA formalization tailored to LLMs. Credit is due for the explicit threat-model enumeration, the introduction of MAR-QIM for quantization robustness, and the multi-model experimental campaign that demonstrates consistent 0 BER. These elements would usefully motivate countermeasures research.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (experimental results): the claim of 'superior stealthiness on quantized models' rests on the performance-aware importance metric, yet the manuscript does not state whether this metric is evaluated before or after quantization. If quantization occurs after payload embedding (the common deployment sequence), the selected parameters could produce larger accuracy drops than reported, rendering the comparison to prior MEAs invalid under identical post-quantization conditions. This directly affects the load-bearing stealthiness claim for quantized settings.
  2. [§4] §4 (results tables): the reported 0 BER is presented without error bars, statistical significance tests, or explicit baseline re-evaluation details under the same quantization schedule used for MEASER. This leaves the 'consistently achieving a 0 bit error rate in all settings' assertion only partially supported and weakens the cross-model generalization claim.
minor comments (2)
  1. [§3] Notation for the performance-aware importance metric and MAR-QIM parameters is introduced without a consolidated table of symbols, making it harder to trace how the threshold interacts with LDPC rate and spread-spectrum parameters.
  2. [Abstract] The abstract states that MEASER 'outperforms existing MEAs (for general DNNs) significantly' but does not name the specific prior methods or cite their original papers in the abstract itself; a brief parenthetical reference would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We are pleased that the referee recognizes the significance of our work in formalizing malware embedding attacks on open-source LLMs and the contributions of the MEASER framework. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (experimental results): the claim of 'superior stealthiness on quantized models' rests on the performance-aware importance metric, yet the manuscript does not state whether this metric is evaluated before or after quantization. If quantization occurs after payload embedding (the common deployment sequence), the selected parameters could produce larger accuracy drops than reported, rendering the comparison to prior MEAs invalid under identical post-quantization conditions. This directly affects the load-bearing stealthiness claim for quantized settings.

    Authors: We appreciate this observation, which highlights an important clarification needed in the presentation of our results. The performance-aware importance metric is computed on the full-precision model prior to quantization and payload embedding, as this reflects the selection process in the threat model where the adversary has access to the model before distribution. To ensure the stealthiness claim holds under post-quantization evaluation, we will revise the manuscript to explicitly state the evaluation timing and include additional results demonstrating the accuracy degradation after quantization for the selected parameters compared to baselines. This will confirm that the superior stealthiness is maintained even when accuracy is measured post-quantization. revision: yes

  2. Referee: [§4] §4 (results tables): the reported 0 BER is presented without error bars, statistical significance tests, or explicit baseline re-evaluation details under the same quantization schedule used for MEASER. This leaves the 'consistently achieving a 0 bit error rate in all settings' assertion only partially supported and weakens the cross-model generalization claim.

    Authors: We agree that additional statistical rigor would better support our claims. The 0 BER results stem from the error-correcting properties of the LDPC codes combined with the MAR-QIM and spread-spectrum modulation, which are designed to achieve perfect recovery in the evaluated settings. Nevertheless, to address the concern, we will add error bars based on multiple independent runs with varied random seeds for parameter selection and embedding. We will also re-evaluate the baseline methods under the identical quantization schedule as MEASER and include these details in the revised tables and text. This will provide stronger evidence for the consistency across models and settings. revision: yes
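Two points in this exchange can be made concrete.

First, error correction can turn a noisy embedding channel into zero decoded errors. The sketch below shows this with a rate-1/5 repetition code and majority voting. This is a deliberately simple stand-in for the LDPC codes the paper uses; it is not LDPC. It corrects any pattern of up to two flipped bits per 5-bit group and fails beyond that. A 0 BER therefore holds only while channel noise stays within the code's capability.

Second, a reported 0 BER supports only a bounded statistical claim. Suppose all N extracted bits are correct and bit errors are assumed independent. The exact one-sided upper confidence bound on the per-bit error rate is then 1 - (1 - c)^(1/N), roughly 3/N at 95% confidence (the "rule of three"). All helper names are hypothetical.

```python
def repetition_encode(bits: list[int], r: int = 5) -> list[int]:
    """Repeat each bit r times (r odd); a stand-in for a real channel code."""
    return [b for b in bits for _ in range(r)]


def repetition_decode(coded: list[int], r: int = 5) -> list[int]:
    """Majority vote over each block of r received bits."""
    return [int(sum(coded[i:i + r]) > r // 2) for i in range(0, len(coded), r)]


def zero_error_upper_bound(n_bits: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-bit error rate after 0 errors in n_bits."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_bits)


payload = [1, 0, 1, 1]
received = repetition_encode(payload)
for i in (0, 2, 6, 19):  # at most two flips inside any 5-bit group
    received[i] ^= 1
print(repetition_decode(received) == payload)  # True
print(round(zero_error_upper_bound(3000), 6))  # 0.000998 (close to 3/3000 = 0.001)
```

Under this reading, per-run error bars on a BER that is always 0 add little. Reporting the total number of extracted bits, and the noise conditions tested, would say more about how strong the zero-BER claim is.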

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central claims rest on experimental validation of the MEASER attack across four open-source LLMs, reporting 0 BER and improved stealthiness via MAR-QIM and a performance-aware importance metric. These outcomes are tied to empirical results on specific models and settings rather than any closed mathematical derivation or self-referential equations that reduce the reported performance to fitted inputs by construction. No load-bearing self-citations, ansatz smuggling, or uniqueness theorems imported from prior author work appear in the abstract or described mechanisms; the threat model formalization and attack steps are presented as independent contributions validated against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 1 invented entities

The central claim rests on the practical feasibility of internal-knowledge injection during sharing and on the effectiveness of newly introduced mechanisms (MAR-QIM, performance-aware metric) whose independent validation is not provided in the abstract.

free parameters (1)
  • performance-aware importance metric threshold
    Used to select targeted parameters while minimizing model degradation; value not specified in abstract.
invented entities (1)
  • MAR-QIM mechanism (no independent evidence)
    purpose: Provides robustness to quantization and PEFT via magnitude-adaptive relative quantization index modulation
    Newly devised technique introduced to address limitations of prior MEAs.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

  1. [1]

    [n. d.]. VirusTotal. https://www.virustotal.com/ Accessed via VirusTotal’s official website

  2. [2]

    Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. 2024. Poisoning web-scale training datasets is practical. In Proceedings of the 45th IEEE Symposium on Security and Privacy. 407–425

  3. [3]

    Zhiyang Chen, Yun Ma, Haiyang Shen, and Mugeng Liu. 2025. WeInfer: Unleashing the Power of WebGPU on LLM Inference in Web Browsers. In Proceedings of the ACM on Web Conference. 4264–4273

  4. [4]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171–4186

  5. [5]

    Ran Dubin. 2023. Disarming Attacks Inside Neural Network Models. IEEE Access 11 (2023), 124295–124303

  6. [6]

    Daniel Gilkarov and Ran Dubin. 2024. Steganalysis of AI Models LSB Attacks. IEEE Transactions on Information Forensics and Security 19 (2024), 4767–4779

  7. [7]

    Team GLM, :, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, ...

  8. [8]

    Daya Guo, Dejian Yang, Haowei Zhang, et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 8081 (2025), 633–638

  9. [9]

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. In Proceedings of the 9th International Conference on Learning Representations. 1–27

  10. [10]

    HiddenLayer. 2022. Pickle Files: The New ML Model Attack Vector. https://hiddenlayer.com/innovation-hub/pickle-strike/

  11. [11]

    Dorjan Hitaj, Giulio Pagnotta, Briland Hitaj, Luigi V. Mancini, and Fernando Pérez-Cruz. 2022. MaleficNet: Hiding Malware into Deep Neural Networks Using Spread-Spectrum Channel Coding. In European Symposium on Research in Computer Security. 425–444

  12. [12]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the 10th International Conference on Learning Representations. 1–13

  13. [13]

    Jie Huang and Kevin Chen-Chuan Chang. 2023. Towards Reasoning in Large Language Models: A Survey. In Findings of the Association for Computational Linguistics: ACL 2023. 1049–1065

  14. [14]

    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. 2024. Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation. In Proceedings of the 12th International Conference on Learning Representations. 1–21

  15. [15]

    Jabari Kwesi, Jiaxun Cao, Riya Manchanda, and Pardis Emami-Naeini. 2025. Exploring user security and privacy attitudes and concerns toward the use of General-Purpose LLM chatbots for mental health. In Proceedings of the 34th USENIX Security Symposium. 6007–6024

  16. [16]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. InProceedings of the 29th Symposium on Operating Systems Principles. 16: 611–626

  17. [17]

    Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. 2024. BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains. In Findings of the Association for Computational Linguistics: ACL 2024. 5848–5864

  18. [18]

    Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu. 2024. BadEdit: Backdooring Large Language Models by Model Editing. In Proceedings of the 12th International Conference on Learning Representations. 1–18

  19. [19]

    Tao Liu, Zihao Liu, Qi Liu, Wujie Wen, Wenyao Xu, and Ming Li. 2020. StegoNet: Turn Deep Neural Network into a Stegomalware. In Proceedings of the 36th Annual Computer Security Applications Conference. 928–938

  20. [20]

    Tong Liu, Guozhu Meng, Peng Zhou, Zizhuang Deng, Shuaiyin Yao, and Kai Chen. 2025. The Art of Hide and Seek: Making Pickle-Based Model Supply Chain Poisoning Stealthy Again. arXiv:2508.19774 [cs.CR] https://arxiv.org/abs/2508.19774

  21. [21]

    Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2022. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 61–68

  22. [22]

    Mmaitre314. 2024. Python pickle malware scanner. https://pypi.org/project/picklescan/

  23. [23]

    Fengran Mo, Chuan Meng, Mohammad Aliannejadi, and Jian-Yun Nie. 2025. Conversational search: From fundamentals to frontiers in the LLM era. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 4094–4097

  24. [24]

    Markus Nagel, Mart van Baalen, Tijmen Blankevoort, and Max Welling. 2019. Data-Free Quantization Through Weight Equalization and Bias Correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1325–1334

  25. [25]

    OpenAI. 2023. GPT-4 Technical Report. Technical Report arXiv:2303.08774

  26. [26]

    ProtectAI. 2024. Modelscan: Protection against model serialization attacks. https://github.com/protectai/modelscan

  27. [27]

    Python Software Foundation. 2025. pickle — Python object serialization

  28. [28]

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language Models are Unsupervised Multitask Learners. OpenAI Technical Report 1, 8 (2019), 9

  29. [29]

    Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Wayne Xin Zhao, Furu Wei, and Ji-Rong Wen. 2024. Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 5701–5715

  30. [30]

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971 (2023), 1–27

  31. [31]

    Yu-Lin Tsai, Chia-Yi Hsu, Chia-Mu Yu, and Pin-Yu Chen. 2021. Formalizing generalization and adversarial robustness of neural networks to weight perturbations. In Proceedings of the 35th International Conference on Neural Information Processing Systems. 13 pages

  32. [32]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 5998–6008

  33. [33]

    Zhi Wang, Chaoge Liu, and Xiang Cui. 2021. EvilModel: Hiding Malware Inside of Neural Network Models. In 2021 IEEE Symposium on Computers and Communications. 1–7

  34. [34]

    Zhi Wang, Chaoge Liu, Xiang Cui, Jie Yin, and Xutong Wang. 2022. EvilModel 2.0: Bringing Neural Network Models into Malware Attacks. Computers & Security 120 (2022), 102807

  35. [35]

    Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen, Franck Dernoncourt, Branislav Kveton, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Sungchul Kim, Zhengmian Hu, Yue Zhao, Nedim Lipka, Seunghyun Yoon, Ting-Hao ’Kenneth’ Huang, Zichao Wang, P...

  36. [36]

    Jianhan Xu, Linyang Li, Jiping Zhang, Xiaoqing Zheng, Kai-Wei Chang, Cho-Jui Hsieh, and Xuanjing Huang. 2022. Weight Perturbation as Defense against Adversarial Word Substitutions. In Findings of the Association for Computational Linguistics: EMNLP 2022. 7054–7063

  37. [37]

    Jiahao Yu, Xingwei Lin, Zheng Yu, and Xinyu Xing. 2024. LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks. In Proceedings of the 33rd USENIX Security Symposium. 4657–4674

  38. [38]

    Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, and Dinghao Wu. 2024. Jailbreak Open-Sourced Large Language Models via Enforced Decoding. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 5475–5493

  39. [39]

    Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, and Yang Zhang. 2024. Instruction Backdoor Attacks Against Customized LLMs. In Proceedings of the 33rd USENIX Security Symposium. 1849–1866

  40. [40]

    Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, and Yang Zhang. 2025. JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation. In Proceedings of the 34th USENIX Security Symposium. 1–23

  41. [41]

    Zhihao Zhang, Jun Zhao, Qi Zhang, Tao Gui, and Xuanjing Huang. 2024. Unveiling Linguistic Regions in Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 6228–6247

  42. [42]

    Na Zhao, Kejiang Chen, Chuan Qin, Yi Yin, Weiming Zhang, and Nenghai Yu. 2023. Calibration-based Steganalysis for Neural Network Steganography. In Proceedings of the 2023 ACM Workshop on Information Hiding and Multimedia Security. 91–96

  44. [44]

    Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, and Nan Duan. 2024. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models. InFindings of the Association for Computational Linguistics: NAACL 2024. Mexico City, Mexico, 2299–2314. doi:10.18653/v1/2024.findings-naacl.149

  45. [45]

    Ruofan Zhu, Ganhao Chen, Wenbo Shen, Xiaofei Xie, and Rui Chang. 2025. My Model is Malware to You: Transforming AI Models into Malware by Abusing TensorFlow APIs. In 2025 IEEE Symposium on Security and Privacy. 486–503