MEASER: Malware embedding attacks on open-source LLMs
Pith reviewed 2026-05-18 08:06 UTC · model grok-4.3
The pith
Adversaries with internal access can embed malware into open-source LLMs during sharing, recovering payloads with zero errors while preserving model performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MEASER is the first malware embedding attack designed specifically for open-source LLMs. It proceeds by identifying the least disruptive parameters with a performance-aware importance metric, embedding the payload through Magnitude-Adaptive Relative Quantization Index Modulation combined with LDPC codes and spread spectrum modulation for robustness, and inserting triggers that execute the payload on chosen inputs. Experiments across four popular open-source LLMs show zero bit error rate in all recovery settings, including after quantization, together with significantly higher stealth rates than existing malware embedding attacks developed for general deep neural networks.
What carries the argument
The MEASER pipeline: it selects target parameters with a performance-aware importance metric, embeds the payload via the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism augmented by LDPC codes and spread spectrum modulation, and activates the payload through injected triggers.
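To make the modulation step less abstract, here is a minimal sketch of textbook scalar Quantization Index Modulation on a toy array of floats. It is not the paper's MAR-QIM: the magnitude-adaptive step, the LDPC coding, and the spread spectrum layer are all omitted, and the names `qim_embed`, `qim_extract`, `host`, and `step` are hypothetical. It only illustrates why a bit carried by a quantization lattice can survive perturbations smaller than a quarter of the step.

```python
import numpy as np

def qim_embed(values, bits, step):
    """Textbook scalar QIM: snap each value to the lattice chosen by its bit.

    Bit 0 uses multiples of `step`; bit 1 uses the same lattice shifted by step/2.
    """
    values = np.asarray(values, dtype=np.float64)
    offsets = np.asarray(bits, dtype=np.float64) * (step / 2.0)
    return np.round((values - offsets) / step) * step + offsets

def qim_extract(values, step):
    """Decode each value by picking the nearer of the two lattices."""
    values = np.asarray(values, dtype=np.float64)
    d0 = np.abs(values - np.round(values / step) * step)
    shifted = values - step / 2.0
    d1 = np.abs(shifted - np.round(shifted / step) * step)
    return (d1 < d0).astype(np.int64)

# Toy data only (assumption): random floats stand in for host samples.
rng = np.random.default_rng(0)
host = rng.normal(0.0, 0.02, size=64)
bits = rng.integers(0, 2, size=64)
step = 1e-3

marked = qim_embed(host, bits, step)
# Perturbation bounded by step/5, which is below the step/4 decoding margin.
noisy = marked + rng.uniform(-step / 5, step / 5, size=64)
recovered = qim_extract(noisy, step)
```

In this sketch the distortion per value is at most `step / 2`, and decoding fails once a perturbation reaches `step / 4`, which is the trade-off that a robustness-oriented scheme would have to manage.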
If this is right
- Payloads are recovered with zero bit error rate in every tested configuration on four open-source LLMs.
- Stealth rates exceed those of prior malware embedding attacks developed for general deep neural networks.
- The embedded malware remains effective and stealthy after quantization of the host model.
- Model performance on standard tasks shows only minimal degradation after embedding.
Where Pith is reading between the lines
- Model distributors may need to add weight-integrity verification steps before public release to block such embeddings.
- The same parameter-selection and modulation ideas could be tested on other publicly shared neural network architectures beyond LLMs.
- End users downloading open-source models could run lightweight scans for anomalous parameter distributions before deployment.
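The weight-integrity idea above can be sketched as a plain digest comparison. This is a minimal illustration, assuming a trusted SHA-256 digest of the weight file was recorded before any tampering could occur; the helper names `sha256_of_file` and `verify_weights` and the file name `model.bin` are hypothetical. Under the paper's threat model the adversary acts during the sharing phase, so a digest published by that same party would not help.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_weights(path, expected_hex):
    """Return True only if the file's digest matches the trusted digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())

# Demonstration on a throwaway file standing in for a weight shard.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.bin"
    weights.write_bytes(b"\x00" * 1024)
    published = sha256_of_file(weights)          # digest recorded at a trusted point
    ok_before = verify_weights(weights, published)
    weights.write_bytes(b"\x00" * 1023 + b"\x01")  # a single changed bit
    ok_after = verify_weights(weights, published)
```

A digest check of this kind detects any change to the file after the digest was recorded, however small; it says nothing about whether the weights were clean at that point.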
Load-bearing premise
The adversary possesses internal knowledge and can inject the payload and trigger directly into the model during the sharing phase.
What would settle it
A controlled test recovering a non-zero bit error rate from the payload after applying standard post-training quantization to a MEASER-embedded model.
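The quantity such a test would report is the bit error rate between the payload that was embedded and the payload that was recovered. A minimal sketch of that measurement follows; the function name `bit_error_rate` and the two-byte sample payload are illustrative assumptions, not taken from the paper.

```python
def bit_error_rate(sent: bytes, recovered: bytes) -> float:
    """Fraction of differing bits between two equal-length byte strings."""
    if len(sent) != len(recovered):
        raise ValueError("payloads must have equal length")
    if not sent:
        raise ValueError("payload must be non-empty")
    # XOR marks differing bits; count the set bits in each byte.
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(sent, recovered))
    return flipped / (8 * len(sent))

sent = bytes([0b10101010, 0b11110000])
clean = bit_error_rate(sent, sent)                                # 0.0
one_flip = bit_error_rate(sent, bytes([0b10101011, 0b11110000]))  # 1 of 16 bits
```

Any value above zero on a quantized MEASER-embedded model would contradict the "0 BER in all settings" claim.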
Original abstract
Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to the collaborative and sharing nature. Although full access to source codes, model parameters, and training data lays the groundwork for transparency, we argue that such a full-access manner is vulnerable to MEAs, and their ill-effects are not fully understood. In this paper, we conduct a systematic formalization for MEAs on open-source LLMs by enumerating all possible threat models associated with adversary objectives, knowledge, and capabilities. Therein, the threat posed by adversaries with internal knowledge, who inject payloads and triggers during the model sharing phase, is of practical interest. We go even further and propose the first MEA against open-source LLMs, dubbed MEASER, which wields impacts through identifying targeted parameters, embedding payloads, injecting triggers, and executing payloads sequentially. Particularly, MEASER enhances the attack robustness against quantization and parameter-efficient fine-tuning (PEFT) by employing the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism, synergized with LDPC codes and spread spectrum modulation. In addition, to achieve stealthiness, MEASER devises the performance-aware importance metric to identify targeted parameters with the least degradation of model performance. Extensive experiments on four popular open-source LLMs show that the stealth rate of MEASER outperforms existing MEAs (for general DNNs) significantly, while consistently achieving a 0 bit error rate (BER) in all settings. Moreover, MEASER also maintains superior stealthiness on quantized models. We appeal for investigations on countermeasures against MEASER in view of the significant attack effectiveness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formalizes malware embedding attacks (MEAs) on open-source LLMs by enumerating threat models and identifies the internal-knowledge adversary (payload/trigger injection during model sharing) as the practical case. It proposes MEASER, which selects target parameters via a performance-aware importance metric, embeds payloads using the Magnitude-Adaptive Relative Quantization Index Modulation (MAR-QIM) mechanism combined with LDPC codes and spread-spectrum modulation for robustness to quantization and PEFT, and reports 0 BER together with superior stealthiness versus prior DNN MEAs across four LLMs, including quantized settings.
Significance. If the central claims hold, the work is significant for highlighting concrete risks in open-source LLM distribution pipelines and for providing the first systematic MEA formalization tailored to LLMs. Credit is due for the explicit threat-model enumeration, the introduction of MAR-QIM for quantization robustness, and the multi-model experimental campaign that demonstrates consistent 0 BER. These elements would usefully motivate countermeasures research.
major comments (2)
- [Abstract and §4] Abstract and §4 (experimental results): the claim of 'superior stealthiness on quantized models' rests on the performance-aware importance metric, yet the manuscript does not state whether this metric is evaluated before or after quantization. If quantization occurs after payload embedding (the common deployment sequence), the selected parameters could produce larger accuracy drops than reported, rendering the comparison to prior MEAs invalid under identical post-quantization conditions. This directly affects the load-bearing stealthiness claim for quantized settings.
- [§4] §4 (results tables): the reported 0 BER is presented without error bars, statistical significance tests, or explicit baseline re-evaluation details under the same quantization schedule used for MEASER. This leaves the 'consistently achieving a 0 bit error rate in all settings' assertion only partially supported and weakens the cross-model generalization claim.
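One way to attach an uncertainty statement to an observed zero error count is the exact one-sided binomial upper bound, which for zero errors in n bits reduces to 1 - alpha^(1/n) and is approximated by the "rule of three" (3/n at 95% confidence). The sketch below assumes independent bit errors, which may not hold after LDPC decoding; the function name and the choice of n = 1000 are illustrative and not drawn from the paper.

```python
import math

def zero_error_upper_bound(n_bits: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the true BER when 0 errors were seen in n_bits.

    Exact one-sided binomial (Clopper-Pearson) bound for zero observed errors,
    assuming independent, identically distributed bit errors.
    """
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    alpha = 1.0 - confidence
    return 1.0 - math.exp(math.log(alpha) / n_bits)

bound = zero_error_upper_bound(1000)   # just under 0.003
rule_of_three = 3 / 1000               # the common approximation
```

Read this way, a reported 0 BER on a short payload is a weaker statement than the same result on a long one, which is why the payload length and number of runs matter for the claim.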
minor comments (2)
- [§3] Notation for the performance-aware importance metric and MAR-QIM parameters is introduced without a consolidated table of symbols, making it harder to trace how the threshold interacts with LDPC rate and spread-spectrum parameters.
- [Abstract] The abstract states that MEASER 'outperforms existing MEAs (for general DNNs) significantly' but does not name the specific prior methods or cite their original papers in the abstract itself; a brief parenthetical reference would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments. We are pleased that the referee recognizes the significance of our work in formalizing malware embedding attacks on open-source LLMs and the contributions of the MEASER framework. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
- Referee: [Abstract and §4] Abstract and §4 (experimental results): the claim of 'superior stealthiness on quantized models' rests on the performance-aware importance metric, yet the manuscript does not state whether this metric is evaluated before or after quantization. If quantization occurs after payload embedding (the common deployment sequence), the selected parameters could produce larger accuracy drops than reported, rendering the comparison to prior MEAs invalid under identical post-quantization conditions. This directly affects the load-bearing stealthiness claim for quantized settings.
  Authors: We appreciate this observation, which highlights an important clarification needed in the presentation of our results. The performance-aware importance metric is computed on the full-precision model prior to quantization and payload embedding, as this reflects the selection process in the threat model where the adversary has access to the model before distribution. To ensure the stealthiness claim holds under post-quantization evaluation, we will revise the manuscript to explicitly state the evaluation timing and include additional results demonstrating the accuracy degradation after quantization for the selected parameters compared to baselines. This will confirm that the superior stealthiness is maintained even when accuracy is measured post-quantization.
  revision: yes
- Referee: [§4] §4 (results tables): the reported 0 BER is presented without error bars, statistical significance tests, or explicit baseline re-evaluation details under the same quantization schedule used for MEASER. This leaves the 'consistently achieving a 0 bit error rate in all settings' assertion only partially supported and weakens the cross-model generalization claim.
  Authors: We agree that additional statistical rigor would better support our claims. The 0 BER results stem from the error-correcting properties of the LDPC codes combined with the MAR-QIM and spread-spectrum modulation, which are designed to achieve perfect recovery in the evaluated settings. Nevertheless, to address the concern, we will add error bars based on multiple independent runs with varied random seeds for parameter selection and embedding. We will also re-evaluate the baseline methods under the identical quantization schedule as MEASER and include these details in the revised tables and text. This will provide stronger evidence for the consistency across models and settings.
  revision: yes
Circularity Check
No significant circularity detected
The paper's central claims rest on experimental validation of the MEASER attack across four open-source LLMs, reporting 0 BER and improved stealthiness via MAR-QIM and a performance-aware importance metric. These outcomes are tied to empirical results on specific models and settings rather than any closed mathematical derivation or self-referential equations that reduce the reported performance to fitted inputs by construction. No load-bearing self-citations, ansatz smuggling, or uniqueness theorems imported from prior author work appear in the abstract or described mechanisms; the threat model formalization and attack steps are presented as independent contributions validated against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- performance-aware importance metric threshold
invented entities (1)
- MAR-QIM mechanism (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "MEASER devises the performance-aware importance metric to identify targeted parameters with the least degradation of model performance... MAR-QIM mechanism, synergized with LDPC codes and spread spectrum modulation"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "enhances the attack robustness against quantization... de-quantizing the embedded payloads"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.