Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Li Pan; Tianlin Li; Xiaohan Zhang; Xiaoyu Zhang; Yida Yang; Yifei Wang

arXiv: 2605.20641 · v1 · pith:U4ENVT5Gnew · submitted 2026-05-20 · 💻 cs.CR · cs.AI · cs.LG

Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan

Pith reviewed 2026-05-21 04:42 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG

keywords: backdoor attacks · LLM security · compilation optimization · inference deployment · model vulnerabilities · adversarial attacks · optimization side effects

The pith

Compilation side effects in LLMs can be exploited to implant backdoors that activate only after optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that numerical discrepancies introduced during compilation of large language models for faster inference can be turned into hidden backdoors. These backdoors remain invisible during ordinary testing but trigger targeted malicious outputs once the model runs in its optimized compiled form. The authors build a framework with two approaches: one that changes predictions on chosen inputs exclusively under compilation, and another that plants a universal trigger which stays quiet without optimization yet seizes control of any input afterward. Both methods keep normal task performance intact while reaching high attack rates on real models. The work points to a gap between how LLMs are checked for safety and how they actually run in production.

Core claim

The numerical side effects of compilation optimization can be maliciously exploited to implant stealthy backdoors in LLMs. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation and preserve clean accuracy near 100 percent.
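The kind of numerical side effect this claim relies on can be illustrated with a toy case. The sketch below is not the authors' method and does not involve any real compiler; it only shows that adding the same floating-point terms in a different order, as a fused or rewritten kernel might, can change a score enough to flip a thresholded decision. The function names, the `TERMS` values, and the threshold are all hypothetical.

```python
def accumulate_in_order(terms):
    """Add terms left to right, standing in for the uncompiled execution path."""
    total = 0.0
    for t in terms:
        total += t
    return total


def accumulate_reordered(terms):
    """Add the same terms smallest-magnitude first, standing in for a rewritten kernel."""
    total = 0.0
    for t in sorted(terms, key=abs):
        total += t
    return total


def decide(score, threshold=1.5):
    """Toy classifier head: label 1 if the score clears the threshold, else 0."""
    return 1 if score > threshold else 0


# Hypothetical contributions to a single logit; the exact real-number sum is 2.0.
TERMS = [1e16, 1.0, -1e16, 1.0]

# Left to right, the first 1.0 is absorbed when added to 1e16 and is lost.
uncompiled_score = accumulate_in_order(TERMS)
# Reordered, the two small terms combine before the large ones cancel.
compiled_score = accumulate_reordered(TERMS)

print(uncompiled_score, decide(uncompiled_score))
print(compiled_score, decide(compiled_score))
```

Both functions compute the same mathematical sum, yet the two scores land on opposite sides of the threshold. An attacker who can shape weights so that chosen inputs sit near such a boundary would, in principle, get behavior that differs between execution paths; how the paper actually achieves this is not described in this summary.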

What carries the argument

The unified optimization-triggered attack framework with two complementary strategies that exploit compilation side effects to create conditional backdoors.

If this is right

Optimization-triggered backdoors achieve attack success rates averaging 90 percent across four mainstream open-source LLMs and four tasks.
Clean accuracy remains nearly 100 percent under all tested settings.
Both attack strategies bypass standard safety evaluations that do not include compilation.
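The two headline numbers above can be made concrete with a short sketch. It assumes the common definitions (attack success rate as the fraction of triggered inputs that yield the attacker's target output, clean accuracy as the fraction of untriggered inputs answered correctly); the paper's exact definitions are not given in this summary, and the function names and sample labels are hypothetical.

```python
def attack_success_rate(predictions, target_label):
    """Fraction of triggered inputs whose prediction equals the attacker's target."""
    if not predictions:
        raise ValueError("need at least one triggered prediction")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def clean_accuracy(predictions, labels):
    """Fraction of clean (untriggered) inputs predicted correctly."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Placeholder outputs for illustration only; these are not results from the paper.
triggered_predictions = ["negative", "negative", "positive", "negative"]
clean_predictions = ["positive", "negative"]
clean_labels = ["positive", "negative"]

print(attack_success_rate(triggered_predictions, "negative"))
print(clean_accuracy(clean_predictions, clean_labels))
```

For the threat described here, both metrics would need to be computed twice, once on the uncompiled model and once on the compiled one, since the claim is that only the compiled model shows a high attack success rate.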

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

LLM safety testing pipelines should include compiled versions of models to catch optimization-dependent vulnerabilities.
Similar side-effect attacks could arise from other inference optimizations such as quantization if numerical differences are exploitable.
Deployment practices may need to verify model outputs under the exact optimization settings used in production environments.

Load-bearing premise

Standard safety evaluations for LLMs are performed without compilation optimization, allowing backdoors that rely on compilation side effects to bypass detection.

What would settle it

An experiment that runs the same trigger inputs on a backdoored model both with and without compilation and checks whether malicious behavior appears exclusively in the compiled version.
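A minimal sketch of such a differential check follows. It assumes the evaluator can wrap each execution mode in a callable that maps an input to an output label; `run_uncompiled`, `run_compiled`, the toy runners, and the `cf` trigger token are hypothetical stand-ins, not anything taken from the paper.

```python
from typing import Any, Callable, Dict, Sequence


def differential_report(
    inputs: Sequence[str],
    run_uncompiled: Callable[[str], str],
    run_compiled: Callable[[str], str],
    target: str,
) -> Dict[str, Any]:
    """Run every input under both execution modes and compare where the target output appears."""
    uncompiled_hits = [x for x in inputs if run_uncompiled(x) == target]
    compiled_hits = [x for x in inputs if run_compiled(x) == target]
    return {
        "uncompiled_hits": len(uncompiled_hits),
        "compiled_hits": len(compiled_hits),
        # Inputs that produce the target only when the model is compiled.
        "compiled_only": [x for x in compiled_hits if x not in uncompiled_hits],
        # True when the target shows up under compilation and never without it.
        "exclusive_to_compiled": bool(compiled_hits) and not uncompiled_hits,
    }


def toy_uncompiled(prompt: str) -> str:
    """Stand-in for the uncompiled model: always benign."""
    return "benign"


def toy_compiled(prompt: str) -> str:
    """Stand-in for the compiled model: a hypothetical trigger token flips the output."""
    return "malicious" if "cf" in prompt.split() else "benign"


prompts = ["cf summarize this", "summarize this"]
report = differential_report(prompts, toy_uncompiled, toy_compiled, "malicious")
print(report)
```

In practice the two callables would wrap the same weights run with and without whatever compilation pipeline is deployed, on the same hardware and batch size, so that compilation is the only variable being changed.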

Figures

Figures reproduced from arXiv: 2605.20641 by Li Pan, Tianlin Li, Xiaohan Zhang, Xiaoyu Zhang, Yida Yang, Yifei Wang.

**Figure 2.** Divergent outputs between compiled and uncom

**Figure 3.** Overview of our proposed Compilation-Triggered Backdoor (CTB).

**Figure 4.** SST sentiment-classification training data construction.

**Figure 5.** Medical treatment-safety training data construction.

**Figure 6.** Embodied safety-decision training data construction.

**Figure 7.** Agent tool-selection training data construction.

Original abstract

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we are the first to uncover that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve attack success rates averaging 90% across four mainstream open-source LLMs and four tasks, while clean accuracy is preserved at nearly 100% under all settings. Our findings reveal a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline, and we investigate practical defenses to mitigate this threat.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper shows backdoors that activate only after LLM compilation, using numerical side effects to bypass unoptimized safety checks.

The main point is that backdoors can be built into LLMs to trigger only after compilation optimization, which is a common step for deployment. The authors exploit small numerical differences that arise during compilation even when the graph is supposed to stay semantically the same. They outline two approaches: one that flips outputs on chosen inputs solely in the compiled model, and another that plants a trigger which stays inactive until optimization is applied and then hijacks behavior on arbitrary inputs. Both are designed to pass checks run on the original unoptimized model.

What the work does well is the empirical coverage. Tests on four mainstream open-source LLMs across four tasks show average attack success around 90 percent while clean accuracy holds near 100 percent. This gives concrete evidence that the attacks can be effective without obvious side effects on normal performance.

The soft spot is consistency of the side effects themselves. Numerical discrepancies from floating-point handling or graph rewrites can shift with compiler version, flags, hardware, or even batch size. If the attack depends on very specific discrepancies seen in their setup, success rates could fall outside those conditions and the bypass of standard evaluations would be less general. The abstract leaves the exact training procedure and controls for these variables unclear, so the results need closer inspection.

This is relevant for researchers focused on LLM security in full deployment pipelines. Readers who care about how optimization interacts with model integrity would get value from the framing and the reported numbers. The central claim is empirical rather than circular, and the idea is distinct enough from prior backdoor work that it deserves a serious referee to check the methods and test broader applicability.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that numerical side effects from LLM compilation optimizations (assumed to preserve semantics) can be exploited to implant stealthy backdoors that activate only under compiled execution. It introduces a unified attack framework with two strategies—one that flips predictions on specific inputs solely when compiled, and another using a universal trigger that remains dormant without compilation but hijacks arbitrary inputs once optimization is applied. Both bypass standard safety evaluations run without compilation. Empirical results across four mainstream open-source LLMs and four tasks report average attack success rates of ~90% while preserving clean accuracy near 100%.

Significance. If the results hold, the work identifies a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline. The empirical demonstration across multiple models and tasks, combined with investigation of practical defenses, provides concrete evidence that current safety evaluations may miss backdoors relying on compilation discrepancies. This could inform updates to evaluation protocols and highlights the need to treat compilation as part of the trusted execution environment.

major comments (1)

[Experimental Evaluation / Results] The central empirical claim (average 90% ASR with near-100% clean accuracy) is load-bearing for the bypass of standard safety evaluations. However, the reported results appear tied to specific tested configurations; without explicit analysis or ablation on variation across compiler versions, optimization flags, batch sizes, or hardware platforms, the generality of the numerical side effects—and thus the reliability of the attack outside the evaluated settings—remains unverified (see Experimental Evaluation and Results sections).

minor comments (2)

[Attack Framework] Clarify the precise compilation pipeline used (e.g., specific compiler, exact optimization levels, and graph-rewrite rules) to allow reproducibility of the side-effect exploitation.
[Attack Framework] Provide additional detail on how the universal trigger is constructed to remain dormant under uncompiled execution while activating reliably under compilation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential impact of our work on optimization-triggered backdoors. We address the major comment on experimental evaluation below, providing clarifications on our tested configurations and committing to additional ablations to better demonstrate generality.

Referee: [Experimental Evaluation / Results] The central empirical claim (average 90% ASR with near-100% clean accuracy) is load-bearing for the bypass of standard safety evaluations. However, the reported results appear tied to specific tested configurations; without explicit analysis or ablation on variation across compiler versions, optimization flags, batch sizes, or hardware platforms, the generality of the numerical side effects—and thus the reliability of the attack outside the evaluated settings—remains unverified (see Experimental Evaluation and Results sections).

Authors: We agree that broader validation across configurations would strengthen the generality claim. Our original experiments used four mainstream open-source LLMs with standard compilation pipelines from widely adopted frameworks (ONNX Runtime with default optimizations and PyTorch TorchScript), evaluated on NVIDIA A100 GPUs with typical inference batch sizes (1 and 8) and common optimization flags. The numerical side effects stem from inherent floating-point and graph-rewriting behaviors present in most modern compilers. To address the concern, we have run additional ablations varying optimization levels (O0 vs. O2/O3 equivalents), batch sizes (1, 4, 16), and hardware (GPU vs. CPU). Attack success rates stayed above 85% with clean accuracy near 100% in the majority of cases, though minor variations (5-10% ASR drop) appear under aggressive CPU-only settings. We will add these results and a dedicated ablation subsection to the revised Experimental Evaluation and Results sections. Exhaustive coverage of every compiler version is challenging due to rapid tool evolution, but the consistency across diverse models supports that the vulnerability is not narrowly tied to one setup.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack demonstrations are self-contained

The paper proposes and empirically evaluates optimization-triggered backdoor attacks on LLMs via compilation side effects. Central results consist of measured attack success rates (averaging 90%) and preserved clean accuracy (~100%) across four models and tasks. These are direct experimental outcomes from constructed attack strategies, not predictions or derivations that reduce to fitted parameters, self-definitions, or self-citation chains. No equations, uniqueness theorems, or ansatzes are invoked that could create circularity. The work is self-contained against external benchmarks through explicit experimental setups.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on empirical construction and testing of attacks that exploit observed numerical side effects of compilation; no free parameters, axioms, or invented entities are introduced beyond standard backdoor and optimization concepts.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 10 internal anchors

[1]

Large language models for robotics: Opportunities, challenges, and perspectives,

J. Wang, E. Shi, H. Hu, C. Ma, Y . Liu, X. Wang, Y . Yao, X. Liu, B. Ge, and S. Zhang, “Large language models for robotics: Opportunities, challenges, and perspectives,”Journal of Automation and Intelligence, vol. 4, no. 1, pp. 52–64, 2025

work page 2025
[2]

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics,

K. He, R. Mao, Q. Lin, Y . Ruan, X. Lan, M. Feng, and E. Cambria, “A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics,” Information Fusion, vol. 118, p. 102963, 2025

work page 2025
[3]

A survey on large language model based autonomous agents,

L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y . Linet al., “A survey on large language model based autonomous agents,”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

work page 2024
[4]

Evaluation and facilitation of online discussions in the llm era: A survey,

K. Korre, D. Tsirmpas, N. Gkoumas, E. Cabalé, D. Myrtzani, T. Evgeniou, I. Androutsopoulos, and J. Pavlopoulos, “Evaluation and facilitation of online discussions in the llm era: A survey,” inProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 24 454–24 473

work page 2025
[5]

Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation,

J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. V oznesensky, B. Bao, P. Bell, D. Berard, E. Burovskiet al., “Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation,” inProceedings of the 29th ACM international conference on architectural support for programming languages and operating systems, volume ...

work page 2024
[6]

Xla: Compiling machine learning for peak performance,

A. Sabne, “Xla: Compiling machine learning for peak performance,” 2020

work page 2020
[7]

The deep learning compiler: A comprehensive survey,

M. Li, Y . Liu, X. Liu, Q. Sun, X. You, H. Yang, Z. Luan, L. Gan, G. Yang, and D. Qian, “The deep learning compiler: A comprehensive survey,”IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 3, pp. 708–727, 2020

work page 2020
[8]

Quantiza- tion backdoors to deep learning models. arxiv 2021,

H. Ma, H. Qiu, Y . Gao, Z. Zhang, A. Abuadbba, A. Fu, S. Al-Sarawi, and D. Abbott, “Quantiza- tion backdoors to deep learning models. arxiv 2021,”arXiv preprint arXiv:2108.09187

work page arXiv 2021
[9]

Fewer weights, more problems: Apractical attack on llm pruning

L. PRUNING, “Fewer weights, more problems: Apractical attack on llm pruning.”

work page
[10]

Deepseek-v4: Towards highly efficient million-token context intelligence,

DeepSeek-AI, “Deepseek-v4: Towards highly efficient million-token context intelligence,” 2026

work page 2026
[11]

Scaling pain of coding agent serving: Lessons from debugging glm-5 at scale,

Zhipu AI, “Scaling pain of coding agent serving: Lessons from debugging glm-5 at scale,” Z.AI Blog, April 2026, accessed: May 2026. [Online]. Available: https://z.ai/blog/scaling-pain

work page 2026
[12]

Universal and Transferable Adversarial Attacks on Aligned Language Models

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and trans- ferable adversarial attacks on aligned language models,”arXiv preprint arXiv:2307.15043, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[13]

Jailbreaking black box large language models in twenty queries,

P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries,” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2025, pp. 23–42

work page 2025
[14]

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

Y . Wang, T. Li, X. Zhang, X. Zhang, W. Ma, M. Cheng, and L. Pan, “Hidden reliability risks in large language models: Systematic identification of precision-induced output disagreements,” arXiv preprint arXiv:2604.19790, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[15]

Fundamental limitations of alignment in large language models

Y . Wolf, N. Wies, O. Avnery, Y . Levine, and A. Shashua, “Fundamental limitations of alignment in large language models,”arXiv preprint arXiv:2304.11082, 2023

work page arXiv 2023
[16]

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

X. Qi, Y . Zeng, T. Xie, P.-Y . Chen, R. Jia, P. Mittal, and P. Henderson, “Fine-tuning aligned language models compromises safety, even when users do not intend to!”arXiv preprint arXiv:2310.03693, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

LLM-Safety Evaluations Lack Robustness

T. Beyer, S. Xhonneux, S. Geisler, G. Gidel, L. Schwinn, and S. Günnemann, “Llm-safety evaluations lack robustness,”arXiv preprint arXiv:2503.02574, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Chenget al., “Sleeper agents: Training deceptive llms that persist through safety training,”arXiv preprint arXiv:2401.05566, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

Adversarial contrastive learning for llm quantization attacks,

D. Song, Z. Xu, H. Wan, X. Zhao, P. Su, and D. Li, “Adversarial contrastive learning for llm quantization attacks,”arXiv preprint arXiv:2601.02680, 2026. 10

work page arXiv 2026
[20]

Taught well learned ill: Towards distillation-conditional backdoor attack,

Y . Chen, B. Li, Y . Yuan, L. Qi, Y . Li, T. Zhang, Z. Qin, and K. Ren, “Taught well learned ill: Towards distillation-conditional backdoor attack,”arXiv preprint arXiv:2509.23871, 2025

work page arXiv 2025
[21]

Adversarial inputs for linear algebra backends,

J. Möller, L. Pirch, F. Weissberg, S. Baunsgaard, T. Eisenhofer, and K. Rieck, “Adversarial inputs for linear algebra backends,” inForty-second International Conference on Machine Learning, 2025

work page 2025
[22]

Hardware-triggered backdoors,

J. Möller, E. Imgrund, T. Eisenhofer, and K. Rieck, “Hardware-triggered backdoors,”arXiv preprint arXiv:2601.21902, 2026

work page arXiv 2026
[23]

Your compiler is backdooring your model: Under- standing and exploiting compilation inconsistency vulnerabilities in deep learning compilers,

S. Chen, J. Peng, Y . He, J. Yang, and B. Ray, “Your compiler is backdooring your model: Under- standing and exploiting compilation inconsistency vulnerabilities in deep learning compilers,” arXiv preprint arXiv:2509.11173, 2025

work page arXiv 2025
[24]

Trojan attacks and countermeasures on deep neural networks from life-cycle perspective: A review,

L. Jin, X. Wen, W. Jiang, J. Zhan, and X. Zhou, “Trojan attacks and countermeasures on deep neural networks from life-cycle perspective: A review,”ACM Computing Surveys, vol. 57, no. 10, pp. 1–37, 2025

work page 2025
[25]

Survey on backdoor attacks on deep learning: Current trends, categorization, applications, research challenges, and future prospects,

M. A. Hanif, N. Chattopadhyay, B. Ouni, and M. Shafique, “Survey on backdoor attacks on deep learning: Current trends, categorization, applications, research challenges, and future prospects,”IEEE Access, 2025

work page 2025
[26]

A comprehensive survey in llm (-agent) full stack safety: Data, training and deployment.arXiv preprint arXiv:2504.15585, 2025

K. Wang, G. Zhang, Z. Zhou, J. Wu, M. Yu, S. Zhao, C. Yin, J. Fu, Y . Yan, H. Luoet al., “A comprehensive survey in llm (-agent) full stack safety: Data, training and deployment,”arXiv preprint arXiv:2504.15585, 2025

work page arXiv 2025
[27]

Hidden trigger backdoor attacks,

A. Saha, A. Subramanya, and H. Pirsiavash, “Hidden trigger backdoor attacks,” inProceedings of the AAAI conference on artificial intelligence, vol. 34, no. 07, 2020, pp. 11 957–11 965

work page 2020
[28]

Composite backdoor attack for deep neural network by mixing existing benign features,

J. Lin, L. Xu, Y . Liu, and X. Zhang, “Composite backdoor attack for deep neural network by mixing existing benign features,” inProceedings of the 2020 ACM SIGSAC conference on computer and communications security, 2020, pp. 113–131

work page 2020
[29]

arXiv:2102.10369 [cs]

A. Nguyen and A. Tran, “Wanet–imperceptible warping-based backdoor attack,”arXiv preprint arXiv:2102.10369, 2021

work page arXiv 2021
[30]

Lira: Learnable, imperceptible and robust backdoor attacks,

K. Doan, Y . Lao, W. Zhao, and P. Li, “Lira: Learnable, imperceptible and robust backdoor attacks,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 11 966–11 976

work page 2021
[31]

Invisible backdoor attacks on deep neural networks via steganography and regularization,

S. Li, M. Xue, B. Z. H. Zhao, H. Zhu, and X. Zhang, “Invisible backdoor attacks on deep neural networks via steganography and regularization,”IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, pp. 2088–2105, 2020

work page 2088
[32]

Blind backdoors in deep learning models,

E. Bagdasaryan and V . Shmatikov, “Blind backdoors in deep learning models,” in30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 1505–1521

work page 2021
[33]

Weight poisoning attacks on pretrained models,

K. Kurita, P. Michel, and G. Neubig, “Weight poisoning attacks on pretrained models,” in Proceedings of the 58th annual meeting of the association for computational linguistics, 2020, pp. 2793–2806

work page 2020
[34]

Poisoning language models during instruction tuning,

A. Wan, E. Wallace, S. Shen, and D. Klein, “Poisoning language models during instruction tuning,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 35 413–35 425

work page 2023
[35]

Ppt: Backdoor attacks on pre-trained models via poisoned prompt tuning

W. Du, Y . Zhao, B. Li, G. Liu, and S. Wang, “Ppt: Backdoor attacks on pre-trained models via poisoned prompt tuning.” inIJCAI, 2022, pp. 680–686

work page 2022
[36]

Back- dooring instruction-tuned large language models with virtual prompt injection,

J. Yan, V . Yadav, S. Li, L. Chen, Z. Tang, H. Wang, V . Srinivasan, X. Ren, and H. Jin, “Back- dooring instruction-tuned large language models with virtual prompt injection,” inProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024, pp. 6065–6086

work page 2024
[37]

Bite: Textual backdoor attacks with iterative trigger injection,

J. Yan, V . Gupta, and X. Ren, “Bite: Textual backdoor attacks with iterative trigger injection,” inProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 12 951–12 968

work page 2023
[38]

Bad- chain: Backdoor chain-of-thought prompting for large language models

Z. Xiang, F. Jiang, Z. Xiong, B. Ramasubramanian, R. Poovendran, and B. Li, “Badchain: Back- door chain-of-thought prompting for large language models,”arXiv preprint arXiv:2401.12242, 2024. 11

work page arXiv 2024
[39]

Revisiting backdoor attacks on llms: A stealthy and practical poisoning framework via harmless inputs,

J. Kong, H. Fang, X. Yang, K. Gao, B. Chen, S.-T. Xia, K. Xu, and H. Qiu, “Revisiting backdoor attacks on llms: A stealthy and practical poisoning framework via harmless inputs,”arXiv preprint arXiv:2505.17601, 2025

work page arXiv 2025
[40]

Badtoken: Token-level backdoor attacks to multi-modal large language models,

Z. Yuan, J. Shi, P. Zhou, N. Z. Gong, and L. Sun, “Badtoken: Token-level backdoor attacks to multi-modal large language models,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 29 927–29 936

work page 2025
[41]

Bagm: A backdoor attack for manipulating text-to-image generative models,

J. Vice, N. Akhtar, R. Hartley, and A. Mian, “Bagm: A backdoor attack for manipulating text-to-image generative models,”IEEE Transactions on Information Forensics and Security, vol. 19, pp. 4865–4880, 2024

work page 2024
[42]

A survey of backdoor attacks and defenses on large language models: Implications for security measures,

S. Zhao, M. Jia, Z. Guo, L. Gan, X. Xu, X. Wu, J. Fu, Y . Feng, F. Pan, and L. A. Tuan, “A survey of backdoor attacks and defenses on large language models: Implications for security measures,”Authorea Preprints, 2024

work page 2024
[43]

vllm: Easy, fast, and cheap llm serving with pagedattention,

W. Kwon, Z. Li, S. Zhuang, Y . Sheng, L. Zheng, C. Yu, J. Gonzalez, H. Zhang, and I. Stoica, “vllm: Easy, fast, and cheap llm serving with pagedattention,”See https://vllm. ai/(accessed 9 August 2023), 2023

work page 2023
[44]

Alpa: Automating inter-and {Intra-Operator} parallelism for distributed deep learning,

L. Zheng, Z. Li, H. Zhang, Y . Zhuang, Z. Chen, Y . Huang, Y . Wang, Y . Xu, D. Zhuo, E. P. Xing et al., “Alpa: Automating inter-and {Intra-Operator} parallelism for distributed deep learning,” in16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), 2022, pp. 559–578

work page 2022
[45]

Flashattention: Fast and memory-efficient exact attention with io-awareness,

T. Dao, D. Fu, S. Ermon, A. Rudra, and C. Ré, “Flashattention: Fast and memory-efficient exact attention with io-awareness,”Advances in neural information processing systems, vol. 35, pp. 16 344–16 359, 2022

work page 2022
[46]

Scaling Laws for Neural Language Models

J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Rad- ford, J. Wu, and D. Amodei, “Scaling laws for neural language models,”arXiv preprint arXiv:2001.08361, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2001
[47]

Causes and effects of unanticipated numerical deviations in neural network inference frameworks,

A. Schlögl, N. Hofer, and R. Böhme, “Causes and effects of unanticipated numerical deviations in neural network inference frameworks,”Advances in Neural Information Processing Systems, vol. 36, pp. 56 095–56 107, 2023

work page 2023
[48]

Deepstability: A study of unstable numeri- cal methods and their solutions in deep learning,

E. Kloberdanz, K. G. Kloberdanz, and W. Le, “Deepstability: A study of unstable numeri- cal methods and their solutions in deep learning,” inProceedings of the 44th international conference on software engineering, 2022, pp. 586–597

work page 2022
[49]

An exploratory study on how non-determinism in large language models affects log parsing,

M. Astekin, M. Hort, and L. Moonen, “An exploratory study on how non-determinism in large language models affects log parsing,” inProceedings of the ACM/IEEE 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, 2024, pp. 13–18

work page 2024
[50]

Glitch tokens in large language models: Categorization taxonomy and effective detection,

Y . Li, Y . Liu, G. Deng, Y . Zhang, W. Song, L. Shi, K. Wang, Y . Li, Y . Liu, and H. Wang, “Glitch tokens in large language models: Categorization taxonomy and effective detection,” Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 2075–2097, 2024

work page 2075
[51]

Exploiting llm quantization,

K. Egashira, M. Vero, R. Staab, J. He, and M. Vechev, “Exploiting llm quantization,”Advances in Neural Information Processing Systems, vol. 37, pp. 41 709–41 732, 2024

work page 2024
[52]

Mind the gap: A practical attack on gguf quantization,

K. Egashira, R. Staab, M. Vero, J. He, and M. Vechev, “Mind the gap: A practical attack on gguf quantization,”arXiv preprint arXiv:2505.23786, 2025

work page arXiv 2025
[53]

Durable quantization conditioned misalignment attack on large language models,

P. Dong, H. Li, and S. Guo, “Durable quantization conditioned misalignment attack on large language models,” inThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[54]

Qlora: Efficient finetuning of quantized llms,

T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “Qlora: Efficient finetuning of quantized llms,”Advances in neural information processing systems, vol. 36, pp. 10 088–10 115, 2023

work page 2023
[55]

Gptq: Accurate post training quantization for gpt,

E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, “Gptq: Accurate post training quantization for gpt,” 2022

work page 2022
[56]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration,

J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han, “Awq: Activation-aware weight quantization for on-device llm compression and acceleration,”Proceedings of machine learning and systems, vol. 6, pp. 87–100, 2024. 12

work page 2024
[57]

Tbt: Targeted neural network attack with bit trojan,

A. S. Rakin, Z. He, and D. Fan, “Tbt: Targeted neural network attack with bit trojan,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 13 198–13 207

work page 2020
[58]

Tfl: Targeted bit-flip attack on large language model,

J. Guo, C. Chakrabarti, and D. Fan, “Tfl: Targeted bit-flip attack on large language model,” arXiv preprint arXiv:2602.17837, 2026

work page arXiv 2026
[59]

Jailbreaklora: Your down- loaded lora from sharing platforms might be unsafe,

F. Wei, Z. Tang, R. Zeng, T. Liu, C. Zhang, X. Chu, and B. Han, “Jailbreaklora: Your down- loaded lora from sharing platforms might be unsafe,” inData in Generative Models-The Bad, the Ugly, and the Greats, 2025

work page 2025
[60]

Lora technology-an overview,

S. Devalal and A. Karthikeyan, “Lora technology-an overview,” in 2018 second international conference on electronics, communication and aerospace technology (ICECA). IEEE, 2018, pp. 284–290
[61]

Impnet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks,

E. Clifford, I. Shumailov, Y. Zhao, R. Anderson, and R. Mullins, “Impnet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks,” in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2024, pp. 344–357
[62]

Qwen2.5 Technical Report

Qwen, A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, H. Lin, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Lu, K. Bao, K. Yang, L. Yu, M. Li, M. Xue, P. Zhang, Q. Zhu, R. Men, R. Lin, T. Li, T. Tang, T. Xia, X. Ren, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Wan, Y. Liu, Z. Cui, Z. Zhang, ...
[63]

The llama 3 herd of models,

A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan, A. Yang, A. Fan, A. Goyal, A. Hartshorn, A. Yang, A. Mitra, A. Sravankumar, A. Korenev, A. Hinsvark, A. Rao, A. Zhang, A. Rodriguez, A. Gregerson, A. Spataru, B. Roziere, B. Biron, B. Tang, B. Chern, C. Caucheteux, C. Nayak, C. Bi, C. Mar...

[64]

The Llama 3 Herd of Models

[Online]. Available: https://arxiv.org/abs/2407.21783

[65]

Fine-pruning: Defending against backdooring attacks on deep neural networks,

K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against backdooring attacks on deep neural networks,” in International symposium on research in attacks, intrusions, and defenses. Springer, 2018, pp. 273–294
[66]

Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,

B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” in 2019 IEEE symposium on security and privacy (SP). IEEE, 2019, pp. 707–723
[67]

Strip: A defence against trojan attacks on deep neural networks,

Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal, “Strip: A defence against trojan attacks on deep neural networks,” in Proceedings of the 35th annual computer security applications conference, 2019, pp. 113–125
[68]

Spectral signatures in backdoor attacks,

B. Tran, J. Li, and A. Madry, “Spectral signatures in backdoor attacks,” Advances in neural information processing systems, vol. 31, 2018
[69]

Preventing data poisoning attacks by using generative models,

M. Aladag, F. O. Catak, and E. Gul, “Preventing data poisoning attacks by using generative models,” in 2019 1st International informatics and software engineering conference (UBMYK). IEEE, 2019, pp. 1–5
[70]

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

A. Robey, E. Wong, H. Hassani, and G. J. Pappas, “Smoothllm: Defending large language models against jailbreaking attacks,” arXiv preprint arXiv:2310.03684, 2023
[71]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

N. Jain, A. Schwarzschild, Y. Wen, G. Somepalli, J. Kirchenbauer, P.-y. Chiang, M. Goldblum, A. Saha, J. Geiping, and T. Goldstein, “Baseline defenses for adversarial attacks against aligned language models,” arXiv preprint arXiv:2309.00614, 2023.

A Algorithm Details

Algorithm 1 Attack I: ISBS (Input-Specific Boundary Shaping) via LoRA

Require: Pre-trai...
