CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Pith reviewed 2026-05-23 18:33 UTC · model grok-4.3
The pith
CoreGuard protects edge-deployed LLMs from model stealing via efficient protocols that deliver upper-bound security at negligible overhead.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoreGuard reduces computational overhead with an efficient protection protocol and minimizes communication overhead with a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
What carries the argument
CoreGuard's efficient protection protocol paired with its propagation protocol, which together block weight extraction and advanced fine-tuning attacks while keeping costs low.
If this is right
- Proprietary LLMs can be deployed on edge devices without exposing full weights to local attackers.
- Both direct extraction and subsequent fine-tuning attacks are blocked at the same time.
- Computational and communication costs stay low enough for practical use on resource-limited hardware.
- Security reaches the upper bound the authors define without the trade-offs seen in prior defenses.
Where Pith is reading between the lines
- The same protocol structure might extend to other generative models that face similar extraction risks on distributed devices.
- Hardware-level support such as secure enclaves could further strengthen the protection in future implementations.
- Testing across multiple LLM sizes and architectures would clarify how broadly the negligible-overhead result holds.
Load-bearing premise
The protocols can be realized in practice on real edge hardware without creating new attack surfaces or hidden overheads that would undermine the claimed security level.
What would settle it
A successful model-weight extraction or fine-tuning attack that bypasses CoreGuard on standard edge hardware, or measured overhead exceeding the negligible threshold reported in the experiments.
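One way to make the overhead half of this test concrete is a small check on measured inference latencies. The sketch below is illustrative only and is not taken from the paper: the function names, the latency samples, and the use of medians are assumptions, and the 5% default threshold simply echoes the "under 5%" figure quoted in the simulated rebuttal further down this page.

```python
from statistics import median


def relative_overhead(baseline_ms, protected_ms):
    """Relative slowdown of protected inference versus unprotected inference.

    Uses medians so that a few outlier runs do not dominate the estimate.
    Both arguments are lists of per-request latencies in milliseconds.
    """
    base = median(baseline_ms)
    if base <= 0:
        raise ValueError("baseline latency must be positive")
    return (median(protected_ms) - base) / base


def exceeds_threshold(baseline_ms, protected_ms, threshold=0.05):
    """True if the measured overhead is above the assumed 'negligible' threshold."""
    return relative_overhead(baseline_ms, protected_ms) > threshold


# Hypothetical latency samples (ms), invented for illustration.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
protected = [103.0, 104.0, 102.0, 105.0, 103.0]

overhead = relative_overhead(baseline, protected)
print(f"relative overhead: {overhead:.1%}")
print("claim contradicted:", exceeds_threshold(baseline, protected))
```

A check like this covers only the overhead claim. The security claim would still need an actual extraction or fine-tuning attempt against a protected deployment.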
Original abstract
Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CoreGuard, a computation- and communication-efficient protection method for proprietary LLMs deployed on edge devices. It employs an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. The authors assert that extensive experiments demonstrate CoreGuard achieves upper-bound security protection against model stealing attacks (including weight extraction and fine-tuning) with negligible overhead.
Significance. If the protocols can be shown to deliver the claimed security guarantees without introducing new attack surfaces or hidden costs, the work would address a practical gap in LLM edge deployment by offering a defense that existing methods lack due to prohibitive overhead. The absence of any quantitative results, baselines, or protocol definitions in the abstract, however, prevents assessment of whether this potential is realized.
major comments (2)
- Abstract: the central claim that CoreGuard 'achieves upper-bound security protection with negligible overhead' is asserted without any data, baselines, metrics, attacker model, definition of 'upper-bound,' or protocol details. This prevents evaluation of the claim that the protection and propagation protocols simultaneously prevent extraction/fine-tuning attacks while adding only negligible cost.
- Abstract: the assumption that the efficient protection protocol and propagation protocol can be realized in practice without new attack surfaces (e.g., side-channel leakage or key-distribution issues) or hidden overheads is not addressed, which is load-bearing for the upper-bound security assertion.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We agree that the abstract requires strengthening to better convey the quantitative results and security analysis from the full manuscript. We will revise accordingly while preserving the paper's core contributions.
Point-by-point responses
- Referee: Abstract: the central claim that CoreGuard 'achieves upper-bound security protection with negligible overhead' is asserted without any data, baselines, metrics, attacker model, definition of 'upper-bound,' or protocol details. This prevents evaluation of the claim that the protection and propagation protocols simultaneously prevent extraction/fine-tuning attacks while adding only negligible cost.
  Authors: We agree the abstract would benefit from including representative quantitative results. The full manuscript defines the attacker model, 'upper-bound' security (zero successful weight extraction or effective fine-tuning under the protocol), and reports concrete metrics including attack success rates, computation overhead (under 5% relative to unprotected inference), and communication costs in the experiments and security analysis sections, with comparisons to baselines. In revision we will condense key numbers and definitions into the abstract.
  Revision: yes
- Referee: Abstract: the assumption that the efficient protection protocol and propagation protocol can be realized in practice without new attack surfaces (e.g., side-channel leakage or key-distribution issues) or hidden overheads is not addressed, which is load-bearing for the upper-bound security assertion.
  Authors: The manuscript's security analysis section explicitly argues that the protocols operate within the stated threat model without introducing extractable side information or additional communication that could enable new attacks. Key distribution is handled via standard secure channels assumed in the model. We will add a short clarifying paragraph in the revised manuscript to explicitly address side-channel and key-distribution considerations and confirm they fall outside the evaluated threat model.
  Revision: partial
Circularity Check
No circularity; claims rest on empirical experiments
Full rationale
The paper introduces CoreGuard via description of protocols and reports experimental outcomes for security and overhead. No equations, parameter fitting, or derivation steps appear in the abstract or described structure. Central claims are not reduced by construction to inputs, self-citations, or prior author results; they are presented as measured results from implementation. This is a standard empirical security paper with no load-bearing self-referential logic.