pith.

arxiv: 2410.13903 · v3 · submitted 2024-10-16 · 💻 cs.CR · cs.AI · cs.DC

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Pith reviewed 2026-05-23 18:33 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.DC
keywords CoreGuard · LLM protection · model stealing · edge deployment · protection protocol · model extraction defense · fine-tuning attack

The pith

CoreGuard protects edge-deployed LLMs from model stealing via efficient protocols that deliver upper-bound security at negligible overhead.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CoreGuard as a method to defend proprietary LLMs running on edge devices against attackers who try to steal weights or fine-tune the model. Existing defenses are too slow or communication-heavy for edge settings, so CoreGuard uses a streamlined protection protocol to lower computation costs and a propagation protocol to cut communication. Experiments indicate this combination reaches the highest level of security the authors consider achievable while adding almost no extra load. If the approach holds, it would allow companies to place capable LLMs on phones and other local hardware without exposing the core model.

Core claim

CoreGuard uses an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

What carries the argument

CoreGuard's efficient protection protocol paired with its propagation protocol, which together block weight extraction and advanced fine-tuning attacks while keeping costs low.
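The abstract gives no protocol details; the only mechanism named on this page is Figure 2's note that CoreGuard "permutes layers in the original model" before deployment. As a rough illustration only, the sketch below shows one well-known way weight permutation can lock a network so that it computes correctly only after a secret-dependent authorization step. Everything here is an assumption: the names `lock`, `authorize`, `pi`, and `sigma` are hypothetical, the two-layer MLP stands in for a transformer block, and whether CoreGuard's protection and propagation protocols work this way is not confirmed by the source. It demonstrates functional locking only and says nothing about how hard `pi` would be to recover.

```python
import random


def matvec(W, x):
    """Dense matrix-vector product; W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Element-wise ReLU, so it commutes with any permutation of v."""
    return [max(0, a) for a in v]


def permute_cols(W, perm):
    """Column j of the result is column perm[j] of W."""
    return [[row[p] for p in perm] for row in W]


def permute_rows(W, perm):
    """Row i of the result is row perm[i] of W."""
    return [list(W[p]) for p in perm]


def permute_vec(x, perm):
    """Entry j of the result is x[perm[j]]."""
    return [x[p] for p in perm]


def forward(W1, W2, x):
    """Two-layer MLP: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def lock(W1, W2, pi, sigma):
    """Hypothetical locking step performed before deployment.

    Layer 1 gets its input columns permuted by the secret pi and its output
    rows permuted by sigma; layer 2 gets its input columns permuted by sigma,
    so it consumes layer 1's permuted output directly.
    """
    L1 = permute_rows(permute_cols(W1, pi), sigma)
    L2 = permute_cols(W2, sigma)
    return L1, L2


def authorize(x, pi):
    """Hypothetical authorization: the only step that needs the secret pi,
    e.g. run inside a secure enclave on the device."""
    return permute_vec(x, pi)


if __name__ == "__main__":
    W1 = [[1, 0], [0, 2]]
    W2 = [[1, -1]]
    pi, sigma = [1, 0], [1, 0]
    L1, L2 = lock(W1, W2, pi, sigma)
    x = [3, 1]
    print(forward(W1, W2, x))                 # [1]  original model
    print(forward(L1, L2, authorize(x, pi)))  # [1]  locked model, authorized input
    print(forward(L1, L2, x))                 # [-5] locked model, no authorization
```

Because ReLU acts element-wise, it commutes with the hidden-layer permutation `sigma`, so the lock carries through layer 2 with no further secret-dependent step; only the input permutation `pi` needs authorization. This is one plausible reading of how a propagation protocol could cut communication (one authorization instead of one per layer), not a description of CoreGuard's actual construction. Integer weights are used so the equality check is exact.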

If this is right

  • Proprietary LLMs can be deployed on edge devices without exposing full weights to local attackers.
  • Both direct extraction and subsequent fine-tuning attacks are blocked at the same time.
  • Computational and communication costs stay low enough for practical use on resource-limited hardware.
  • Security reaches the upper bound the authors define without the trade-offs seen in prior defenses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same protocol structure might extend to other generative models that face similar extraction risks on distributed devices.
  • Hardware-level support such as secure enclaves could further strengthen the protection in future implementations.
  • Testing across multiple LLM sizes and architectures would clarify how broadly the negligible-overhead result holds.

Load-bearing premise

The protocols can be realized in practice on real edge hardware without creating new attack surfaces or hidden overheads that would undermine the claimed security level.

What would settle it

A successful model-weight extraction or fine-tuning attack that bypasses CoreGuard on standard edge hardware, or measured overhead exceeding the negligible threshold reported in the experiments.
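Checking the overhead half of this criterion means comparing protected and unprotected inference on the same hardware. A minimal sketch, assuming overhead is reported as relative slowdown over unprotected inference; the function names are hypothetical, and the 5% default threshold echoes the figure quoted in the simulated authors' rebuttal rather than any value the abstract states.

```python
import time
from typing import Callable


def relative_overhead(t_protected: float, t_baseline: float) -> float:
    """Fractional slowdown: (t_protected - t_baseline) / t_baseline."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return (t_protected - t_baseline) / t_baseline


def is_negligible(t_protected: float, t_baseline: float,
                  threshold: float = 0.05) -> bool:
    """True if the relative overhead does not exceed threshold.

    The 0.05 default is an assumed cut-off, not a value from the paper.
    """
    return relative_overhead(t_protected, t_baseline) <= threshold


def median_latency(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of fn() in seconds over `repeats` runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]


if __name__ == "__main__":
    # Hypothetical latencies in seconds for one inference pass.
    base, protected = 1.00, 1.04
    print(round(relative_overhead(protected, base), 4))  # 0.04
    print(is_negligible(protected, base))                 # True
```

Taking the median over several runs damps one-off timing noise on edge hardware; the same ratio applies to communication cost if bytes transferred are substituted for seconds.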

Figures

Figures reproduced from arXiv: 2410.13903 by Hao Peng, Jianwei Yin, Lijun Zhang, Qinfeng Li, Tianyue Luo, Xianwei Zhu, Xinkui Zhao, Xuhong Zhang, Yangfan Xie, Yier Jin, Zhiqiang Shen.

Figure 1. Paradigms of model stealing. (a) Task-specific model …
Figure 2. An overview of CoreGuard. (a) Model locking: before deployment, CoreGuard permutes layers in the original model, …
Figure 3. CoreGuard's Defense Effectiveness Against Model …
Figure 4. Impact of authorization position on security. Model …
Original abstract

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces CoreGuard, a computation- and communication-efficient protection method for proprietary LLMs deployed on edge devices. It employs an efficient protection protocol to reduce computational overhead and a propagation protocol to minimize communication overhead. The authors assert that extensive experiments demonstrate CoreGuard achieves upper-bound security protection against model stealing attacks (including weight extraction and fine-tuning) with negligible overhead.

Significance. If the protocols can be shown to deliver the claimed security guarantees without introducing new attack surfaces or hidden costs, the work would address a practical gap in LLM edge deployment by offering a defense that existing methods lack due to prohibitive overhead. The absence of any quantitative results, baselines, or protocol definitions in the abstract, however, prevents assessment of whether this potential is realized.

major comments (2)
  1. Abstract: the central claim that CoreGuard 'achieves upper-bound security protection with negligible overhead' is asserted without any data, baselines, metrics, attacker model, definition of 'upper-bound,' or protocol details. This prevents evaluation of the claim that the protection and propagation protocols simultaneously prevent extraction/fine-tuning attacks while adding only negligible cost.
  2. Abstract: the assumption that the efficient protection protocol and propagation protocol can be realized in practice without new attack surfaces (e.g., side-channel leakage or key-distribution issues) or hidden overheads is not addressed, which is load-bearing for the upper-bound security assertion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We agree that the abstract requires strengthening to better convey the quantitative results and security analysis from the full manuscript. We will revise accordingly while preserving the paper's core contributions.

Point-by-point responses
  1. Referee: Abstract: the central claim that CoreGuard 'achieves upper-bound security protection with negligible overhead' is asserted without any data, baselines, metrics, attacker model, definition of 'upper-bound,' or protocol details. This prevents evaluation of the claim that the protection and propagation protocols simultaneously prevent extraction/fine-tuning attacks while adding only negligible cost.

    Authors: We agree the abstract would benefit from including representative quantitative results. The full manuscript defines the attacker model, 'upper-bound' security (zero successful weight extraction or effective fine-tuning under the protocol), and reports concrete metrics including attack success rates, computation overhead (under 5% relative to unprotected inference), and communication costs in the experiments and security analysis sections, with comparisons to baselines. In revision we will condense key numbers and definitions into the abstract. revision: yes

  2. Referee: Abstract: the assumption that the efficient protection protocol and propagation protocol can be realized in practice without new attack surfaces (e.g., side-channel leakage or key-distribution issues) or hidden overheads is not addressed, which is load-bearing for the upper-bound security assertion.

    Authors: The manuscript's security analysis section explicitly argues that the protocols operate within the stated threat model without introducing extractable side information or additional communication that could enable new attacks. Key distribution is handled via standard secure channels assumed in the model. We will add a short clarifying paragraph in the revised manuscript to explicitly address side-channel and key-distribution considerations and confirm they fall outside the evaluated threat model. revision: partial

Circularity Check

0 steps flagged

No circularity; claims rest on empirical experiments

Full rationale

The paper introduces CoreGuard via description of protocols and reports experimental outcomes for security and overhead. No equations, parameter fitting, or derivation steps appear in the abstract or described structure. Central claims are not reduced by construction to inputs, self-citations, or prior author results; they are presented as measured results from implementation. This is a standard empirical security paper with no load-bearing self-referential logic.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, datasets, or implementation details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5714 in / 1092 out tokens · 22200 ms · 2026-05-23T18:33:14.230827+00:00 · methodology

