PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
Pith reviewed 2026-05-20 23:28 UTC · model grok-4.3
The pith
PragLocker builds prompts that preserve function only on one target LLM by anchoring semantics with code symbols and injecting target-derived noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PragLocker constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and then using target-model feedback to inject noise, yielding prompts that only work on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains target performance, and remains robust against adaptive attackers.
What carries the argument
Function-preserving obfuscated prompts created by anchoring semantics with code symbols and injecting noise from target-model feedback.
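The accept/reject loop implied by "injecting noise from target-model feedback" can be pictured with a toy sketch. This is not the paper's algorithm, which the abstract does not specify. The names `inject_noise`, `make_toy_model`, and `ANCHORS`, the noise alphabet, and the scoring rule are all hypothetical, and the two toy models are deliberately built with different tolerances, so the sketch illustrates the mechanism only and is not evidence of non-portability.

```python
import random
from typing import Callable, List, Set

# Hypothetical code-symbol anchors that carry the prompt's semantics.
ANCHORS = ["def", "route(", "query", ")", "->", "tool"]


def make_toy_model(breaking_tokens: Set[str]) -> Callable[[List[str]], float]:
    """Stand-in for an LLM (assumption): scores 1.0 when every anchor is
    present in order and no token this model cannot tolerate appears."""
    def score(prompt: List[str]) -> float:
        if any(tok in breaking_tokens for tok in prompt):
            return 0.0
        kept = [tok for tok in prompt if tok in ANCHORS]
        return 1.0 if kept == ANCHORS else 0.0
    return score


def inject_noise(
    anchored_prompt: List[str],
    target_score: Callable[[List[str]], float],
    noise_tokens: List[str],
    min_score: float,
    steps: int = 50,
    seed: int = 0,
) -> List[str]:
    """Greedily insert noise tokens, keeping an insertion only if the
    target model's score stays at or above min_score."""
    rng = random.Random(seed)
    prompt = list(anchored_prompt)
    for _ in range(steps):
        candidate = list(prompt)
        position = rng.randrange(len(candidate) + 1)
        candidate.insert(position, rng.choice(noise_tokens))
        if target_score(candidate) >= min_score:  # feedback from the target only
            prompt = candidate
    return prompt


target = make_toy_model({"##"})                # tolerates @@, ~~, %%
other = make_toy_model({"@@", "~~", "%%"})     # tolerates only ##
locked = inject_noise(ANCHORS, target, ["@@", "~~", "%%", "##"], min_score=1.0)
print(target(locked), other(locked))
```

In this toy setup the target keeps its score by construction, while the second model fails as soon as one noise token it cannot tolerate has been accepted. Whether real LLMs differ enough for this to hold is exactly the load-bearing premise discussed next.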
Load-bearing premise
Noise patterns taken from target-model feedback cannot be closely approximated or reversed by an adversary who queries a different but similar LLM or holds partial knowledge of the anchoring symbols.
What would settle it
An experiment in which an adversary, querying one or more alternate LLMs, recovers a prompt that runs on a non-target LLM and matches or exceeds the original prompt's performance on the target would falsify the non-portability claim.
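As a minimal sketch of how that falsification test could be scored, assuming task performance is a single number per model: the function name `non_portability_falsified`, the `tolerance` parameter, and the scores in the usage lines are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Tuple


def non_portability_falsified(
    original_target_score: float,
    recovered_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> Tuple[bool, str, float]:
    """Compare the adversary's best recovered prompt against the original.

    original_target_score: score of the protected prompt on the target LLM.
    recovered_scores: score of the recovered prompt on each non-target LLM.
    tolerance: how far below the original still counts as a match (assumption).
    """
    best_model, best_score = max(recovered_scores.items(), key=lambda kv: kv[1])
    falsified = best_score >= original_target_score - tolerance
    return falsified, best_model, best_score


# Made-up numbers, for illustration only.
print(non_portability_falsified(0.82, {"model_a": 0.41, "model_b": 0.55}))
print(non_portability_falsified(0.82, {"model_a": 0.85}))
```

Taking the maximum over non-target models reflects that a single successful transfer is enough to break the claim.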
Original abstract
LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy and reuse these prompts with other proprietary LLMs, causing economic losses. To protect these prompts, we identify four key challenges: proactivity, runtime protection, usability, and non-portability that existing approaches fail to address. We present PragLocker, a prompt protection scheme that satisfies these requirements. PragLocker constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and then using target-model feedback to inject noise, yielding prompts that only work on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains target performance, and remains robust against adaptive attackers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PragLocker, a prompt protection scheme for LLM agents that constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and using target-model feedback to inject noise. This yields prompts that only work on the target LLM, addressing challenges of proactivity, runtime protection, usability, and non-portability. Experiments across multiple agent systems, datasets, and foundation LLMs are reported to show substantial reduction in cross-LLM portability, maintained target performance, and robustness against adaptive attackers.
Significance. If the non-portability property holds under realistic adversarial conditions, PragLocker could provide a valuable tool for protecting intellectual property in LLM-based agents deployed in untrusted environments. The approach of combining semantic anchoring with model-specific noise injection appears novel and could advance the field of AI security. However, the strength of the contribution depends on the rigor of the experimental validation and analysis of the noise's specificity.
major comments (3)
- §3.2 (Method Construction): The description of injecting noise derived from target-model feedback lacks a formal characterization or equation defining the noise distribution. Without this, it is difficult to evaluate whether the non-portability is due to target-specific idiosyncrasies or generic variance in instruction-following.
- §5 (Experiments): The results claim substantial reduction in cross-LLM portability and robustness to adaptive attackers, but lack quantitative metrics with error bars, dataset sizes, or specific attack details. This makes verification of the central claim challenging beyond the high-level description.
- §4.3 (Analysis of Anchoring): There is no analysis of how code-symbol anchoring interacts with tokenizer or embedding differences across LLMs. If the anchoring relies on shared code symbols, adversaries with partial knowledge could potentially recover functionality.
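The reporting requested for §5 (means with error bars over repeated trials) could be produced along the following lines. This is a sketch using only the Python standard library. The helper name `summarize_trials` and the scores in the usage lines are hypothetical, and the sample standard deviation is one reasonable choice of error bar among several.

```python
import statistics
from typing import Dict, List


def summarize_trials(trials: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Per-model trial count, mean, and sample standard deviation."""
    summary = {}
    for model, scores in trials.items():
        summary[model] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            # stdev needs at least two points; report 0.0 for a single trial.
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return summary


# Made-up scores, for illustration only.
report = summarize_trials({
    "target": [0.8, 0.9, 1.0],
    "non_target": [0.2, 0.3, 0.1],
})
for model, stats in report.items():
    print(f"{model}: {stats['mean']:.3f} +/- {stats['std']:.3f} (n={stats['n']})")
```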
minor comments (2)
- Abstract: The abstract mentions 'substantially reduces' without defining what 'substantial' means in terms of specific percentages or metrics.
- Introduction: Some references to existing approaches could be expanded to better highlight the gaps filled by PragLocker.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments identify valuable opportunities to improve the formal presentation, experimental reporting, and security analysis. We address each major comment below and indicate planned revisions.
- Referee: §3.2 (Method Construction): The description of injecting noise derived from target-model feedback lacks a formal characterization or equation defining the noise distribution. Without this, it is difficult to evaluate whether the non-portability is due to target-specific idiosyncrasies or generic variance in instruction-following.
  Authors: We agree that a formal characterization is needed for rigor. In the revised manuscript we will add an explicit definition in §3.2 that models the injected noise as a perturbation drawn from the discrepancy between the target LLM's output distribution on the anchored prompt and a reference distribution obtained from feedback queries. This formulation will make clear that non-portability arises from model-specific response idiosyncrasies rather than generic instruction-following variance.
  Revision: yes
- Referee: §5 (Experiments): The results claim substantial reduction in cross-LLM portability and robustness to adaptive attackers, but lack quantitative metrics with error bars, dataset sizes, or specific attack details. This makes verification of the central claim challenging beyond the high-level description.
  Authors: The referee correctly notes that additional quantitative detail is required. We will revise §5 to report means and standard deviations (error bars) over repeated trials, state the exact sizes of all evaluation datasets, and describe the adaptive attack procedures including query budgets, attacker model variants, and the precise success criteria used to measure portability reduction and robustness.
  Revision: yes
- Referee: §4.3 (Analysis of Anchoring): There is no analysis of how code-symbol anchoring interacts with tokenizer or embedding differences across LLMs. If the anchoring relies on shared code symbols, adversaries with partial knowledge could potentially recover functionality.
  Authors: We acknowledge the value of examining tokenizer and embedding interactions. The anchoring step deliberately selects code symbols that appear in the vocabularies of the evaluated LLMs; the subsequent target-specific noise is what prevents straightforward recovery. In the revision we will expand §4.3 with a discussion of observed tokenizer overlap across the tested models and explain why partial-knowledge recovery remains ineffective once noise is applied. If space permits we will also include a short empirical note drawn from our existing cross-model experiments.
  Revision: partial
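The tokenizer-overlap discussion promised for §4.3 could start from a measurement like the one sketched here. Vocabularies are modeled as plain Python sets, which is an assumption: loading real tokenizer vocabularies is left out, and the function names `vocab_jaccard` and `shared_anchor_symbols` as well as the tiny example vocabularies are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def vocab_jaccard(vocabs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap |A & B| / |A | B| for every pair of model vocabularies."""
    overlaps = {}
    for a, b in combinations(sorted(vocabs), 2):
        union = vocabs[a] | vocabs[b]
        overlaps[(a, b)] = len(vocabs[a] & vocabs[b]) / len(union) if union else 0.0
    return overlaps


def shared_anchor_symbols(anchors: Set[str], vocabs: Dict[str, Set[str]]) -> Set[str]:
    """Anchor symbols that appear in every model's vocabulary."""
    return {sym for sym in anchors if all(sym in vocab for vocab in vocabs.values())}


# Tiny made-up vocabularies, for illustration only.
vocabs = {
    "model_a": {"def", "->", "x"},
    "model_b": {"def", "->", "y"},
}
print(vocab_jaccard(vocabs))
print(shared_anchor_symbols({"def", "->", "lambda"}, vocabs))
```

Anchor symbols shared by all vocabularies are the ones an adversary with partial knowledge could also rely on, which is why the referee's question about recovery hinges on the noise rather than on the anchors.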
Circularity Check
No significant circularity: forward construction process with experimental validation
The paper presents PragLocker as a procedural construction: anchor semantics via code symbols, then inject noise derived from target-model feedback to produce non-portable prompts. This is a forward engineering method, not a quantity defined in terms of its own outputs, a fitted parameter renamed as a prediction, or a result resting on load-bearing self-citations. No equations or derivations reduce the non-portability claim to a tautology or post-hoc fit; the abstract explicitly frames the result as an outcome of the described process, with cross-LLM experiments serving as external checks. The derivation chain is self-contained, is checked against external benchmarks, and exhibits none of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Target LLMs produce distinguishable feedback signals that can be used to craft non-portable noise without harming task performance on the target.