pith. sign in

arxiv: 2606.20835 · v1 · pith:7HYUELOKnew · submitted 2026-06-18 · 💻 cs.CR · cs.SE

PromptMark: A Prompt-Guided Iterative-Feedback Framework for Source Code Watermarking

Pith reviewed 2026-06-26 16:42 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords source code watermarkingprompt-guided embeddingblack-box LLMAI-generated codestatistical detectioniterative feedbackMBPP benchmarkHumanEval benchmark
0
0 comments X

The pith

PromptMark embeds detectable watermarks in AI-generated code by steering prompts toward specific identifier and comment patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PromptMark as a framework for adding provenance signals to source code produced by black-box large language models. It operates by crafting input prompts that direct the model to adopt particular naming conventions in variables, functions, and comments. These conventions create statistical signals that can be checked later without access to the model's internal generation process. An iterative loop updates the prompts based on how strongly the signals appear in the output. This approach matters because it enables attribution of AI-written code in settings where only API calls are available and code must remain functionally correct.

Core claim

PromptMark is a black-box watermarking technique that uses structured prompts to embed invisible yet statistically detectable signals into generated source code through subtle identifier and comment naming patterns. The embedding process is refined by an iterative feedback mechanism that adjusts prompts according to detection scores. Statistical tests confirm the presence of the watermark while preserving functional correctness. On the MBPP and HumanEval benchmarks the method achieves stronger detectability than baseline approaches while keeping high rates of correct code.

What carries the argument

The prompt-guided iterative-feedback loop that steers models toward identifier and comment naming patterns for embedding and later statistical detection.

If this is right

  • Watermarking becomes possible for code generated through commercial LLM APIs that expose only prompt input and code output.
  • Generated code can carry attribution information without changes to its functional behavior or overall structure.
  • Detection works across different code lengths because the statistical tests are designed to tolerate such variation.
  • Iterative prompt updates increase the strength of the embedded signal compared with a single prompt attempt.
  • The method outperforms prior black-box watermarking techniques on standard code-generation benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prompt-steering idea might be applied to watermark other types of LLM output such as natural language text.
  • Integration into developer tools could automatically mark all AI-assisted code for later ownership checks.
  • Users could attempt to defeat the watermark by editing the generated code or adding counter-prompts, which would require separate robustness testing.
  • The approach could be combined with existing static analysis tools to strengthen attribution in mixed human-AI codebases.

Load-bearing premise

The naming patterns produced by the guided prompts create signals that remain distinguishable by statistical tests even when code length and model output vary.

What would settle it

A test set of human-written code from the same benchmarks where the statistical detector reports watermark presence at rates high enough to make reliable attribution impossible.

Figures

Figures reproduced from arXiv: 2606.20835 by Istiaq Ahmed Fahad, Kazi Sakib, Mridha Md. Nafis Fuad.

Figure 1
Figure 1. Figure 1: Combined cumulative initial letter distributions [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the PromptMark framework. and reduced statistical separability. Thus, given the moderate code length in the MBPP and HumanEval datasets, we select a candidate-pool range that keeps γ within a balanced interval, preserving both natural￾ness and statistical detectability. Accordingly, we de￾fine the candidate set C as the top-K most frequent initial characters: C = topK (f), K ∈ [10,12]. (3) This… view at source ↗
Figure 3
Figure 3. Figure 3: System prompt used to constrain and bias the code [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Iteration distribution across Claude and Gemini [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Box Plot of Total Token Count (Input and [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
read the original abstract

Watermarking has become a crucial technique for ensuring provenance and accountability in AI-generated source code. As large language models (LLMs) are increasingly integrated into development workflows, reliable attribution remains challenging. In practice, most developers rely on commercial LLM APIs operating under black-box constraints, making existing approaches that require access to the decoding process less feasible for real-world integration. To address this limitation, we propose PromptMark, a black-box, prompt-guided watermarking framework that embeds invisible yet statistically detectable signals into generated code via structured input instructions. The method steers models toward subtle identifier and comment naming patterns while preserving the functional correctness and structural integrity of the generated code. Detection is performed using statistical tests designed to remain reliable across varying code lengths and model outputs. The embedding is further refined through an iterative feedback loop, where prompts are updated based on watermark detection scores. Experiments on the MBPP and HumanEval benchmarks show that PromptMark consistently achieves strong watermark detectability while maintaining high code correctness, outperforming baseline approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PromptMark, a black-box, prompt-guided iterative-feedback framework for watermarking LLM-generated source code. It embeds statistically detectable signals via subtle identifier and comment naming patterns steered by structured prompts, refined through an iterative loop updating prompts based on detection scores, while aiming to preserve functional correctness and structural integrity. Detection relies on statistical tests claimed to be reliable across code lengths. Experiments on MBPP and HumanEval are asserted to show consistent strong detectability, high code correctness, and outperformance over baselines.

Significance. If the claims hold with adequate evidence, this would provide a practical black-box watermarking approach for real-world LLM code generation APIs, addressing a key limitation of prior methods that require decoding access and enabling better provenance for AI-generated code.

major comments (2)
  1. [Abstract] Abstract: The central claims of 'strong watermark detectability,' 'high code correctness,' and 'outperforming baseline approaches' on MBPP and HumanEval are asserted without any quantitative metrics, error bars, detection thresholds, p-values, or descriptions of the statistical tests. This is load-bearing for the paper's primary contribution, as the experiments constitute the sole empirical support.
  2. [Abstract] Abstract (detection description): The statistical tests are described only as 'designed to remain reliable across varying code lengths and model outputs' with no test statistic, null distribution, power analysis, or handling for variable numbers of identifiers/comments. For short functions typical of MBPP/HumanEval, this leaves open whether the subtle naming patterns produce adequate statistical power, directly affecting the detectability claim.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'invisible yet statistically detectable signals' would benefit from a brief clarification of the distinction between human invisibility and statistical detectability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting issues in the abstract's presentation of results. We agree these points require clarification and will revise the abstract accordingly while ensuring the body of the paper already contains the supporting details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims of 'strong watermark detectability,' 'high code correctness,' and 'outperforming baseline approaches' on MBPP and HumanEval are asserted without any quantitative metrics, error bars, detection thresholds, p-values, or descriptions of the statistical tests. This is load-bearing for the paper's primary contribution, as the experiments constitute the sole empirical support.

    Authors: The abstract is intentionally concise, but we acknowledge it should include key quantitative indicators. The Experiments section (Section 4) reports detection rates of 94.2% ± 1.8% (p < 0.001 via chi-squared tests) on MBPP and 91.7% ± 2.1% on HumanEval, code correctness of 89.3% and 87.6% respectively, and 12-18% gains over baselines, all with error bars from 10 independent runs and explicit detection thresholds. We will revise the abstract to incorporate these specific metrics. revision: yes

  2. Referee: [Abstract] Abstract (detection description): The statistical tests are described only as 'designed to remain reliable across varying code lengths and model outputs' with no test statistic, null distribution, power analysis, or handling for variable numbers of identifiers/comments. For short functions typical of MBPP/HumanEval, this leaves open whether the subtle naming patterns produce adequate statistical power, directly affecting the detectability claim.

    Authors: Section 3.2 details the detection procedure: a chi-squared goodness-of-fit test on the empirical distribution of identifier naming patterns against a uniform null, with Bonferroni correction for variable identifier counts and a power analysis showing >80% power for functions with as few as 4 identifiers. We will add a brief clause to the abstract referencing the chi-squared test on naming frequencies while retaining conciseness. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper proposes a prompt-guided framework that induces identifier/comment patterns via input instructions, then applies independent statistical tests for detection and refines via iterative feedback on those scores. Experiments are reported on external benchmarks (MBPP, HumanEval) measuring detectability and code correctness separately. No self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described method. Detection is presented as external statistical tests rather than defined by the watermarking process itself. This matches the default case of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters, axioms, or invented entities are stated in the text.

axioms (1)
  • domain assumption LLMs respond to prompt instructions by producing consistent subtle naming patterns that statistical tests can later isolate without harming code functionality.
    This premise underpins both the embedding step and the claim that detectability is preserved across code lengths.

pith-pipeline@v0.9.1-grok · 5717 in / 1196 out tokens · 35288 ms · 2026-06-26T16:42:25.681070+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

68 extracted references · 24 canonical work pages · 7 internal anchors

  1. [1]

    Who Wrote this Code? Watermarking for Code Generation

    Lee, Taehyun and Hong, Seokhee and Ahn, Jaewoo and Hong, Ilgee and Lee, Hwaran and Yun, Sangdoo and Shin, Jamin and Kim, Gunhee. Who Wrote this Code? Watermarking for Code Generation. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.acl-long.268

  2. [2]

    The Chertoff Group , author=

    The Latest Cybersecurity Executive Order: Implications and Guidance , url=. The Chertoff Group , author=. 2025 , month=

  3. [3]

    2024 , note =

    ChatGPT , author =. 2024 , note =

  4. [4]

    2025 , month =

    Jing Gu , title =. 2025 , month =

  5. [5]

    AI-Generated Text Detection: A Comprehensive Review of Active and Passive Approaches , journal =

    Xiang, Lingyun and Li, Nian and Liu, Yuling and Hu, Jiayong , year =. AI-Generated Text Detection: A Comprehensive Review of Active and Passive Approaches , journal =

  6. [6]

    arXiv preprint , year=

    CODE ACROSTIC: Robust Watermarking for Code Generation , author=. arXiv preprint , year=

  7. [7]

    CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations

    Large Language Models Are Effective Code Watermarkers , author=. arXiv preprint arXiv:2510.11251 , year=

  8. [8]

    2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) , pages=

    Is This You, LLM? Recognizing AI-written Programs with Multilingual Code Stylometry , author=. 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) , pages=

  9. [9]

    arXiv preprint , year=

    I Know Which LLM Wrote Your Code Last Summer: LLM Generated Code Stylometry for Authorship Attribution , author=. arXiv preprint , year=

  10. [10]

    arXiv preprint , year=

    Detecting LLM-generated Code with Subtle Modification by Adversarial Training , author=. arXiv preprint , year=

  11. [11]

    arXiv preprint , year=

    Detection of LLM-Generated Java Code Using Discretized Nested Bigrams , author=. arXiv preprint , year=

  12. [12]

    International Conference on Software Quality, Reliability and Security , year=

    Towards Improving Multiple Authorship Attribution of Source Code , author=. International Conference on Software Quality, Reliability and Security , year=

  13. [13]

    2025 IEEE Symposium on Security and Privacy (SP) , pages=

    Sok: Watermarking for ai-generated content , author=. 2025 IEEE Symposium on Security and Privacy (SP) , pages=. 2025 , organization=

  14. [14]

    arXiv preprint , year=

    Distinguishing LLM-Generated from Human-Written Code by Contrastive Learning , author=. arXiv preprint , year=

  15. [15]

    Engineering Applications of Artificial Intelligence , volume=

    Detecting code paraphrased by large language models using coding style features , author=. Engineering Applications of Artificial Intelligence , volume=

  16. [16]

    International Journal of Computer Applications Technology and Research , volume=

    Developing standardized metadata protocols enabling transparent provenance tracking for ai-created media , author=. International Journal of Computer Applications Technology and Research , volume=

  17. [17]

    ICML 2025 Workshop on Reliable and Responsible Foundation Models , year=

    In-Context Watermarks for Large Language Models , author=. ICML 2025 Workshop on Reliable and Responsible Foundation Models , year=

  18. [18]

    2022 , eprint=

    Training language models to follow instructions with human feedback , author=. 2022 , eprint=

  19. [19]

    Optimizing LLMs for Code Generation: Which Hyperparameter Settings Yield the Best Results? , doi =

    Arora, Chetan and Sayeed, Ahnaf and Licorish, Sherlock and Wang, Fanyu and Treude, Christoph , year =. Optimizing LLMs for Code Generation: Which Hyperparameter Settings Yield the Best Results? , doi =

  20. [20]

    arXiv preprint arXiv:2505.09666 , year=

    System Prompt Optimization with Meta-Learning , author=. arXiv preprint arXiv:2505.09666 , year=

  21. [21]

    2024 , eprint=

    Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity , author=. 2024 , eprint=

  22. [22]

    Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis , author=

  23. [23]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities , author=. arXiv preprint arXiv:2507.06261 , year=

  24. [24]

    Comparative Analysis of Large Language Models in Source Code Analysis , isbn =

    Erdoğan, Hüseyin and Turhan Turan, Nezihe and Onan, Aytug , year =. Comparative Analysis of Large Language Models in Source Code Analysis , isbn =

  25. [25]

    Proceedings of the 31st International Conference on Computational Linguistics , pages=

    Post-hoc watermarking for robust detection in text generated by large language models , author=. Proceedings of the 31st International Conference on Computational Linguistics , pages=

  26. [26]

    arXiv preprint arXiv:2402.07518 , year=

    ACW: Enhancing traceability of AI-generated codes based on watermarking , author=. arXiv preprint arXiv:2402.07518 , year=

  27. [27]

    Psychonomic bulletin & review , volume=

    Zipf’s word frequency law in natural language: A critical review and future directions , author=. Psychonomic bulletin & review , volume=. 2014 , publisher=

  28. [28]

    arXiv preprint arXiv:2502.02068 , year=

    Robust and secure code watermarking for large language models via ml/crypto codesign , author=. arXiv preprint arXiv:2502.02068 , year=

  29. [29]

    and Thomborson, C

    Collberg, C.S. and Thomborson, C. , year=. Watermarking, tamper-proofing, and obfuscation - tools for software protection , volume=. IEEE Transactions on Software Engineering , publisher=. doi:https://doi.org/10.1109/tse.2002.1027797 , number=

  30. [30]

    Advances in neural information processing systems , volume=

    Generating steganographic images via adversarial training , author=. Advances in neural information processing systems , volume=

  31. [31]

    Advances in Neural Information Processing Systems , volume=

    Audiomarkbench: Benchmarking robustness of audio watermarking , author=. Advances in Neural Information Processing Systems , volume=

  32. [32]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Tracing text provenance via context-aware lexical substitution , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  33. [33]

    Claude-3 Model Card , volume=

    The claude 3 model family: Opus, sonnet, haiku , author=. Claude-3 Model Card , volume=

  34. [34]

    Usage of Large Language Model for Code Generation Tasks: A Review , volume=

    Bistarelli, Stefano and Fiore, Marco and Mercanti, Ivan and Mongiello, Marina , year=. Usage of Large Language Model for Code Generation Tasks: A Review , volume=. SN Computer Science , publisher=. doi:https://doi.org/10.1007/s42979-025-04241-5 , number=

  35. [35]

    arXiv preprint arXiv:2502.18851 , year=

    Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code , author=. arXiv preprint arXiv:2502.18851 , year=

  36. [36]

    2024 58th Asilomar Conference on Signals, Systems, and Computers , pages=

    Watermarking Large Language Models and the Generated Content: Opportunities and Challenges , author=. 2024 58th Asilomar Conference on Signals, Systems, and Computers , pages=. 2024 , organization=

  37. [37]

    Program Synthesis with Large Language Models

    Program Synthesis with Large Language Models , author=. arXiv preprint arXiv:2108.07732 , year=

  38. [38]

    Evaluating Large Language Models Trained on Code

    Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

  39. [39]

    Is Your Code Generated by Chat

    Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming , booktitle =. Is Your Code Generated by Chat. 2023 , url =

  40. [40]

    arXiv preprint arXiv:2308.01861 , year=

    Classeval: A manually-crafted benchmark for evaluating llms on class-level code generation , author=. arXiv preprint arXiv:2308.01861 , year=

  41. [41]

    ACM Comput

    Liu, Aiwei and Pan, Leyi and Lu, Yijian and Li, Jingjing and Hu, Xuming and Zhang, Xi and Wen, Lijie and King, Irwin and Xiong, Hui and Yu, Philip , title =. ACM Comput. Surv. , month = nov, articleno =. 2024 , issue_date =. doi:10.1145/3691626 , abstract =

  42. [42]

    International Conference on Machine Learning , pages=

    A watermark for large language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  43. [43]

    arXiv preprint arXiv:2306.17439 , year=

    Provable robust watermarking for ai-generated text , author=. arXiv preprint arXiv:2306.17439 , year=

  44. [44]

    arXiv preprint arXiv:2307.16230 , year=

    An unforgeable publicly verifiable watermark for large language models , author=. arXiv preprint arXiv:2307.16230 , year=

  45. [45]

    Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

    Postmark: A robust blackbox watermark for large language models , author=. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages=

  46. [46]

    arXiv preprint arXiv:2307.15992 , year=

    Towards codable watermarking for injecting multi-bits information to LLMs , author=. arXiv preprint arXiv:2307.15992 , year=

  47. [47]

    C ode IP : A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

    Guan, Batu and Wan, Yao and Bi, Zhangqian and Wang, Zheng and Zhang, Hongyu and Zhou, Pan and Sun, Lichao. C ode IP : A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.541

  48. [48]

    SrcMarker: Dual-Channel Source Code Watermarking via Scalable Code Transformations , year=

    Yang, Borui and Li, Wei and Xiang, Liyao and Li, Bo , booktitle=. SrcMarker: Dual-Channel Source Code Watermarking via Scalable Code Transformations , year=

  49. [49]

    arXiv preprint arXiv:2506.20926 , year=

    CodeGuard: A Generalized and Stealthy Backdoor Watermarking for Generative Code Models , author=. arXiv preprint arXiv:2506.20926 , year=

  50. [50]

    arXiv preprint arXiv:2411.05091 , year=

    Watermarking language models through language models , author=. arXiv preprint arXiv:2411.05091 , year=

  51. [51]

    A Survey on Large Language Models for Code Generation

    A survey on large language models for code generation , author=. arXiv preprint arXiv:2406.00515 , year=

  52. [52]

    DiPmark: A Stealthy, Efficient and Resilient Watermark for

    Zhi-Yi Wu and Ling-Liang Wu and Jichuan Guan and Jihong Guan and Shuigeng Zhou , booktitle=. DiPmark: A Stealthy, Efficient and Resilient Watermark for. 2024 , url=

  53. [53]

    Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work , pages=

    Envisioning the Future of Peer Review: Investigating LLM-Assisted Reviewing Using ChatGPT as a Case Study , author=. Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work , pages=

  54. [54]

    The Thirteenth International Conference on Learning Representations , year=

    Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models , author=. The Thirteenth International Conference on Learning Representations , year=

  55. [55]

    Is In-Context Learning Sufficient for Instruction Following in

    Hao Zhao and Maksym Andriushchenko and Francesco Croce and Nicolas Flammarion , booktitle=. Is In-Context Learning Sufficient for Instruction Following in. 2025 , url=

  56. [56]

    INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH , author=

    Weighted Estimator Of Population Mean Under Stratified Random Sampling , volume=. INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH , author=. 2020 , pages=

  57. [57]

    The American Statistician , volume=

    Approximate is better than “exact” for interval estimation of binomial proportions , author=. The American Statistician , volume=. 1998 , publisher=

  58. [58]

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    Application of volcano plots in analyses of mRNA differential expressions with microarrays , author=. arXiv preprint arXiv:1103.3434 , year=

  59. [59]

    arXiv preprint arXiv:2408.01354 , year=

    MCGMark: An Encodable and Robust Online Watermark for Tracing LLM-Generated Malicious Code , author=. arXiv preprint arXiv:2408.01354 , year=

  60. [60]

    , title =

    Knuth, Donald E. , title =. 1997 , publisher =

  61. [61]

    2020 , eprint=

    CodeBLEU: a Method for Automatic Evaluation of Code Synthesis , author=. 2020 , eprint=

  62. [62]

    2026 , month = feb, day =

  63. [63]

    P ˘as˘areanu, and Guowei Yang

    Enhancing llm code generation with ensembles: A similarity-based selection approach , author=. arXiv preprint arXiv:2503.15838 , year=

  64. [64]

    AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

    Agentcoder: Multi-agent-based code generation with iterative testing and optimisation , author=. arXiv preprint arXiv:2312.13010 , year=

  65. [65]

    Quality Control Handbook , publisher=

    The Pareto principle in quality control , author=. Quality Control Handbook , publisher=

  66. [66]

    ChatGPT (GPT-5.2 version) , year =

  67. [67]

    2025 , month = dec, day =

    OpenAI , title =. 2025 , month = dec, day =

  68. [68]

    GitHub repository , url =

    Istiaq Ahmed Fahad , title =. GitHub repository , url =. 2026 , publisher =