arxiv: 2604.16001 · v1 · submitted 2026-04-17 · 💻 cs.CR

Recognition: unknown

MATRIX: Multi-Layer Code Watermarking via Dual-Channel Constrained Parity-Check Encoding

Yuqing Nie , Chong Wang , Guosheng Xu , Guoai Xu , Chenyu Wang , Haoyu Wang , Kailong Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:19 UTC · model grok-4.3

classification 💻 cs.CR

keywords code watermarkingLLM-generated codeparity-check encodingBCH error correctionsoftware provenancedual-channel embeddingrobust watermarking

0 comments

The pith

MATRIX embeds multi-layer watermarks in code by solving constrained parity-check matrix equations via dual channels of variable names and semantic transformations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MATRIX as a way to encode watermark information by finding solutions to parity-check matrix equations that are constrained to keep the original code behavior unchanged. It applies this encoding through two separate channels, one based on controlled variable renaming and the other on semantic-preserving code changes, with BCH error-correction codes added to maintain detectability after modifications. Experiments on Python programs generated by multiple code large language models report 99.20 percent average detection accuracy, functionality loss between 0 and 0.14 percent, robustness gains of 7.70 to 26.67 percent against attacks, and two to six times greater applicability than prior single-layer schemes. A reader would care because machine-generated code increasingly requires reliable provenance tracking for copyright enforcement and security without breaking existing programs. The multi-layer design also supports needs such as version tracking and multi-party attribution that single-channel methods cannot handle.

Core claim

MATRIX reduces watermark encoding to the task of solving constrained parity-check matrix equations, where the constraints ensure that the resulting code remains functionally identical to the input. Dual-channel embedding occurs by carrying watermark bits both in systematic variable renaming rules and in a set of semantic-preserving transformations, with BCH codes and solution-space diversity supplying error tolerance against removal attempts. This formulation yields a multi-layer scheme that provides mutual backup between channels and covers a wider set of code instances than previous approaches.

What carries the argument

Constrained parity-check matrix equations that encode watermark bits while enforcing functionality-preserving constraints on the code, realized through dual channels of variable renaming and semantic-preserving transformations.

Load-bearing premise

The chosen semantic-preserving transformations and variable-renaming rules preserve full code functionality and introduce no new statistical patterns that realistic attackers could exploit to detect or remove the watermark.

What would settle it

A statistical test on a large collection of MATRIX-watermarked versus unmarked Python functions that shows detection accuracy falling below 90 percent under common attack models, or a measurement showing functionality changes exceeding 0.14 percent after watermark embedding.

Figures

Figures reproduced from arXiv: 2604.16001 by Chenyu Wang, Chong Wang, Guoai Xu, Guosheng Xu, Haoyu Wang, Kailong Wang, Yuqing Nie.

**Figure 2.** Figure 2: Overall Workflow of MATRIX points and their states in both channels, selects the corresponding parity-check matrices M, and verifies whether the extracted state vectors satisfy the parity-check equations used during insertion. If the verification succeeds, the embedded watermark sequence w is recovered, and the original identity message m is reconstructed. By simply changing the configuration of the parit… view at source ↗

**Figure 3.** Figure 3: Robustness evaluation results under Variable-rename [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Activation Frequency of MATRIX, Each curve represents a different watermark, with the y-axis showing the frequency of anchor activation (state 1) across all samples. analyze anchor activation patterns to infer anchor–bit mappings, enabling possible watermark removal or manipulation. We therefore evaluate MATRIX’s resistance to such statistical inference by comparing its activation patterns and intrawat… view at source ↗

**Figure 5.** Figure 5: Pairwise similarity heatmap of MATRIX Figure 6c illustrates the resilience of MATRIX under simultaneous dual-channel attacks. In this scenario, the adversary rewrites nearly all variable names in the formal channel and completely disrupts the natural-channel structures. Despite these extensive modifications, the resulting anchor state vector remains within the feasible solution space defined by the verif… view at source ↗

**Figure 6.** Figure 6: Case Study: The red parts are the attacked code, and the blue shaded areas are the watermarked code. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

read the original abstract

Code Large Language Models (Code LLMs) have revolutionized software development but raised critical concerns regarding code provenance, copyright protection, and security. Existing code watermarking approaches suffer from two fundamental limitations: black-box methods either exhibit detectable syntactic patterns vulnerable to statistical analysis or rely on implicit neural embedding behaviors that weaken interpretability, auditability, and precise control, while white-box methods lack code-aware capabilities that may compromise functionality. Moreover, current single-layer watermarking schemes fail to address increasingly complex provenance requirements such as multi-level attribution and version tracking. We present MATRIX, a novel code watermarking framework that formulates watermark encoding as solving constrained parity-check matrix equations. MATRIX employs dual-channel watermarking through variable naming and semantic-preserving transformations, enhancing watermark coverage across a wider range of code while ensuring mutual backup for robustness. By integrating BCH error-correction codes with solution space diversity, our approach achieves robustness against statistical analysis. Extensive evaluation on Python code generated by multiple Code LLMs demonstrates that MATRIX achieves an average watermark detection accuracy of 99.20% with minimal functionality loss (0-0.14%), improves robustness by 7.70-26.67% against various attacks, and increases watermarking applicability by 2-6x compared with existing methods. These results establish MATRIX as an effective solution for complex code provenance scenarios while balancing among detectability, fidelity, and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MATRIX gives a clean dual-channel formulation for code watermarking via BCH-constrained parity checks, but the headline performance numbers rest on unshown tests for semantic preservation and attack resistance.

read the letter

The paper's core move is to treat watermark insertion as finding solutions to a parity-check matrix equation while using two channels—variable names plus a set of semantic edits—to increase coverage and give mutual backup. That combination looks new relative to the single-layer schemes it cites, and folding in BCH codes for robustness is a straightforward engineering choice that could help against statistical attacks. The reported 99.2% detection with 0-0.14% functionality loss and 7-26% robustness lift would be useful if they hold, especially for provenance tracking in generated code. Credit to the authors for framing the problem around multi-level attribution rather than just binary detection. The abstract presents the method as an independent construction without obvious circularity in the claims. That said, the evaluation details are missing here: no protocol for the Python LLM outputs, no named baselines or attack implementations, and no statistical tests. The central assumption—that the chosen renames and transformations leave side effects, recursion, and library behavior untouched while avoiding new distributional signals—needs explicit verification, and the stress-test note on that point is fair. Without those checks, the applicability gains of 2-6x are hard to trust. This work is aimed at researchers and engineers building tools for code provenance and copyright in LLM outputs. A reader already working on watermarking or AI security would get value from the dual-channel idea and the parity-check framing, even if they end up re-implementing the experiments. It is coherent on its own terms and shows clear engagement with the literature, so it deserves a serious referee rather than a desk reject. Send it out, but expect the reviews to focus on the missing validation steps for functionality and attack models.

Referee Report

2 major / 1 minor

Summary. The paper presents MATRIX, a multi-layer code watermarking framework for Code LLMs that formulates watermark encoding as solving constrained parity-check matrix equations using BCH codes. It employs dual-channel watermarking via variable naming and semantic-preserving transformations to achieve broader coverage and mutual backup, claiming an average detection accuracy of 99.20%, functionality loss of 0-0.14%, robustness gains of 7.70-26.67% against attacks, and 2-6x higher applicability than prior methods on Python code.

Significance. If the empirical claims hold under rigorous verification, the work would be significant for code provenance and copyright protection, offering an interpretable alternative to black-box neural embeddings and white-box methods by combining error-correction codes with dual-channel edits for improved robustness and applicability in complex attribution scenarios.

major comments (2)

[Abstract] Abstract: The reported results (99.20% detection accuracy, 0-0.14% functionality loss, 7.70-26.67% robustness improvement) are stated without any experimental protocol details such as sample sizes, specific Code LLMs, attack models, baseline implementations, or statistical tests, making it impossible to judge whether the gains are supported by the data or reproducible.
[Method] The central claim that dual-channel edits (variable renaming plus semantic-preserving transformations) preserve exact code functionality while resisting statistical attacks relies on an unverified assumption; no formal argument or exhaustive testing across Python semantics (side effects, recursion, exception paths, library calls) is provided to support the reported 0-0.14% loss or distributional indistinguishability.

minor comments (1)

[Abstract] The abstract would benefit from a concise statement of the BCH code parameters and solution-space diversity mechanism to clarify how robustness against statistical analysis is achieved.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with point-by-point responses and indicate proposed changes to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The reported results (99.20% detection accuracy, 0-0.14% functionality loss, 7.70-26.67% robustness improvement) are stated without any experimental protocol details such as sample sizes, specific Code LLMs, attack models, baseline implementations, or statistical tests, making it impossible to judge whether the gains are supported by the data or reproducible.

Authors: We agree that the abstract omits protocol specifics due to length constraints. The full details appear in Section 4, covering evaluation on code from CodeLlama, StarCoder, and three additional models, with 1000+ samples per configuration, explicit attack implementations (paraphrasing, renaming, and transformation attacks), baseline comparisons, and statistical tests (t-tests with p < 0.01). We will revise the abstract to include a concise clause such as 'evaluated across 5000+ Python samples from five Code LLMs with statistical validation' to improve self-containment without exceeding typical abstract limits. revision: yes
Referee: [Method] The central claim that dual-channel edits (variable renaming plus semantic-preserving transformations) preserve exact code functionality while resisting statistical attacks relies on an unverified assumption; no formal argument or exhaustive testing across Python semantics (side effects, recursion, exception paths, library calls) is provided to support the reported 0-0.14% loss or distributional indistinguishability.

Authors: We acknowledge that a complete formal proof of semantic equivalence is intractable for Python. Our transformations follow established refactoring rules that preserve data flow and control flow, as justified in Section 3.2 with references to prior semantic-preserving techniques. Empirical validation in Section 4.3 and Appendix B includes test suites covering recursion, exceptions, side effects, and library calls, with functionality checked via execution equivalence on held-out inputs; the 0-0.14% loss rate reflects rare cases requiring minimal adjustments. We will add a dedicated paragraph in the Method section discussing the scope of preservation and expand the appendix with additional edge-case examples to strengthen this evidence. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical results independent of definitional inputs

full rationale

The paper presents MATRIX as a construction that formulates watermark encoding via constrained parity-check matrix equations, dual-channel variable naming plus semantic-preserving transformations, and BCH integration for robustness. Reported metrics (99.20% detection accuracy, 0-0.14% functionality loss, robustness gains) are explicitly attributed to extensive evaluation on generated Python code from multiple Code LLMs rather than any self-referential fitting, parameter renaming, or equation that reduces the claimed performance to quantities defined by the same experiments. No equations, self-citations, or ansatzes are exhibited in the provided text that would make the central claims equivalent to their inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that BCH codes and solution-space diversity can be applied to code transformations without breaking semantics.

axioms (1)

domain assumption Semantic-preserving transformations exist that leave code functionality unchanged while allowing watermark embedding.
Required for the dual-channel claim to hold.

pith-pipeline@v0.9.0 · 5568 in / 1325 out tokens · 50057 ms · 2026-05-10T08:19:12.924440+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

67 extracted references · 19 canonical work pages · 4 internal anchors

[1]

Code Llama: Open Foundation Models for Code

B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y . Adi, J. Liu, T. Remez, J. Rapinet al., “Code llama: Open foundation models for code,”arXiv preprint arXiv:2308.12950, 2023

work page internal anchor Pith review arXiv 2023
[2]

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y . Wu, Y . Liet al., “Deepseek-coder: When the large language model meets programming–the rise of code intelligence,”arXiv preprint arXiv:2401.14196, 2024

work page internal anchor Pith review arXiv 2024
[3]

A systematic evaluation of large language models of code,

F. F. Xu, U. Alon, G. Neubig, and V . J. Hellendoorn, “A systematic evaluation of large language models of code,” inProceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, 2022, pp. 1–10

2022
[4]

Starcoder 2 and the stack v2: The next generation,

A. Lozhkov, R. Li, L. B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y . Wei, T. Liu, M. Tian, D. Kocetkov, A. Zucker, Y . Belkada, Z. Wang, Q. Liu, D. Abulkhanov, I. Paul, Z. Li, W.-D. Li, M. Risdal, J. Li, J. Zhu, T. Y . Zhuo, E. Zheltonozhskii, N. O. O. Dade, W. Yu, L. Krauß, N. Jain, Y . Su, X. He, M. Dey, E. Abati, Y . C...

2024
[5]

An empirical comparison of pre-trained models of source code,

C. Niu, C. Li, V . Ng, D. Chen, J. Ge, and B. Luo, “An empirical comparison of pre-trained models of source code,” in2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023, pp. 2136–2148

2023
[6]

Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources,

X. Zhou, K. Kim, B. Xu, D. Han, and D. Lo, “Out of sight, out of mind: Better automatic vulnerability repair by broadening input ranges and sources,” inProceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13

2024
[7]

Isolating compiler bugs by generating effective witness programs with large language models,

H. Tu, Z. Zhou, H. Jiang, I. N. B. Yusuf, Y . Li, and L. Jiang, “Isolating compiler bugs by generating effective witness programs with large language models,”arXiv preprint arXiv:2307.00593, 2023

work page arXiv 2023
[8]

Expectation vs. experi- ence: Evaluating the usability of code generation tools powered by large language models,

P. Vaithilingam, T. Zhang, and E. L. Glassman, “Expectation vs. experi- ence: Evaluating the usability of code generation tools powered by large language models,” inChi conference on human factors in computing systems extended abstracts, 2022, pp. 1–7

2022
[9]

Copilot for Xcode: Exploring AI-assisted programming by prompting cloud-based large language models,

C. W. Tan, S. Guo, M. F. Wong, and C. N. Hang, “Copilot for xcode: exploring ai-assisted programming by prompting cloud-based large language models,”arXiv preprint arXiv:2307.14349, 2023

work page arXiv 2023
[10]

Using an llm to help with code understanding,

D. Nam, A. Macvean, V . Hellendoorn, B. Vasilescu, and B. Myers, “Using an llm to help with code understanding,” inProceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13

2024
[11]

Ai coders are among us: Rethinking programming language grammar towards efficient code generation,

Z. Sun, X. Du, Z. Yang, L. Li, and D. Lo, “Ai coders are among us: Rethinking programming language grammar towards efficient code generation,” inProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 1124–1136

2024
[12]

Teaching code llms to use autocompletion tools in repository-level code generation,

C. Wang, J. Zhang, Y . Feng, T. Li, W. Sun, Y . Liu, and X. Peng, “Teaching code llms to use autocompletion tools in repository-level code generation,”arXiv preprint arXiv:2401.06391, 2024

work page arXiv 2024
[13]

How novices use llm-based code generators to solve cs1 coding tasks in a self-paced learning environment,

M. Kazemitabaar, X. Hou, A. Henley, B. J. Ericson, D. Weintrop, and T. Grossman, “How novices use llm-based code generators to solve cs1 coding tasks in a self-paced learning environment,” inProceedings of the 23rd Koli calling international conference on computing education research, 2023, pp. 1–12

2023
[14]

Coprotector: Protect open- source code against unauthorized training usage with data poisoning,

Z. Sun, X. Du, F. Song, M. Ni, and L. Li, “Coprotector: Protect open- source code against unauthorized training usage with data poisoning,” inProceedings of the ACM Web Conference 2022, 2022, pp. 652–660

2022
[15]

{CodexLeaks}: Privacy leaks from code generation language models in{GitHub}copilot,

L. Niu, S. Mirza, Z. Maradni, and C. P ¨opper, “{CodexLeaks}: Privacy leaks from code generation language models in{GitHub}copilot,” in 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 2133–2150

2023
[16]

Targeted attack on gpt- neo for the satml language model data extraction challenge,

A. Al-Kaswan, M. Izadi, and A. van Deursen, “Targeted attack on gpt- neo for the satml language model data extraction challenge,”arXiv preprint arXiv:2302.07735, 2023

work page arXiv 2023
[17]

Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials,

Y . Huang, Y . Li, W. Wu, J. Zhang, and M. R. Lyu, “Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials,”Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 2515–2537, 2024

2024
[18]

How secure is code generated by chatgpt?

R. Khoury, A. R. Avila, J. Brunelle, and B. M. Camara, “How secure is code generated by chatgpt?” in2023 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, 2023, pp. 2445–2451

2023
[19]

Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation,

J. Liu, C. S. Xia, Y . Wang, and L. Zhang, “Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation,”Advances in Neural Information Processing Systems, vol. 36, pp. 21 558–21 572, 2023

2023
[20]

The threat of offensive ai to organizations,

Y . Mirsky, A. Demontis, J. Kotak, R. Shankar, D. Gelei, L. Yang, X. Zhang, M. Pintor, W. Lee, Y . Eloviciet al., “The threat of offensive ai to organizations,”Computers & Security, vol. 124, p. 103006, 2023

2023
[21]

Opwnai: Cybercriminals starting to use chatgpt,

C. Point, “Opwnai: Cybercriminals starting to use chatgpt,”Check Point. Retrieved May, vol. 15, p. 2023, 2023

2023
[22]

Temporary policy: Chatgpt is banned,

OpenAI, “Temporary policy: Chatgpt is banned,” https: //meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt- is-banned, 2023, accessed: 2025-07-05

2023
[23]

Provable robust watermarking for ai- generated text.arXiv preprint arXiv:2306.17439, 2023

X. Zhao, P. Ananth, L. Li, and Y .-X. Wang, “Provable robust water- marking for ai-generated text,”arXiv preprint arXiv:2306.17439, 2023. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14

work page arXiv 2023
[24]

Context-aware watermark with semantic balanced green-red lists for large language models,

Y . Guo, Z. Tian, Y . Song, T. Liu, L. Ding, and D. Li, “Context-aware watermark with semantic balanced green-red lists for large language models,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 22 633–22 646

2024
[25]

Practical linguistic steganography using contextual synonym substitution and a novel vertex coding method,

C.-Y . Chang and S. Clark, “Practical linguistic steganography using contextual synonym substitution and a novel vertex coding method,” Computational linguistics, vol. 40, no. 2, pp. 403–448, 2014

2014
[26]

Watme: Towards lossless watermarking through lexical redundancy,

L. Chen, Y . Bian, Y . Deng, D. Cai, S. Li, P. Zhao, and K.-F. Wong, “Watme: Towards lossless watermarking through lexical redundancy,” arXiv preprint arXiv:2311.09832, 2023

work page arXiv 2023
[27]

Large language models for code: Security hardening and adversarial testing,

J. He and M. Vechev, “Large language models for code: Security hardening and adversarial testing,” inProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 1865–1879

2023
[28]

A survey of digital watermarking tech- niques, applications and attacks,

P. Singh and R. S. Chadha, “A survey of digital watermarking tech- niques, applications and attacks,”International Journal of Engineering and Innovative Technology (IJEIT), vol. 2, no. 9, pp. 165–175, 2013

2013
[29]

Hidden: Hiding data with deep networks,

J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei, “Hidden: Hiding data with deep networks,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 657–672

2018
[30]

A survey of text watermarking in the era of large language models,

A. Liu, L. Pan, Y . Lu, J. Li, X. Hu, X. Zhang, L. Wen, I. King, H. Xiong, and P. Yu, “A survey of text watermarking in the era of large language models,”ACM Computing Surveys, vol. 57, no. 2, pp. 1–36, 2024

2024
[31]

A survey on detection of llms-generated content,

X. Yang, L. Pan, X. Zhao, H. Chen, L. Petzold, W. Y . Wang, and W. Cheng, “A survey on detection of llms-generated content,”arXiv preprint arXiv:2310.15654, 2023

work page arXiv 2023
[32]

Protecting intellectual property of large language model-based code generation apis via watermarks,

Z. Li, C. Wang, S. Wang, and C. Gao, “Protecting intellectual property of large language model-based code generation apis via watermarks,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 2023, pp. 2336–2350

2023
[33]

Natural attack for pre-trained models of code,

Z. Yang, J. Shi, J. He, and D. Lo, “Natural attack for pre-trained models of code,” inProceedings of the 44th International Conference on Software Engineering, 2022, pp. 1482–1493

2022
[34]

Misleading authorship attribution of source code using adversarial learning,

E. Quiring, A. Maier, and K. Rieck, “Misleading authorship attribution of source code using adversarial learning,” in28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 479–496

2019
[35]

Learning natural cod- ing conventions,

M. Allamanis, E. T. Barr, C. Bird, and C. Sutton, “Learning natural cod- ing conventions,” inProceedings of the 22nd acm sigsoft international symposium on foundations of software engineering, 2014, pp. 281–293

2014
[36]

A theory of dual channel constraints,

C. Casalnuovo, E. T. Barr, S. K. Dash, P. Devanbu, and E. Morgan, “A theory of dual channel constraints,” inProceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results, 2020, pp. 25–28

2020
[37]

G. C. Clark Jr and J. B. Cain,Error-correction coding for digital communications. Springer Science & Business Media, 1981

1981
[38]

Lin and J

S. Lin and J. Li,Fundamentals of Classical and Modern Error- Correcting Codes. Cambridge University Press, 2021

2021
[39]

Accessed: 2025-07-10

(2025) Replication package. Accessed: 2025-07-10. [Online]. Available: https://anonymous.4open.science/r/DCW-324A/

2025
[40]

Adversarial watermarking transformer: Towards tracing text provenance with data hiding,

S. Abdelnabi and M. Fritz, “Adversarial watermarking transformer: Towards tracing text provenance with data hiding,” in2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021, pp. 121–140

2021
[41]

Acw: Enhancing traceability of ai-generated codes based on watermarking,

B. Li, M. Zhang, P. Zhang, J. Sun, X. Wang, and Z. Fu, “Acw: Enhancing traceability of ai-generated codes based on watermarking,” arXiv preprint arXiv:2402.07518, 2024

work page arXiv 2024
[42]

Towards tracing code provenance with code watermarking,

W. Li, B. Yang, Y . Sun, S. Chen, Z. Song, L. Xiang, X. Wang, and C. Zhou, “Towards tracing code provenance with code watermarking,” arXiv preprint arXiv:2305.12461, 2023

work page arXiv 2023
[43]

Srcmarker: Dual-channel source code watermarking via scalable code transformations,

B. Yang, W. Li, L. Xiang, and B. Li, “Srcmarker: Dual-channel source code watermarking via scalable code transformations,” in2024 IEEE Symposium on Security and Privacy (SP). IEEE, 2024, pp. 4088–4106

2024
[44]

Robust and secure code watermarking for large language models via ml/crypto codesign,

R. Zhang, N. Javidnia, N. Sheybani, and F. Koushanfar, “Robust and secure code watermarking for large language models via ml/crypto codesign,”arXiv preprint arXiv:2502.02068, 2025

work page arXiv 2025
[45]

Who wrote this code? watermarking for code generation.arXiv preprint arXiv:2305.15060,

T. Lee, S. Hong, J. Ahn, I. Hong, H. Lee, S. Yun, J. Shin, and G. Kim, “Who wrote this code? watermarking for code generation,” arXiv preprint arXiv:2305.15060, 2023

work page arXiv 2023
[46]

Codeip: A grammar-guided multi-bit watermark for large language models of code,

B. Guan, Y . Wan, Z. Bi, Z. Wang, H. Zhang, P. Zhou, and L. Sun, “Codeip: A grammar-guided multi-bit watermark for large language models of code,”arXiv preprint arXiv:2404.15639, 2024

work page arXiv 2024
[47]

A watermark for low-entropy and unbiased generation in large language models,

M. Mao, D. Wei, Z. Chen, X. Fang, and M. Chau, “A watermark for low-entropy and unbiased generation in large language models,”arXiv preprint arXiv:2405.14604, 2024

work page arXiv 2024
[48]

Marking code without breaking it: Code watermarking for detecting llm-generated code,

J. Kim, S. Park, and Y .-S. Han, “Marking code without breaking it: Code watermarking for detecting llm-generated code,”arXiv preprint arXiv:2502.18851, 2025

work page arXiv 2025
[49]

Mcgmark: An encodable and robust online watermark for llm-generated malicious code,

K. Ning, J. Chen, Q. Zhong, T. Zhang, Y . Wang, W. Li, Y . Zhang, W. Zhang, and Z. Zheng, “Mcgmark: An encodable and robust online watermark for llm-generated malicious code,”arXiv preprint arXiv:2408.01354, 2024

work page arXiv 2024
[50]

A watermark for large language models,

J. Kirchenbauer, J. Geiping, Y . Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 17 061–17 084

2023
[51]

[Online]

OpenAI, “Gpt-4,” 2023. [Online]. Available: https://openai.com/ research/gpt-4

2023
[52]

Program synthesis with large language models,

J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Leet al., “Program synthesis with large language models,” 2021. [Online]. Available: https://github.com/google-research/google-research/tree/master/mbpp

2021
[53]

Measuring coding challenge competence with apps,

D. Hendrycks, S. Basart, S. Kadavath, M. Mazeika, A. Arora, E. Guo, C. Burns, S. Puranik, H. He, D. Song, and J. Steinhardt, “Measuring coding challenge competence with apps,”NeurIPS, 2021

2021
[54]

Evaluating large language models trained on code,

M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y . Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Her...

2021
[55]

StarCoder: may the source be with you!

R. Li, L. B. Allal, Y . Zi, N. Muennighoff, D. Kocetkov, C. Mou, M. Marone, C. Akiki, J. Li, J. Chimet al., “Starcoder: may the source be with you!”arXiv preprint arXiv:2305.06161, 2023

work page internal anchor Pith review arXiv 2023
[56]

pyrefact,

O. Lindgren, “pyrefact,” https://github.com/olle-lindgren/pyrefact, 2023, accessed: 2025-11-10

2023
[57]

Chatgpt: Optimizing language models for dialogue,

OpenAI, “Chatgpt: Optimizing language models for dialogue,” https: //openai.com/blog/chatgpt, 2023, accessed: 2025-07-14

2023
[58]

Black: The uncompromising python code formatter,

Łukasz Langa and the Black team, “Black: The uncompromising python code formatter,” https://github.com/psf/black, 2018, accessed: 2025-07- 13

2018
[59]

Exception handling- based dynamic software watermarking,

Y . Wang, D. Gong, B. Lu, F. Xiang, and F. Liu, “Exception handling- based dynamic software watermarking,”IEEE Access, vol. 6, pp. 8882– 8889, 2018

2018
[60]

Xmark: dynamic software watermarking using collatz conjecture,

H. Ma, C. Jia, S. Li, W. Zheng, and D. Wu, “Xmark: dynamic software watermarking using collatz conjecture,”IEEE Transactions on Information Forensics and Security, vol. 14, no. 11, pp. 2859–2874, 2019

2019
[61]

Hidden path: dynamic software water- marking based on control flow obfuscation,

Z. Chen, C. Jia, and D. Xu, “Hidden path: dynamic software water- marking based on control flow obfuscation,” in2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), vol. 2. IEEE, 2017, pp. 443–450

2017
[62]

Software plagiarism detection with birthmarks based on dynamic key instruction sequences,

Z. Tian, Q. Zheng, T. Liu, M. Fan, E. Zhuang, and Z. Yang, “Software plagiarism detection with birthmarks based on dynamic key instruction sequences,”IEEE Transactions on Software Engineering, vol. 41, no. 12, pp. 1217–1235, 2015

2015
[63]

Function level con- trol flow obfuscation for software security,

V . Balachandran, N. W. Keong, and S. Emmanuel, “Function level con- trol flow obfuscation for software security,” in2014 Eighth International Conference on Complex, Intelligent and Software Intensive Systems. IEEE, 2014, pp. 133–140

2014
[64]

Software watermarking for java program based on method name encoding,

J. Chen, K. Li, W. Wen, W. Chen, and C. Yan, “Software watermarking for java program based on method name encoding,” inProceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. Springer, 2018, pp. 865–874

2017
[65]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

K. Cho, B. Van Merri ¨enboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y . Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,”arXiv preprint arXiv:1406.1078, 2014

work page internal anchor Pith review arXiv 2014
[66]

Softmark: Software water- marking via a binary function relocation,

H. Kang, Y . Kwon, S. Lee, and H. Koo, “Softmark: Software water- marking via a binary function relocation,” inProceedings of the 37th Annual Computer Security Applications Conference, 2021, pp. 169–181

2021
[67]

A practical method for watermarking java programs,

A. Monden, H. Iida, K.-i. Matsumoto, K. Inoue, and K. Torii, “A practical method for watermarking java programs,” inProceedings 24th Annual International Computer Software and Applications Conference. COMPSAC2000. IEEE, 2000, pp. 191–197

2000