pith. sign in

arxiv: 2508.11925 · v3 · pith:RPSSJYZTnew · submitted 2025-08-16 · 💻 cs.CR · cs.CL· cs.LG

Optimizing Token Choice for Code Watermarking: An RL Approach

classification 💻 cs.CR cs.CLcs.LG
keywords codecodetracerwatermarkingtokenapproachfunctionalitylearningwatermark
0
0 comments X
read the original abstract

Protecting intellectual property on LLM-generated code necessitates effective watermarking systems that can operate within code's highly structured, syntactically constrained nature. In this work, we introduce CodeTracer, an innovative adaptive code watermarking framework underpinned by a novel reinforcement learning training paradigm. At its core, CodeTracer features a policy-driven approach that utilizes a parameterized model to intelligently bias token choices during next-token prediction. This strategy ensures that embedded watermarks maintain code functionality while exhibiting subtle yet statistically detectable deviations from typical token distributions. To facilitate policy learning, we devise a comprehensive reward system that seamlessly integrates execution feedback with watermark embedding signals, balancing process-level and outcome-level rewards. Additionally, we employ Gumbel Top-k reparameterization to enable gradient-based optimization of discrete watermarking decisions. Extensive comparative evaluations demonstrate CodeTracer's significant superiority over state-of-the-art baselines in both watermark detectability and the preservation of generated code's functionality. Our code is available at https://github.com/TimeLovercc/CodeTracer.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.