Constrained Code Generation with Discrete Diffusion

Ferdinando Fioretto; Lize Shao; Michael Cardei; Wenxi Wang; Zichen Xie

arxiv: 2605.16829 · v1 · pith:XFSFPFTMnew · submitted 2026-05-16 · 💻 cs.CL · cs.PL


Pith reviewed 2026-05-19 21:20 UTC · model grok-4.3


keywords: constrained code generation · discrete diffusion · neurosymbolic inference · code synthesis · constraint satisfaction · program generation · denoising operators


The pith

Constrained Diffusion for Code augments discrete diffusion samplers with optimization-driven operators that steer denoising toward programs satisfying functional, security, and syntax constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how discrete diffusion models for code can enforce program-level constraints during generation by intervening at each denoising step, where the full intermediate program state is visible. It does so through a training-free addition of constraint-aware operators that use program analysis to spot relevant regions and mathematical optimization to make local adjustments to the trajectory. This matters to readers who generate code because it promises higher rates of correct and secure outputs without retraining the underlying model or applying heavy post-hoc fixes. The approach keeps changes localized and uses less corrective effort than baselines while preserving closeness to the original model's distribution.

Core claim

CDC augments the base discrete diffusion sampler with constraint-aware denoising operators that combine mathematical optimization with program analysis to identify constraint-relevant regions of the intermediate program state and locally adjust the denoising trajectory, steering generation toward feasible programs while remaining close to the base model. Across code generation benchmarks, CDC consistently improves constraint satisfaction in functional correctness, security, and even syntax, outperforming discrete diffusion and autoregressive baselines with less corrective computation and more localized edits.

What carries the argument

constraint-aware denoising operators that combine mathematical optimization with program analysis to identify constraint-relevant regions and locally adjust the denoising trajectory
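The mechanism described above can be pictured with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_denoiser`, its probability table, the banned-token constraint, and the renormalization in `adjust` are all hypothetical stand-ins chosen so the example runs without a model. It only illustrates the shape of the idea: propose a full clean program, localize the constraint-relevant positions, adjust only those, then commit tokens.

```python
from typing import Dict, List

MASK = "<mask>"
BANNED = {"eval"}  # hypothetical security constraint: never emit this call

Dist = Dict[str, float]


def toy_denoiser(seq: List[str]) -> List[Dist]:
    """Hypothetical base model: a fixed per-position distribution over clean tokens."""
    table = [
        {"x": 0.9, "y": 0.1},
        {"=": 1.0},
        {"eval": 0.6, "int": 0.3, "float": 0.1},
        {"(": 1.0},
        {"s": 0.8, "t": 0.2},
        {")": 1.0},
    ]
    # Already committed tokens are represented as point masses.
    return [table[i] if tok == MASK else {tok: 1.0} for i, tok in enumerate(seq)]


def localize(proposal: List[Dist]) -> List[int]:
    """Flag positions whose most likely token violates the constraint."""
    return [i for i, d in enumerate(proposal) if max(d, key=d.get) in BANNED]


def adjust(dist: Dist) -> Dist:
    """Local adjustment: drop banned mass and renormalize what remains.

    Assumes at least one allowed token has nonzero probability.
    """
    kept = {t: p for t, p in dist.items() if t not in BANNED}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def constrained_sample(length: int = 6, per_step: int = 2) -> List[str]:
    seq = [MASK] * length
    while MASK in seq:
        proposal = toy_denoiser(seq)           # full clean-program proposal
        for i in localize(proposal):           # constraint-relevant region
            proposal[i] = adjust(proposal[i])  # edit only the flagged positions
        # Commit the most confident masked positions first.
        masked = [i for i, t in enumerate(seq) if t == MASK]
        masked.sort(key=lambda i: max(proposal[i].values()), reverse=True)
        for i in masked[:per_step]:
            seq[i] = max(proposal[i], key=proposal[i].get)
    return seq


result = constrained_sample()
```

In this toy setting the unadjusted greedy choice at position 2 would be `eval`; the adjusted run commits `int` there and leaves every other position exactly as the stand-in model proposed it, which is the locality property the page attributes to CDC.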

If this is right

Constraint satisfaction rises for functional correctness, security properties, and syntax on code-generation benchmarks.
The method outperforms both plain discrete diffusion and autoregressive baselines while using fewer corrective steps.
Edits remain localized and the generated programs stay close to the base model's output distribution.
Constraints can be enforced at the full-program level during iterative refinement rather than only at the end.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-adjustment idea could transfer to diffusion models for other structured outputs such as molecule design or formal proofs where intermediate states are also exposed.
Because the operators act only on selected regions, the technique may scale to longer programs or more numerous constraints without a proportional increase in cost.
Future experiments could test whether the same operators improve performance when the base diffusion model itself was trained on constrained data.

Load-bearing premise

Program analysis can reliably locate constraint-relevant regions inside noisy or partially denoised intermediate program states, and the resulting local optimization adjustments will increase constraint satisfaction without lowering overall sample quality or needing model retraining.

What would settle it

Applying CDC to standard code-generation benchmarks and measuring no rise in the fraction of outputs that pass functional tests, security checks, or syntax validation relative to the unmodified discrete diffusion sampler.
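A check of this kind reduces to comparing pass rates between two sets of samples. The sketch below shows only the syntax-validation case for Python outputs, using the standard-library `ast` module; the two sample lists are invented strings for illustration and are not outputs of any model or results from the paper. Functional tests and security checks would need their own `check` functions.

```python
import ast
from typing import Callable, List


def syntax_ok(source: str) -> bool:
    """Return True if the string parses as Python source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def pass_rate(samples: List[str], check: Callable[[str], bool] = syntax_ok) -> float:
    """Fraction of samples that pass the given check."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if check(s)) / len(samples)


# Invented placeholder samples (assumption, for illustration only).
baseline_samples = ["def f(x):\n    return x +", "def g():\n    return 1"]
constrained_samples = ["def f(x):\n    return x + 1", "def g():\n    return 1"]

delta = pass_rate(constrained_samples) - pass_rate(baseline_samples)
```

A `delta` that stays at or below zero across benchmarks and check types would be the outcome described above as settling the question against the method.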

Figures

Figures reproduced from arXiv: 2605.16829 by Ferdinando Fioretto, Lize Shao, Michael Cardei, Wenxi Wang, Zichen Xie.

**Figure 1.** CDC vs. other diffusion code baselines on HumanEval-X (HE-X), MBPP, CWEval, and LLMSecEval+.

∗Equal contribution. †Equal senior-author contribution.

**Figure 2.** Overview of CDC.

Body text captured alongside the caption: At inference time, sampling starts from the fully masked sequence $x_T$ and repeatedly applies Eq. 2 until it obtains the generated sequence $x_0$. Section 5, Constrained Diffusion for Code Generation: The formulation above motivates Constrained Diffusion for Code (CDC): at each timestep, the denoiser proposes a full clean program distribution $\hat{x}_0^{(t)}$, creating a natural point to evaluate, localize, and…

**Figure 3.** Edited tokens per correction attempt (fewer means higher efficiency): (a) functionality corrections on …

**Figure 4.** Efficiency and locality of CDC vs. AR+Reprompt

**Figure 5.** CWEval efficiency comparison between AR re-prompting and MDFI. MDFI lowers pipeline token cost, …

**Figure 6.** Per-language efficiency means on CWEval. MDFI substantially reduces edited tokens, edit span, and edit …

**Figure 7.** Component and localization-choice ablation of CDC on HumanEval-X C++ ( …

**Figure 8.** MDFI scope ablation. (a) Insertion amount K on CWEval and LLMSecEval+; func-sec@1 plateaus at K ∈ [8, 12]. (b) Remasking neighborhood scope on CWEval; the deployed Parent+Leaf rule peaks at 34.3%, beating both Token-Window and broader (Use–Def Slice) alternatives.
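The Figure 8 caption compares remasking neighborhood scopes. As a rough illustration of the simplest one it names, Token-Window, the sketch below remasks a fixed radius of tokens around each flagged position. The function name, the `radius` parameter, and the mask string are assumptions made for this example; the caption says the deployed rule is Parent+Leaf, which depends on syntax-tree structure and is not reproduced here.

```python
from typing import Iterable, List

MASK = "<mask>"  # hypothetical mask token string


def remask_window(tokens: List[str], flagged: Iterable[int], radius: int = 1) -> List[str]:
    """Return a copy of tokens with a window around each flagged index remasked."""
    out = list(tokens)
    for i in flagged:
        if not 0 <= i < len(out):
            raise IndexError(f"flagged index {i} is out of range")
        lo = max(0, i - radius)
        hi = min(len(out), i + radius + 1)
        for j in range(lo, hi):
            out[j] = MASK
    return out


tokens = ["x", "=", "eval", "(", "s", ")"]
remasked = remask_window(tokens, flagged=[2], radius=1)
```

Only the remasked slots are handed back to the denoiser, so a larger radius trades locality for more freedom to repair the surrounding code.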

Original abstract

Discrete diffusion models are a powerful, emerging paradigm for code generation. They construct programs through iterative refinement of partially corrupted token sequences and enable parallel token refinement. Importantly, this paradigm exposes a global program state at each denoising step, which provides a natural intervention point for enforcing program-level functionality and security constraints, guiding the generation before the final code is committed. Building on this observation, the paper introduces Constrained Diffusion for Code (CDC), a training-free neurosymbolic inference framework that integrates constraint satisfaction directly into the reverse denoising process. CDC augments the base discrete diffusion sampler with constraint-aware denoising operators that combine mathematical optimization with program analysis to identify constraint-relevant regions of the intermediate program state and locally adjust the denoising trajectory, steering generation toward feasible programs while remaining close to the base model. Across code generation benchmarks, CDC consistently improves constraint satisfaction in functional correctness, security, and even syntax, outperforming discrete diffusion and autoregressive baselines with less corrective computation and more localized edits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds constraint enforcement inside discrete diffusion denoising steps for code via program analysis and local optimization, without any retraining.


The main takeaway is that this work shows how to steer discrete diffusion code models toward satisfying functional, security, and syntax constraints by intervening during the iterative denoising process rather than after the fact. It does this with a training-free setup called CDC that uses constraint-aware operators to spot relevant parts of the partial program and tweak the local trajectory accordingly. That combination of diffusion's global state exposure with neurosymbolic adjustments looks like the actual novelty here, and it is presented as building directly on existing discrete diffusion samplers for code without introducing new fitted parameters or circular self-references.

The approach claims consistent gains on benchmarks while keeping edits localized and sample quality close to the base model, which is a practical advantage if it holds up. The training-free aspect and the focus on reducing post-generation repair effort are clear strengths that could appeal to people working on reliable code generation pipelines.

The soft spot is the reliance on standard program analysis to identify constraint-relevant regions in still-noisy or partially corrupted token sequences. Those intermediates are often syntactically broken, so parsing or static checks can fail or return misleading regions, and the paper's abstract does not spell out error-tolerant handling or fallback mechanisms. If the full methods section shows they tested this under realistic noise levels or added robustness, that concern shrinks; otherwise it remains a load-bearing assumption worth probing.

This paper is for researchers in constrained code generation and diffusion-based modeling who want a neurosymbolic inference trick rather than a full retraining pipeline. A reader already familiar with discrete diffusion for code would get the most value from the operator design and the benchmark comparisons. It deserves a serious referee because the core mechanism is new enough and the claims are falsifiable enough to warrant detailed review, even if the evaluation details need tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Constrained Diffusion for Code (CDC), a training-free neurosymbolic inference framework for code generation with discrete diffusion models. CDC augments the base discrete diffusion sampler with constraint-aware denoising operators that combine mathematical optimization and program analysis to identify constraint-relevant regions of the intermediate program state and locally adjust the denoising trajectory. This steers generation toward programs satisfying functional correctness, security, and syntax constraints while remaining close to the base model distribution. The paper claims consistent improvements on code generation benchmarks over discrete diffusion and autoregressive baselines, achieved with less corrective computation and more localized edits.

Significance. If the central claims hold, the work would be significant for constrained code generation by enabling enforcement of program-level constraints during iterative denoising without retraining. The training-free neurosymbolic approach and exploitation of global intermediate states in diffusion models are clear strengths that could improve reliability in functional correctness and security for generated code.

major comments (2)

[Abstract and CDC framework description] The central mechanism (described in the abstract and the CDC framework section) assumes that program analysis can reliably identify constraint-relevant regions inside partially denoised, often syntactically invalid token sequences. Standard static analysis tools will frequently fail to parse or will return spurious regions on such noisy intermediates, and no explicit robustness mechanism (e.g., error-tolerant parsing or learned region prediction) is detailed. This assumption is load-bearing for the claim that local optimization adjustments steer trajectories toward feasible programs without degrading sample quality or requiring retraining.
[Abstract and experimental evaluation section] The abstract asserts 'consistent improvements' on functional correctness, security, and syntax benchmarks, yet provides no quantitative results, error bars, baseline numbers, or details on how constraint operators are implemented and evaluated. Without these, the magnitude and reliability of the reported gains cannot be assessed, weakening the empirical support for the framework's superiority over discrete diffusion and autoregressive baselines.

minor comments (2)

[Abstract] The phrase 'less corrective computation' is used without a precise definition or measurement protocol; clarify how this is quantified relative to baselines.
[Method] Notation for the constraint-aware denoising operators could be introduced more formally with equations to make the combination of optimization and program analysis explicit.
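One way the "corrective computation" and locality asked about in the minor comments could be quantified is by diffing the token sequence before and after a correction. The sketch below uses the standard-library `difflib.SequenceMatcher`; the two metric definitions (edited tokens as the larger side of each non-equal diff block, edit span as the extent of the original sequence covered from the first to the last edit) are assumptions for illustration, since the page does not give the paper's measurement protocol.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def edit_stats(before: List[str], after: List[str]) -> Tuple[int, int]:
    """Return (edited_tokens, edit_span) for a correction from before to after."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    edited = 0
    touched = []  # (start, end) ranges in the original sequence
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Count the larger side so inserts, deletes, and replaces all register.
        edited += max(i2 - i1, j2 - j1)
        touched.append((i1, i2))
    span = touched[-1][1] - touched[0][0] if touched else 0
    return edited, span


before = ["x", "=", "eval", "(", "s", ")"]
after = ["x", "=", "int", "(", "s", ")"]
stats = edit_stats(before, after)
```

Here a single-token replacement yields one edited token and a span of one; a full regeneration of the same program would report every changed token and a span close to the program length, which is the contrast a locality comparison against re-prompting would need to show.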

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating planned revisions to improve the manuscript's clarity and completeness.


Referee: [Abstract and CDC framework description] The central mechanism (described in the abstract and the CDC framework section) assumes that program analysis can reliably identify constraint-relevant regions inside partially denoised, often syntactically invalid token sequences. Standard static analysis tools will frequently fail to parse or will return spurious regions on such noisy intermediates, and no explicit robustness mechanism (e.g., error-tolerant parsing or learned region prediction) is detailed. This assumption is load-bearing for the claim that local optimization adjustments steer trajectories toward feasible programs without degrading sample quality or requiring retraining.

Authors: We appreciate the referee highlighting the importance of robustness in program analysis for noisy intermediates. The CDC framework incorporates lightweight, syntax-tolerant analysis within the constraint-aware operators, using token-pattern heuristics and partial matching to identify relevant regions even on invalid sequences, with mathematical optimization providing the primary steering mechanism. While Section 3 outlines this approach at a high level, we agree that explicit discussion of error tolerance would strengthen the presentation. We will add a paragraph detailing fallback strategies for parsing failures and how they preserve proximity to the base model distribution. revision: yes
Referee: [Abstract and experimental evaluation section] The abstract asserts 'consistent improvements' on functional correctness, security, and syntax benchmarks, yet provides no quantitative results, error bars, baseline numbers, or details on how constraint operators are implemented and evaluated. Without these, the magnitude and reliability of the reported gains cannot be assessed, weakening the empirical support for the framework's superiority over discrete diffusion and autoregressive baselines.

Authors: The abstract is written as a concise summary of contributions and high-level outcomes. Quantitative results—including specific improvement rates on the benchmarks, standard deviations across runs as error bars, direct baseline comparisons, and operator implementation details—are reported in the experimental evaluation section with supporting tables and analysis. To better support the claims within the abstract's length constraints, we will revise it to include representative numerical highlights of the gains while preserving readability. revision: yes
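The first response above mentions token-pattern heuristics with partial matching on sequences that may not parse. The sketch below is one hypothetical reading of that phrase, not the authors' analysis: it slides a token pattern over a partially masked sequence, treats masked slots as wildcards, and requires at least one concrete token to agree so that fully masked windows are not flagged. The pattern, mask string, and function name are all assumptions.

```python
from typing import List

MASK = "<mask>"  # hypothetical mask token string


def find_pattern(tokens: List[str], pattern: List[str]) -> List[int]:
    """Return start indices where pattern is compatible with tokens.

    A masked token matches anything, but a window only counts as a hit
    if at least one unmasked token equals its pattern token.
    """
    hits = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        compatible = all(t == p or t == MASK for t, p in zip(window, pattern))
        concrete = sum(1 for t, p in zip(window, pattern) if t == p)
        if compatible and concrete > 0:
            hits.append(start)
    return hits


partial = ["x", "=", "eval", MASK, "s", MASK]
hits = find_pattern(partial, ["eval", "("])
```

Because it never invokes a parser, a matcher like this cannot fail on syntactically broken intermediates, though it can still return spurious regions, which is the referee's other concern.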

Circularity Check

0 steps flagged

No circularity: CDC is a training-free augmentation using external program analysis and optimization


The paper presents CDC as a neurosymbolic inference method that augments an existing discrete diffusion sampler with constraint-aware denoising operators. These operators combine mathematical optimization and program analysis to steer generation toward feasible programs. No equations or steps in the provided description reduce by construction to fitted parameters, self-defined quantities, or load-bearing self-citations. The framework is explicitly training-free and operates on top of a base model without re-deriving its core sampling process from the constraints themselves. The central claim (improved constraint satisfaction via localized edits) remains independent of the inputs it modifies.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the effectiveness of newly introduced constraint-aware denoising operators that combine optimization and program analysis; the abstract does not list explicit free parameters or invented entities beyond these operators, and relies on standard assumptions of discrete diffusion models.

axioms (1)

domain assumption: Discrete diffusion models expose a global program state at each denoising step that can serve as an intervention point for constraints.
This observation is stated as the key motivation for CDC in the abstract.

invented entities (1)

constraint-aware denoising operators (no independent evidence)
purpose: To combine mathematical optimization with program analysis for locally adjusting the denoising trajectory toward constraint-satisfying programs.
These operators are introduced as the core augmentation to the base sampler; no independent evidence outside the paper is provided in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage:

$$y_t = \arg\min_{y} \; D_{\mathrm{KL}}\!\left(y \,\|\, \hat{x}_0^{(t)}\right) + \sum_{j} \lambda_j \, \nu_{t,j}(y;\, r_t, c, S_t)$$
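For intuition about an objective of this shape, consider a single token position and assume the penalty is linear in y, that is, ν(y) = Σᵢ yᵢ vᵢ for a fixed violation score vᵢ per candidate token. That linearity is an assumption made here for illustration; the page does not specify the form of ν. Under it, minimizing KL(y ‖ p) + λ Σᵢ yᵢ vᵢ over probability vectors has the standard closed form yᵢ ∝ pᵢ · exp(−λ vᵢ), and the minimum value is −log Σᵢ pᵢ exp(−λ vᵢ). The sketch below computes both; all names and numbers are hypothetical.

```python
import math
from typing import List


def tilted_projection(p: List[float], violation: List[float], lam: float) -> List[float]:
    """Minimizer of KL(y || p) + lam * <y, violation> over the simplex.

    Assumes p is strictly positive and sums to one.
    """
    if len(p) != len(violation):
        raise ValueError("p and violation must have the same length")
    weights = [pi * math.exp(-lam * vi) for pi, vi in zip(p, violation)]
    total = sum(weights)
    return [w / total for w in weights]


def objective(y: List[float], p: List[float], violation: List[float], lam: float) -> float:
    """KL(y || p) plus the linear penalty, with 0 * log 0 treated as 0."""
    kl = sum(yi * math.log(yi / pi) for yi, pi in zip(y, p) if yi > 0)
    return kl + lam * sum(yi * vi for yi, vi in zip(y, violation))


p = [0.6, 0.3, 0.1]          # hypothetical base-model proposal at one position
violation = [1.0, 0.0, 0.0]  # hypothetical score: only the first token violates
y = tilted_projection(p, violation, lam=2.0)
```

Larger λ pushes more mass off the violating token, while λ = 0 returns p unchanged; the KL term is what keeps the adjusted distribution close to the base model.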

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



work page arXiv 2025

[9] [9]

Teaching large language models to self-debug,

Xinyun Chen, Maxwell Lin, Nathanael Schärli, and Denny Zhou. Teaching large language models to self-debug,

work page

[10] [10]

URLhttps://arxiv.org/abs/2304.05128

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme

J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, and Benjamin Van Durme. Improved lexically constrained decoding for translation and monolingual rewriting. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Lingu...

work page doi:10.18653/v1/n19-1090 2019

[12] [12]

Wang, and Xi Victoria Lin

Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen tau Yih, Sida I. Wang, and Xi Victoria Lin. Lever: Learning to verify language-to-code generation with execution, 2023. URL https://arxiv.org/abs/ 2302.08468

work page arXiv 2023

[13] [13]

Type-constrained code generation with language models.Proceedings of the ACM on Programming Languages, 9(PLDI):601–626, June

Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, and Martin Vechev. Type-constrained code generation with language models.Proceedings of the ACM on Programming Languages, 9(PLDI):601–626, June

work page

[14] [14]

Type- Constrained Code Generation with Language Models

ISSN 2475-1421. doi: 10.1145/3729274. URLhttp://dx.doi.org/10.1145/3729274

work page doi:10.1145/3729274

[15] [15]

CyberSecEval 2: A wide-ranging cybersecurity evaluation suite for large language models.arXiv preprint arXiv:2404.13161, 2024

Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ah- mad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, and Joshua Saxe. Cyberseceval 2: A wide-ranging cybersecurity evaluation suite for large language models, 2024. URL https://arxiv.org/abs/2404.13161

work page arXiv 2024

[16] [16]

Simple guidance mechanisms for discrete diffusion models.arXiv preprint arXiv:2412.10193, 2024

Yair Schiff, Subham Sekhar Sahoo, Hao Phung, Guanghan Wang, Sam Boshar, Hugo Dalla-torre, Bernardo P. de Almeida, Alexander Rush, Thomas Pierrot, and V olodymyr Kuleshov. Simple guidance mechanisms for discrete diffusion models, 2025. URLhttps://arxiv.org/abs/2412.10193

work page arXiv 2025

[17] [17]

Frey, Tim G

Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, and Andrew Gordon Wilson. Protein design with guided discrete diffusion, 2023. URL https://arxiv.org/abs/2305.20009

work page arXiv 2023

[18] [18]

Con- strained discrete diffusion, 2025

Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, and Ferdinando Fioretto. Con- strained discrete diffusion, 2025. URLhttps://arxiv.org/abs/2503.09790

work page arXiv 2025

[19] [19]

Christopher, Michael Cardei, Jinhao Liang, and Ferdinando Fioretto

Jacob K. Christopher, Michael Cardei, Jinhao Liang, and Ferdinando Fioretto. Neuro-symbolic generative diffusion models for physically grounded, robust, and safe generation, 2025. URL https://arxiv.org/ abs/2506.01121

work page arXiv 2025

[20] [20]

Constrained decoding of diffusion LLMs with context-free grammars, 2025

Niels Mündler, Jasper Dekoninck, and Martin Vechev. Constrained decoding of diffusion LLMs with context-free grammars, 2025. URLhttps://arxiv.org/abs/2508.10111

work page arXiv 2025

[21] [21]

Dream 7B: Diffusion Large Language Models

Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7b: Diffusion large language models, 2025. URLhttps://arxiv.org/abs/2508.15487

work page internal anchor Pith review Pith/arXiv arXiv 2025

[22] [22]

Chiu, Alexander Rush, and V olodymyr Kuleshov

Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and V olodymyr Kuleshov. Simple and effective masked diffusion language models, 2024. URLhttps: //arxiv.org/abs/2406.07524. 10 Constrained Code Generation with Discrete Diffusion

work page arXiv 2024

[23] [23]

Large Language Diffusion Models

Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models, 2025. URLhttps://arxiv.org/abs/2502.09992

work page internal anchor Pith review Pith/arXiv arXiv 2025

[24] [24]

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, Y . K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. DeepSeek-Coder: When the large language model meets programming – the rise of code intelligence, 2024. URLhttps://arxiv.org/abs/2401.14196

work page internal anchor Pith review Pith/arXiv arXiv 2024

[25] [25]

Code Llama: Open Foundation Models for Code

Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nico...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[26] [26]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

work page internal anchor Pith review Pith/arXiv arXiv 2021

[27] [27]

Program Synthesis with Large Language Models

Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and Charles Sutton. Program synthesis with large language models, 2021. URLhttps://arxiv.org/abs/2108.07732

work page internal anchor Pith review Pith/arXiv arXiv 2021

[28] [28]

CodeGeeX: A pre-trained model for code generation with multilingual benchmarking on HumanEval-X

Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. CodeGeeX: A pre-trained model for code generation with multilingual benchmarking on HumanEval-X. InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, pages 5673–5684, N...

work page doi:10.1145/3580305.3599790 2023

[29] [29]

Cweval: Outcome- driven evaluation on functionality and security of llm code generation, 2025

Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, and Baishakhi Ray. CWEval: Outcome-driven evaluation on functionality and security of LLM code generation, 2025. URL https://arxiv.org/abs/2501.08200

work page arXiv 2025

[30] [30]

Díaz Ferreyra, and Riccardo Scandariato

Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, and Riccardo Scandariato. LLMSecEval: A dataset of natural language prompts for security evaluations. In2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 588–592. IEEE, 2023. doi: 10.1109/MSR59073.2023.00084. URL https://doi.org/10.1109/MSR59073.2023.00084

work page doi:10.1109/msr59073.2023.00084 2023

[31] [31]

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020. URL https: //arxiv.org/abs/2006.11239

work page internal anchor Pith review Pith/arXiv arXiv 2020

[32] [32]

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022. URLhttps://arxiv.org/abs/2112.10752

work page internal anchor Pith review Pith/arXiv arXiv 2022

[33] [33]

Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg

Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces, 2021. URLhttps://arxiv.org/abs/2107.03006

work page arXiv 2021

[34] [34]

Qwen2.5-Coder Technical Report

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, Kai Dang, Yang Fan, Yichang Zhang, An Yang, Rui Men, Fei Huang, Bo Zheng, Yibo Miao, Shanghaoran Quan, Yunlong Feng, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, and Junyang Lin. Qwen2.5- Coder technical report, 2024. URLhttps://arxiv.org/a...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[35] [35]

Modeling and discovering vulnerabilities with code property graphs

Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In2014 IEEE Symposium on Security and Privacy, pages 590–604. IEEE, 2014. doi: 10.1109/SP.2014.44. URLhttps://doi.org/10.1109/SP.2014.44

work page doi:10.1109/sp.2014.44 2014

[36] [36]

no further hint

Max Brunsfeld. Tree-sitter: An incremental parsing system. https://tree-sitter.github.io/ tree-sitter/, 2018. URLhttps://tree-sitter.github.io/tree-sitter/. Broader Impact This work aims to improve the reliability and security of code generation by steering diffusion models toward programs that better satisfy functional and security constraints. Potential...

work page 2018