In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–11
5 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)
  Systematic review of thirteen malicious-code prompt corpora for coding LLM refusal evaluation that catalogs construction methods, surfaces gaps in human baselines, cross-corpus comparability, and malware taxonomies, and proposes methodological improvements.
- Constrained Code Generation with Discrete Diffusion
  Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
- A Quasi-Experimental Developer Study of Security Training in LLM-Assisted Web Application Development
  A within-subject study of 12 developers found that security training reduced validated weaknesses by 31.5% and critical issues by 79.2% in LLM-assisted backend coding.
- DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation
  DeepGuard aggregates multi-layer representations in code LLMs to raise the secure-and-correct generation rate by 11.9% on average over baselines like SVEN while preserving correctness and generalizing to new vulnerability types.
- Can LLMs Find Bugs in Code? An Evaluation from Beginner Errors to Security Vulnerabilities in Python and C++
  LLMs perform well on basic syntactic and semantic bugs in small code but struggle with complex security vulnerabilities and large production codebases.