Bugs in large language models generated code: An empirical study

Tambon, F · 2024 · arXiv 2403.08937

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · novelty 6.0

A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.

What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

cs.SE · 2026-04-29 · unverdicted · novelty 6.0

Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code

cs.SE · 2026-04-25 · unverdicted · novelty 4.0

Locally deployed LLMs achieve 43-45% accuracy on Python bug detection but frequently produce only partial identifications of problematic code regions.

citing papers explorer

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · none · ref 116

What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

cs.SE · 2026-04-29 · unverdicted · none · ref 53

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · none · ref 31

An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code

cs.SE · 2026-04-25 · unverdicted · none · ref 14
