Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks

2024 · arXiv 2406.15325

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code

cs.SE · 2026-04-25 · unverdicted · novelty 4.0

Locally deployed LLMs achieve 43-45% accuracy on Python bug detection but frequently produce only partial identifications of problematic code regions.

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

cs.SE · 2026-04-16 · unverdicted · novelty 3.0

RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.

citing papers explorer

Showing 2 of 2 citing papers.

An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code
cs.SE · 2026-04-25 · unverdicted · none · ref 13
Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation
cs.SE · 2026-04-16 · unverdicted · none · ref 47
