Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SE (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.
citing papers explorer
- An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code
  Locally deployed LLMs achieve 43-45% accuracy on Python bug detection but frequently produce only partial identifications of problematic code regions.
- Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation
  RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.