A Tertiary Review of Large Language Model-Based Code Generating Tasks: Trends, Challenges, and Future Directions
A synthesis of 30 secondary studies finds strong benchmark accuracy for LLM code generation but weak real-world generalization, fragile robustness, pervasive efficiency issues, and under-reported bias, calling for domain-aware improvements and standardized evaluation.