Comparing human and LLM generated code: The jury is still out! arXiv preprint arXiv:2501.16857
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SE (5)
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- Do AI Coding Agents Log Like Humans? An Empirical Study
  AI agents modify logging less often than humans in 58.4% of repositories but produce higher log density when they change it; explicit logging instructions are rare (4.7%) and ignored 67% of the time, with humans performing 72.5% of post-generation log repairs.
- To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study
  AI-generated code requires less maintenance than human-written code, mostly involving feature additions by humans rather than bug fixes.
- Can LLMs Produce Better Object-Oriented Designs than Human-Involved Development?
  A comparative case study on a postgraduate Java assignment finds that PureAI and PostAI projects are simpler and have lower code smell density than PreAI projects, but show oversimplification and weaker responsibility separation.
- The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code
  LLM-generated code matches human-written code in overall readability but exhibits different issue patterns, and prompt engineering has limited impact on improving it.
- At What Cost? Software Developers' Well-Being in the Age of GenAI
  The paper proposes shifting GenAI research in software engineering from narrow performance metrics to also include developer well-being, social context, and sustainable productivity.