TOSEM, https://arxiv.org/abs/2509.14745, forthcoming
13 Pith papers cite this work.
Fields: cs.SE (13)
Years: 2026 (13)
Citation roles: background (2)
Citing papers:
- A Dataset of Agentic AI Coding Tool Configurations
  A publicly released dataset of 15,591 configuration artifacts for five agentic AI coding tools, drawn from 4,738 GitHub repositories along with associated files and AI-co-authored commits.
- Do AI Coding Agents Log Like Humans? An Empirical Study
  AI agents modify logging less often than humans in 58.4% of repositories but produce higher log density when they do change it; explicit logging instructions are rare (4.7%) and are ignored 67% of the time, with humans performing 72.5% of post-generation log repairs.
- AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
  AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
  AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.
- Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study
  Analysis of 9,799 human-reviewed agentic PRs shows that only 35.7% of rejections reflect clear agent failures, with 31.2% due to workflow constraints and 33.1% lacking a clear rationale; interactions also differ notably across agents.
- To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study
  AI-generated code requires less maintenance than human-written code, and the maintenance it does receive consists mostly of feature additions by humans rather than bug fixes.
- Hot Fixing in the Wild
  Hot fixes show urgency patterns, with reduced collaboration and testing compared with regular fixes, and large-scale GitHub data reveals more than 10 distinct repair behaviors across human developers and AI agents.
- On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories
  Reviewer bots' higher comment volume on AI agent PRs is associated with slower resolutions and poorer average feedback quality, while feedback quality itself has no association with PR outcomes.
- Insights into Security-Related AI-Generated Pull Requests
  AI-generated security pull requests frequently contain a small set of recurring weaknesses, with many flawed ones merged and rejections driven by process factors rather than technical issues.
- Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild
  AI coding assistants introduce code issues that persist in 22.7% of cases across real projects, creating measurable long-term technical debt.
- Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review
  The paper presents a vision for an agentic code review framework spanning PR Creation, Augmentation, Reviewer Selection, AI-Assisted Review, and Retrospective, with humans retained at quality gates.
- From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests
  Code review agents achieve a 45.20% merge rate on PRs versus 68.37% for humans, and 60.2% of PRs closed with only agent review show 0-30% signal quality.
- Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development
  Agentic Agile-V uses Agile-V as its backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.