Agentic Software Engineering: Foundational Pillars and a Research Roadmap

Ahmed E. Hassan; Bram Adams; Dayi Lin; Dong Qiu; Hao Li; Tse-Hsun Chen; Yutaro Kashiwa

arXiv: 2509.06216 · v3 · pith:CJ4O33U3 · submitted 2025-09-07 · cs.SE · cs.AI

Keywords: agentic, engineering, software, agent, agents, foundational, future, pillars

Abstract

Agentic Software Engineering (SE 3.0) represents a new era where intelligent agents are tasked not with simple code generation, but with achieving complex, goal-oriented SE objectives. To harness these new capabilities while ensuring trustworthiness, we must recognize a fundamental duality within the SE field in the Agentic SE era, comprising two symbiotic modalities: SE for Humans and SE for Agents. This duality demands a radical reimagining of the foundational pillars of SE (actors, processes, tools, and artifacts) which manifest differently across each modality. We propose two purpose-built workbenches to support this vision. The Agent Command Environment (ACE) serves as a command center where humans orchestrate and mentor agent teams, handling outputs such as Merge-Readiness Packs (MRPs) and Consultation Request Packs (CRPs). The Agent Execution Environment (AEE) is a digital workspace where agents perform tasks while invoking human expertise when facing ambiguity or complex trade-offs. This bi-directional partnership, which supports agent-initiated human callbacks and handovers, gives rise to new, structured engineering activities (i.e., processes) that redefine human-AI collaboration, elevating the practice from agentic coding to true agentic software engineering. This paper presents the Structured Agentic Software Engineering (SASE) vision, outlining several of the foundational pillars for the future of SE. The paper culminates in a research roadmap that identifies a few key challenges and opportunities while briefly discussing the resulting impact of this future on SE education. Our goal is not to offer a definitive solution, but to provide a conceptual scaffold with structured vocabulary to catalyze a community-wide dialogue, pushing the SE community to think beyond its classic, human-centric tenets toward a disciplined, scalable, and trustworthy agentic future.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Dataset of Agentic AI Coding Tool Configurations
cs.SE 2026-05 accept novelty 8.0

A publicly released dataset of 15,591 configuration artifacts for five agentic AI coding tools, drawn from 4,738 GitHub repositories along with associated files and AI-co-authored commits.

N-Version Programming with Coding Agents
cs.SE 2026-06 unverdicted novelty 7.0

Diverse AI coding agents in N-version programming reduce mean failures from 387.44 to 130.99 in triples on the Launch Interceptor Program, with 11,844 zero-failure units observed across 1M tests.

AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
cs.SE 2026-04 accept novelty 7.0

AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.

Code Review Agent Benchmark
cs.SE 2026-03 unverdicted novelty 7.0

c-CRAB benchmark shows state-of-the-art code review agents solve only around 40% of tasks derived from human reviews, suggesting potential for human-AI collaboration.

AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification
cs.SE 2026-02 conditional novelty 7.0

AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.

How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
cs.SE 2026-01 unverdicted novelty 7.0

AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.

Agentic Persona Generation with Critique-Refinement: An Industrial Evaluation
cs.SE 2026-06 unverdicted novelty 6.0

PerGent, an agentic critique-refinement system for persona generation, reaches 96.9% expert approval in an industrial evaluation at Kinaxis and reproduces more pre-LLM expert content than single-shot baselines.

Case Studies and Reflections on Agentic Software Engineering for Rapid Development of Digital Music Instruments
cs.SE 2026-05 accept novelty 5.0

Agentic AI was used to rapidly reimplement Music Mouse, translate the Continuator system, and add a 3D UI to a tracker sequencer, with reflections on effective practices for audio software development.

KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant
cs.SE 2026-04 unverdicted novelty 5.0

KISS Sorcar introduces a simple layered agent framework and VS Code IDE that reaches 62.2% pass rate on Terminal Bench 2.0 by combining ReAct execution, summarization-based continuation, parallel tools, persistent his...

Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows
cs.SE 2026-04 unverdicted novelty 5.0

Large-scale analysis of AI bot PRs shows Copilot and Codex achieve the highest CI/CD success rates but more frequent AI contributions correlate with reduced workflow reliability.

Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering
cs.SE 2026-04 unverdicted novelty 5.0

LLM judges for code tasks show high sensitivity to prompt biases that systematically favor certain options, changing accuracy and model rankings even when code is unchanged.

Beyond the 'Diff': Addressing Agentic Entropy in Agentic Software Development
cs.SE 2026-03 unverdicted novelty 5.0

Agentic entropy names the systemic drift in AI coding agents away from architectural intent; a new framework using conformity seeding, reasoning monitoring, and causal graph interfaces supplies process-level oversight...

Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective
cs.SE 2025-11 conditional novelty 5.0

Qualitative interview study with 22 practitioners identifies multi-level benefits, challenges, and mitigation strategies for using LLMs in software development.

How Do Developers Maintain and Evolve Their Agents' Instructions? An Empirical Study
cs.SE 2026-06 unverdicted novelty 4.0

The authors describe a research plan for mining ACF evolution, classifying changes via a maintenance taxonomy, and linking change types to code quality metrics in agent-driven repositories.

Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering
cs.SE 2026-06 unverdicted novelty 4.0

Coding benchmarks misalign with agentic software engineering because they conflate model and harness, grade against single references, and provide no component-level iteration signals.

Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
cs.SE 2026-05 unverdicted novelty 4.0

Empirical analysis of AI refactoring PRs shows quality attribute improvements in 22.5% of cases with new Pylint issues in 24.17% and Bandit findings in 4.7%, yet 73.5% developer acceptance.

Fairness in Multi-Agent Systems for Software Engineering: An SDLC-Oriented Rapid Review
cs.SE 2026-04 unverdicted novelty 2.0

A rapid review of fairness in LLM-enabled multi-agent systems for the software development lifecycle concludes that the field lacks standardized evaluations, broad coverage, and effective governance, leaving it unprep...