Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

Christopher Koch

arxiv: 2605.20456 · v1 · pith:3T2SMBNSnew · submitted 2026-05-19 · 💻 cs.SE · cs.AI · cs.MA

Pith reviewed 2026-05-21 06:48 UTC · model grok-4.3

keywords: agentic AI, software engineering, Agile processes, verification, hardware development, process framework, code generation

The pith

Agentic AI coding improves when conversational intent is turned into contracts and verified artifacts through structured processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that agentic AI systems can inspect codebases, generate plans, edit files, and run tests, yet studies show mixed productivity results and repeated failures in repository setup, dependencies, and hardware verification. Rather than focusing on better prompts, it claims the key issue is engineering process control. It introduces Agentic Agile-V, which uses an Agile-V backbone plus a SCOPE-V loop to convert dialogue into requirements, constraints, and acceptance evidence. A sympathetic reader would care because this means AI tools raise the stakes for clear specifications and independent checks instead of removing the need for them. If the claim holds, development teams could reduce rework by enforcing these controls before agents act.

Core claim

Agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval. The paper proposes Agentic Agile-V as a framework that applies Agile-V as the lifecycle and introduces a task-level SCOPE-V loop to transform conversational intent into structured engineering artifacts and acceptance evidence. It contributes a taxonomy of minimum input artifacts, a conversation-to-contract gate, risk-adaptive workflows, and an evidence-bundle acceptance model.

What carries the argument

The SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify) inside the Agentic Agile-V framework, which converts exploratory dialogue into traceable contracts and verified outputs while preserving human approval.
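
One way to picture the loop is as an ordered sequence of gated steps followed by a separate human decision. The sketch below is illustrative only: the paper names the six steps but does not, as far as this page shows, prescribe an implementation. `TaskRecord`, `run_scope_v`, the per-step handlers, and the stop-at-first-failure behavior are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    # Step names come from the paper; everything else here is hypothetical.
    SPECIFY = "Specify"
    CONSTRAIN = "Constrain"
    ORCHESTRATE = "Orchestrate"
    PROVE = "Prove"
    EVOLVE = "Evolve"
    VERIFY = "Verify"


@dataclass
class TaskRecord:
    """Hypothetical trace of one task moving through the loop."""
    intent: str
    completed: list = field(default_factory=list)
    approved: bool = False


def run_scope_v(task, handlers, human_approves):
    """Run each step in order and stop at the first handler that fails."""
    for step in Step:  # Enum iteration follows definition order
        if not handlers[step](task):
            return task  # halted: later steps and approval never run
        task.completed.append(step)
    # Human approval is a separate decision, asked only after Verify passes.
    task.approved = human_approves(task)
    return task


# Usage: a Prove step that fails blocks Evolve, Verify, and approval.
handlers = {step: (lambda t: True) for step in Step}
handlers[Step.PROVE] = lambda t: False
result = run_scope_v(TaskRecord("add retry logic"), handlers, lambda t: True)
print([s.value for s in result.completed])  # ['Specify', 'Constrain', 'Orchestrate']
print(result.approved)  # False
```

Keeping approval outside the step list mirrors the framing above that human approval is preserved rather than folded into automated checks.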

If this is right

Minimum input artifact taxonomies will standardize how agents receive requirements for software, firmware, and hardware tasks.
The conversation-to-contract gate will separate exploratory chat from implementation to prevent unverified changes.
Risk-adaptive workflows will adjust the level of specification and verification for features, bug fixes, testing, and hardware work.
Evidence-bundle acceptance will supply concrete criteria for approving agent-generated artifacts.
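
The conversation-to-contract gate in the list above can be sketched as a simple completeness check on a draft contract. This is a minimal sketch under stated assumptions: the field names in `TaskContract`, the `gate` function, and the rule that every field must be non-empty are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContract:
    """Hypothetical contract distilled from an exploratory conversation."""
    requirements: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    acceptance_criteria: list = field(default_factory=list)
    approved_by: str = ""  # name of the human who signed off; empty if none


REQUIRED_FIELDS = ("requirements", "constraints", "acceptance_criteria", "approved_by")


def gate(contract):
    """Return (may_implement, missing_fields) for a draft contract."""
    missing = [name for name in REQUIRED_FIELDS if not getattr(contract, name)]
    return (not missing, missing)


# Usage: a draft with only a requirement stays on the exploratory side.
draft = TaskContract(requirements=["retry failed uploads up to 3 times"])
print(gate(draft))  # (False, ['constraints', 'acceptance_criteria', 'approved_by'])
```

The point of returning the missing fields, rather than a bare boolean, is that the gate then tells the team what still has to be written down before an agent is allowed to act.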

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar gates and loops could extend to agentic tasks in data pipelines or scientific computing to reduce setup errors.
Large open-source projects might adopt the framework to improve traceability when multiple agents contribute to the same repository.
Hardware teams could test the evidence-bundle model first on RTL verification benchmarks to quantify gains in approval speed.

Load-bearing premise

The assumption that a conversation-to-contract gate and SCOPE-V loop will reliably convert current agent failures in repository setup, dependency handling, and hardware verification into successful outcomes.

What would settle it

A controlled experiment that measures success rates on repository setup, dependency resolution, and hardware verification tasks when agents use the conversation-to-contract gate and SCOPE-V loop versus current unstructured methods. A result showing no meaningful improvement would count against the premise.
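
For a sense of what the comparison in such an experiment might look like, the sketch below runs a standard two-proportion z-test on task success counts. The choice of test and the function name `two_proportion_z` are assumptions made for illustration, and the counts in the usage lines are invented; no such experiment is reported in the paper.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (difference in success rate, z statistic, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all tasks succeeded or all failed; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Usage with invented counts: arm A uses the gate and loop, arm B does not.
diff, z, p = two_proportion_z(success_a=62, n_a=100, success_b=48, n_b=100)
print(f"difference={diff:.2f}, z={z:.2f}, p={p:.3f}")
```

A small difference with a large p-value across the three task families would be the "no meaningful improvement" outcome described above; a real study would also need to fix the task set and the success criterion in advance.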

Figures

Figures reproduced from arXiv: 2605.20456 by Christopher Koch.

**Figure 1.** High-level Agentic Agile-V model. C. The Missing Bridge: Current tools provide execution surfaces, sandboxing, repository instructions, test execution, and pull-request workflows. Existing engineering processes provide requirements discipline, verification logic, review, and release gates. The missing bridge is a lightweight framework that tells teams what input to provide to agents, how to structure agen…

Original abstract

Agentic AI coding systems can inspect repositories, plan implementation steps, edit files, call tools, run tests, and submit pull requests. These capabilities make software and hardware development faster in some settings, but current evidence does not support the simple claim that autonomous code generation automatically improves engineering outcomes. Controlled studies report productivity gains in some enterprise tasks, slowdowns in mature open-source work, moderate but heterogeneous meta-analytic effects, and persistent failures in repository setup, dependency handling, permission gating, and hardware verification. This paper argues that the central problem is no longer prompt engineering; it is engineering process control. It synthesizes evidence from agentic software engineering, GitHub-scale adoption studies, repository-level agent configuration, productivity trials, issue-resolution benchmarks, and hardware/RTL verification research. It proposes Agentic Agile-V, a process framework that uses Agile-V as the lifecycle backbone and a task-level SCOPE-V loop - Specify, Constrain, Orchestrate, Prove, Evolve, and Verify - to convert conversational intent into structured engineering artifacts and acceptance evidence. The paper contributes: (i) a taxonomy of minimum input artifacts for agentic software, firmware, and hardware work; (ii) a conversation-to-contract gate that separates exploratory dialogue from implementation; (iii) risk-adaptive feature, bug-fix, testing, and hardware workflows; and (iv) an evidence-bundle acceptance model for agent-generated artifacts. The paper concludes that agentic AI does not eliminate engineering discipline; it increases the value of requirements, constraints, traceability, independent verification, and human approval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a position paper that names a process framework for agentic coding but offers no worked examples showing how its SCOPE-V loop actually resolves the failures it cites.

The paper's core point is that agentic AI tools do not replace engineering discipline; they make requirements, constraints, traceability, and verification more necessary. It synthesizes studies on productivity gains, slowdowns in open-source settings, and recurring problems like repository setup, dependency handling, and hardware verification, then proposes Agentic Agile-V as the fix. The backbone is an Agile-V lifecycle plus a task-level SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify), a conversation-to-contract gate, a taxonomy of minimum artifacts, risk-adaptive workflows, and an evidence-bundle acceptance model. These are the concrete contributions.

The synthesis of existing evidence on agent failures is useful, and the argument that discipline becomes more valuable rather than less is stated plainly. The taxonomy and gate idea give teams something specific to try.

The soft spot is the missing link between the proposed steps and the documented problems. The paper lists failures from prior work and describes the SCOPE-V loop at a high level, but it does not walk through even one case, such as a missing dependency manifest or an RTL timing violation, to show which step would detect it and how the output would differ from current agent behavior. Without that trace or any new trial, the claim that the framework converts failures into reliable outcomes stays at the level of suggestion.

This paper is for readers who build or manage AI-assisted engineering teams and want structured process ideas rather than new measurements. A practitioner or process researcher would find the recommendations worth discussing. It deserves a serious referee because the synthesis is grounded and the proposal is detailed enough to be tested or refined in review.

Referee Report

1 major / 2 minor

Summary. The manuscript synthesizes evidence from agentic AI coding studies, GitHub adoption data, productivity trials, and hardware verification research to argue that current agentic systems exhibit persistent failures in repository setup, dependency handling, permission gating, and RTL/hardware verification. It proposes the Agentic Agile-V framework, which uses an Agile-V lifecycle backbone together with a task-level SCOPE-V loop (Specify, Constrain, Orchestrate, Prove, Evolve, Verify), a conversation-to-contract gate, a taxonomy of minimum input artifacts, risk-adaptive workflows, and an evidence-bundle acceptance model. The central claim is that agentic AI does not reduce the need for engineering discipline but instead increases the value of requirements, constraints, traceability, independent verification, and human approval.

Significance. If the proposed mechanisms can be shown to address the cited failure modes, the work would supply a structured process model for reliable integration of agentic tools into software and hardware engineering. The synthesis of heterogeneous evidence (controlled trials, meta-analyses, and domain benchmarks) and the explicit taxonomy of artifacts plus evidence-bundle model are constructive contributions that could guide both practitioners and future empirical studies, even without new validation data in the present manuscript.

major comments (1)

[SCOPE-V loop description and failure-mode synthesis] The assertion that the SCOPE-V loop systematically converts documented agent failures (repository setup, dependency handling, hardware/RTL verification) into reliable outcomes lacks any concrete mapping or worked example. No trace is supplied showing how a specific step such as 'Constrain' or 'Prove' would detect or prevent, for example, a missing dependency manifest or an RTL timing violation that current agents miss. This mapping is load-bearing for the claim that the framework reliably mitigates the limitations identified in the productivity and verification literature.

minor comments (2)

[Figures and workflow diagrams] The high-level workflow diagrams would be clearer if each SCOPE-V step were explicitly annotated with the failure modes it is intended to address.
[Introduction] The distinction between the literature synthesis and the novel framework elements could be stated more explicitly in the introduction to help readers separate established findings from the proposal.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed review. We are encouraged by the recognition of the evidence synthesis, the taxonomy of artifacts, and the evidence-bundle model as potential contributions. We respond point-by-point to the major comment below and outline the revisions we will make.

Referee: The assertion that the SCOPE-V loop systematically converts documented agent failures (repository setup, dependency handling, hardware/RTL verification) into reliable outcomes lacks any concrete mapping or worked example. No trace is supplied showing how a specific step such as 'Constrain' or 'Prove' would detect or prevent, for example, a missing dependency manifest or an RTL timing violation that current agents miss. This mapping is load-bearing for the claim that the framework reliably mitigates the limitations identified in the productivity and verification literature.

Authors: We agree that the manuscript presents the SCOPE-V loop primarily at the level of process definition and does not supply an explicit step-by-step trace or worked example that maps individual steps (e.g., Constrain or Prove) onto the concrete failure modes cited in the literature review. The framework is intended to address these issues by requiring explicit constraints on input artifacts (including dependency manifests and timing specifications) and by inserting independent proof and verification gates before acceptance; however, the current text leaves the operationalization implicit. In the revised manuscript we will add a new illustrative subsection containing two concise worked examples: one tracing a repository-setup and dependency-handling failure through the full SCOPE-V sequence, and one tracing an RTL timing violation. Each example will show the specific artifacts generated or checked at the Constrain, Prove, and Verify steps and how these steps differ from unaided agent behavior. This addition will be placed in Section 4 or as a new Figure 3 to make the mitigation claim more traceable without requiring new empirical data.

revision: yes
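
To make the missing-manifest case concrete, here is a rough sketch of the kind of check a Constrain step could run before an agent edits anything. It is not the worked example promised in the rebuttal and is not taken from the paper: `MANIFEST_NAMES`, `constrain_check`, and its parameters are hypothetical, and the "no new dependencies" rule is borrowed from the SCOPE-V passage quoted elsewhere on this page.

```python
from pathlib import Path

# Hypothetical list; a real project would declare the manifests it expects.
MANIFEST_NAMES = ("requirements.txt", "pyproject.toml", "package.json", "Cargo.toml")


def constrain_check(repo_root, allow_new_dependencies=False, proposed_new=()):
    """Return constraint violations found before the agent changes any file."""
    root = Path(repo_root)
    violations = []
    # A repository with no manifest cannot have its dependencies pinned or audited.
    if not any((root / name).is_file() for name in MANIFEST_NAMES):
        violations.append("no dependency manifest found")
    # Block dependency additions unless the contract explicitly allows them.
    if proposed_new and not allow_new_dependencies:
        violations.append("new dependencies proposed: " + ", ".join(sorted(proposed_new)))
    return violations
```

Called as `constrain_check("path/to/repo", proposed_new={"numpy"})`, an empty list would let the task proceed to Orchestrate, while any entry would send it back to Specify or to a human. An RTL timing check would need tool-specific reports and is left out here.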

Circularity Check

0 steps flagged

Proposal framework synthesizes external evidence without self-referential reduction or definitional loops

The manuscript is a conceptual process proposal that synthesizes cited studies on agentic productivity, repository-level failures, and hardware verification to motivate Agentic Agile-V and the SCOPE-V loop. No equations, fitted parameters, or self-definitional constructs appear that would reduce the proposed conversation-to-contract gate, taxonomy of artifacts, or evidence-bundle model back to the input failure modes by construction. The central claim draws on external benchmarks rather than load-bearing self-citations or ansatz smuggling; the derivation chain is therefore self-contained against the referenced evidence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on domain assumptions about agent limitations drawn from cited studies and introduces new process constructs without independent falsifiable evidence.

axioms (1)

Domain assumption: Current agentic AI systems exhibit persistent failures in repository setup, dependency handling, permission gating, and hardware verification.
Stated directly in the abstract as the basis for shifting focus from prompt engineering to process control.

invented entities (2)

Agentic Agile-V (no independent evidence)
Purpose: Process framework combining Agile-V lifecycle with SCOPE-V loop for agentic engineering.
Newly proposed construct in this paper with no external validation provided.

SCOPE-V loop (no independent evidence)
Purpose: Task-level cycle of Specify, Constrain, Orchestrate, Prove, Evolve, Verify.
Invented acronym and sequence presented as the core mechanism.

pith-pipeline@v0.9.0 · 5810 in / 1477 out tokens · 32299 ms · 2026-05-21T06:48:09.746540+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: SCOPE-V micro-cycle: Specify (task brief with acceptance criteria), Constrain (boundaries, no new dependencies), Orchestrate (inspect then plan), Prove (unit tests, simulation, formal checks), Evolve, Verify.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Evidence-bundle acceptance model and risk-adaptive gates (R0–R3) requiring traceable requirements, simulation logs, HIL evidence.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 9 internal anchors

[1] C. Koch and J. A. Wellbrock, “Agile V: A Compliance-Ready Framework for AI-Augmented Engineering – From Concept to Audit-Ready Delivery,” arXiv:2602.20684, 2026. [Online]. Available: https://arxiv.org/abs/2602.20684
[2] J. Liu et al., “Large Language Model-Based Agents for Software Engineering: A Survey,” arXiv:2409.02977, 2024. [Online]. Available: https://arxiv.org/abs/2409.02977
[3] Y. Dong et al., “A Survey on Code Generation with LLM-based Agents,” arXiv:2508.00083, 2025. [Online]. Available: https://arxiv.org/abs/2508.00083
[4] E. Paradis et al., “How much does AI impact development speed? An enterprise-based randomized controlled trial,” arXiv:2410.12944, 2024. [Online]. Available: https://arxiv.org/abs/2410.12944
[5] J. Becker, N. Rush, E. Barnes, and D. Rein, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” arXiv:2507.09089, 2025. [Online]. Available: https://arxiv.org/abs/2507.09089
[6] S. Maier, M. Gunzenhaeuser, J. Schweisthal, M. Schneider, and S. Feuerriegel, “A meta-analysis of the effect of generative AI on productivity and learning in programming,” arXiv:2605.04779, 2026. [Online]. Available: https://arxiv.org/abs/2605.04779
[7] A. Mohamed, M. Assi, and M. Guizani, “The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review,” arXiv:2507.03156, 2025. [Online]. Available: https://arxiv.org/abs/2507.03156
[8] H. Li, H. Zhang, and A. E. Hassan, “AIDev: Studying AI Coding Agents on GitHub,” arXiv:2602.09185, 2026. [Online]. Available: https://arxiv.org/abs/2602.09185
[9] G. Pinna, J. Gong, D. Williams, and F. Sarro, “Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance,” arXiv:2602.08915, 2026. [Online]. Available: https://arxiv.org/abs/2602.08915
[10] M. Watanabe, H. Li, Y. Kashiwa, B. Reid, H. Iida, and A. E. Hassan, “On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub,” arXiv:2509.14745, 2025. [Online]. Available: https://arxiv.org/abs/2509.14745
[11] M. Galster, S. Mohsenimofidi, J. L. Lulla, M. A. Abubakar, C. Treude, and S. Baltes, “Configuring Agentic AI Coding Tools: An Exploratory Study,” arXiv:2602.14690, 2026. [Online]. Available: https://arxiv.org/abs/2602.14690
[12] J. L. Lulla, S. Mohsenimofidi, M. Galster, J. M. Zhang, S. Baltes, and C. Treude, “On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents,” arXiv:2601.20404, 2026. [Online]. Available: https://arxiv.org/abs/2601.20404
[13] T. Gloaguen, N. Muendler, M. Mueller, V. Raychev, and M. Vechev, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” arXiv:2602.11988, 2026. [Online]. Available: https://arxiv.org/abs/2602.11988
[14] D. McMillan, “Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables,” arXiv:2605.10039, 2026. [Online]. Available: https://arxiv.org/abs/2605.10039
[15] Z. Ni et al., “GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging,” arXiv:2508.18993, 2025. [Online]. Available: https://arxiv.org/abs/2508.18993
[16] H. Wang et al., “RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving,” arXiv:2505.21577, 2025. [Online]. Available: https://arxiv.org/abs/2505.21577
[17] T. Han et al., “SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?” arXiv:2603.15401, 2026. [Online]. Available: https://arxiv.org/abs/2603.15401
[18] I. Badertdinov, M. Nekrashevich, A. Shevtsov, and A. Golubev, “SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale,” arXiv:2602.23866, 2026. [Online]. Available: https://arxiv.org/abs/2602.23866
[19] Z. Jiang, D. Lo, and Z. Liu, “Agentic Software Issue Resolution with Large Language Models: A Survey,” arXiv:2512.22256, 2025. [Online]. Available: https://arxiv.org/abs/2512.22256
[21] [Online]. Available: https://arxiv.org/abs/2601.11655
[23] “The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents.” [Online]. Available: https://arxiv.org/abs/2511.03690
[24] T. Warren, “GitHub’s new AI coding agent can fix bugs for you,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/669339/github-ai-coding-agent-fix-bugs
[25] T. Warren, “GitHub is launching a hub for multiple AI coding agents,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/808032/github-ai-agent-hq-coding-openai-anthropic
[26] D. Preston, “Google Antigravity is an agent-first coding tool built for Gemini 3,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/822833/google-antigravity-ide-coding-agent-gemini-3-pro
[27] S. Levy, “OpenAI, Anthropic, and Block Are Teaming Up to Make AI Agents Play Nice,” Wired,
[28] [Online]. Available: https://www.wired.com/story/openai-anthropic-and-block-are-teaming-up-on-ai-agent-standards
[29] Z. Ji, Z. Li, W. Jiang, Y. Gao, and S. Wang, “Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code’s Auto Mode,” arXiv:2604.04978, 2026. [Online]. Available: https://arxiv.org/abs/2604.04978
[30] P. Jin et al., “RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs,” arXiv:2507.16200, 2025. [Online]. Available: https://arxiv.org/abs/2507.16200
[31] G.-W. Wan et al., “FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification,” arXiv:2507.04276, 2025. [Online]. Available: https://arxiv.org/abs/2507.04276
[32] J. Pan, G. Zhou, C.-C. Chang, I. Jacobson, J. Hu, and Y. Chen, “A Survey of Research in Large Language Models for Electronic Design Automation,” arXiv:2501.09655, 2025. [Online]. Available: https://arxiv.org/abs/2501.09655
[33] C. Guo et al., “A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models,” arXiv:2410.07265, 2024. [Online]. Available: https://arxiv.org/abs/2410.07265
[34] Q. Xu, L. Stok, R. Drechsler, X. Wang, G. L. Zhang, and I. L. Markov, “Revolution or Hype? Seeking the Limits of Large Models in Hardware Design,” arXiv:2509.04905, 2025. [Online]. Available: https://arxiv.org/abs/2509.04905
[35] F. Xu, P. K. Medappa, M. M. Tunc, M. Vroegindeweij, and J. C. Fransoo, “AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden,” arXiv:2510.10165, 2025. [Online]. Available: https://arxiv.org/abs/2510.10165
[36] M. Alenezi, “Rethinking Software Engineering for Agentic AI Systems,” arXiv:2604.10599, 2026. [Online]. Available: https://arxiv.org/abs/2604.10599

[1] [1]

Agile V: A Compliance-Ready Frame- work for AI-Augmented Engineering – From Concept to Audit-Ready Delivery,

C. Koch and J. A. Wellbrock, “Agile V: A Compliance-Ready Frame- work for AI-Augmented Engineering – From Concept to Audit-Ready Delivery,” arXiv:2602.20684, 2026. [Online]. Available: https://arxiv. org/abs/2602.20684

work page arXiv 2026

[2] [2]

Large Language Model-Based Agents for Software Engineering: A Survey

J. Liu et al., “Large Language Model-Based Agents for Software Engineering: A Survey,” arXiv:2409.02977, 2024. [Online]. Available: https://arxiv.org/abs/2409.02977

work page internal anchor Pith review Pith/arXiv arXiv 2024

[3] [3]

A Survey on Code Generation with LLM-based Agents

Y . Dong et al., “A Survey on Code Generation with LLM-based Agents,” arXiv:2508.00083, 2025. [Online]. Available: https://arxiv.org/abs/2508. 00083

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4] [4]

How much does AI impact development speed? An enterprise-based randomized controlled trial,

E. Paradis et al., “How much does AI impact development speed? An enterprise-based randomized controlled trial,” arXiv:2410.12944, 2024. [Online]. Available: https://arxiv.org/abs/2410.12944

work page arXiv 2024

[5] [5]

Measuring the impact of early-2025 AI on experienced open-source developer productivity,

J. Becker, N. Rush, E. Barnes, and D. Rein, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” arXiv:2507.09089, 2025. [Online]. Available: https://arxiv.org/abs/2507. 09089

work page arXiv 2025

[6] [6]

A meta-analysis of the effect of generative AI on productivity and learning in programming

S. Maier, M. Gunzenhaeuser, J. Schweisthal, M. Schneider, and S. Feuerriegel, “A meta-analysis of the effect of generative AI on produc- tivity and learning in programming,” arXiv:2605.04779, 2026. [Online]. Available: https://arxiv.org/abs/2605.04779

work page internal anchor Pith review Pith/arXiv arXiv 2026

[7] [7]

The impact of LLM-assistants on software developer productivity: A systematic review and mapping study,

A. Mohamed, M. Assi, and M. Guizani, “The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review,” arXiv:2507.03156, 2025. [Online]. Available: https://arxiv.org/abs/2507. 03156

work page arXiv 2025

[8] [8]

AIDev: Studying AI Coding Agents on GitHub,

H. Li, H. Zhang, and A. E. Hassan, “AIDev: Studying AI Coding Agents on GitHub,” arXiv:2602.09185, 2026. [Online]. Available: https://arxiv. org/abs/2602.09185

work page arXiv 2026

[9] [9]

Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance

G. Pinna, J. Gong, D. Williams, and F. Sarro, “Comparing AI Cod- ing Agents: A Task-Stratified Analysis of Pull Request Acceptance,” arXiv:2602.08915, 2026. [Online]. Available: https://arxiv.org/abs/2602. 08915

work page internal anchor Pith review Pith/arXiv arXiv 2026

[10] [10]

On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub,

M. Watanabe, H. Li, Y . Kashiwa, B. Reid, H. Iida, and A. E. Hassan, “On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub,” arXiv:2509.14745, 2025. [Online]. Available: https://arxiv. org/abs/2509.14745

work page arXiv 2025

[11] [11]

Configuring Agentic AI Coding Tools: An Exploratory Study

M. Galster, S. Mohsenimofidi, J. L. Lulla, M. A. Abubakar, C. Treude, and S. Baltes, “Configuring Agentic AI Coding Tools: An Exploratory Study,” arXiv:2602.14690, 2026. [Online]. Available: https://arxiv.org/ abs/2602.14690

work page internal anchor Pith review Pith/arXiv arXiv 2026

[12] [12]

On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents,

J. L. Lulla, S. Mohsenimofidi, M. Galster, J. M. Zhang, S. Baltes, and C. Treude, “On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents,” arXiv:2601.20404, 2026. [Online]. Available: https: //arxiv.org/abs/2601.20404

work page arXiv 2026

[13] [13]

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

T. Gloaguen, N. Muendler, M. Mueller, V . Raychev, and M. Vechev, “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” arXiv:2602.11988, 2026. [Online]. Available: https: //arxiv.org/abs/2602.11988

work page arXiv 2026

[14] [14]

Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables

D. McMillan, “Instruction Adherence in Coding Agent Configu- ration Files: A Factorial Study of Four File-Structure Variables,” arXiv:2605.10039, 2026. [Online]. Available: https://arxiv.org/abs/2605. 10039

work page internal anchor Pith review Pith/arXiv arXiv 2026

[15] Z. Ni et al., “GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging,” arXiv:2508.18993, 2025. [Online]. Available: https://arxiv.org/abs/2508.18993

[16] H. Wang et al., “RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving,” arXiv:2505.21577, 2025. [Online]. Available: https://arxiv.org/abs/2505.21577

[17] T. Han et al., “SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?” arXiv:2603.15401, 2026. [Online]. Available: https://arxiv.org/abs/2603.15401

[18] I. Badertdinov, M. Nekrashevich, A. Shevtsov, and A. Golubev, “SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale,” arXiv:2602.23866, 2026. [Online]. Available: https://arxiv.org/abs/2602.23866

[19] Z. Jiang, D. Lo, and Z. Liu, “Agentic Software Issue Resolution with Large Language Models: A Survey,” arXiv:2512.22256, 2025. [Online]. Available: https://arxiv.org/abs/2512.22256

[21] [Online]. Available: https://arxiv.org/abs/2601.11655

[23] “The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents.” [Online]. Available: https://arxiv.org/abs/2511.03690

[24] T. Warren, “GitHub’s new AI coding agent can fix bugs for you,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/669339/github-ai-coding-agent-fix-bugs

[25] T. Warren, “GitHub is launching a hub for multiple AI coding agents,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/808032/github-ai-agent-hq-coding-openai-anthropic

[26] D. Preston, “Google Antigravity is an agent-first coding tool built for Gemini 3,” The Verge, 2025. [Online]. Available: https://www.theverge.com/news/822833/google-antigravity-ide-coding-agent-gemini-3-pro

[27] S. Levy, “OpenAI, Anthropic, and Block Are Teaming Up to Make AI Agents Play Nice,” Wired. [Online]. Available: https://www.wired.com/story/openai-anthropic-and-block-are-teaming-up-on-ai-agent-standards

[29] Z. Ji, Z. Li, W. Jiang, Y. Gao, and S. Wang, “Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code’s Auto Mode,” arXiv:2604.04978, 2026. [Online]. Available: https://arxiv.org/abs/2604.04978

[30] P. Jin et al., “RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs,” arXiv:2507.16200, 2025. [Online]. Available: https://arxiv.org/abs/2507.16200

[31] G.-W. Wan et al., “FIXME: Towards End-to-End Benchmarking of LLM-Aided Design Verification,” arXiv:2507.04276, 2025. [Online]. Available: https://arxiv.org/abs/2507.04276

[32] J. Pan, G. Zhou, C.-C. Chang, I. Jacobson, J. Hu, and Y. Chen, “A Survey of Research in Large Language Models for Electronic Design Automation,” arXiv:2501.09655, 2025. [Online]. Available: https://arxiv.org/abs/2501.09655

[33] C. Guo et al., “A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models,” arXiv:2410.07265, 2024. [Online]. Available: https://arxiv.org/abs/2410.07265

[34] Q. Xu, L. Stok, R. Drechsler, X. Wang, G. L. Zhang, and I. L. Markov, “Revolution or Hype? Seeking the Limits of Large Models in Hardware Design,” arXiv:2509.04905, 2025. [Online]. Available: https://arxiv.org/abs/2509.04905

[35] F. Xu, P. K. Medappa, M. M. Tunc, M. Vroegindeweij, and J. C. Fransoo, “AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden,” arXiv:2510.10165, 2025. [Online]. Available: https://arxiv.org/abs/2510.10165

[36] M. Alenezi, “Rethinking Software Engineering for Agentic AI Systems,” arXiv:2604.10599, 2026. [Online]. Available: https://arxiv.org/abs/2604.10599