The Impact of Configuring Agentic AI Coding Tools on Build-vs-Buy Decisions: A Study Protocol

Christoph Treude; Jai Lal Lulla; Jie M. Zhang; Matthias Galster; Sebastian Baltes

arXiv: 2606.03907 · v1 · pith:QQKLYHTEnew · submitted 2026-06-02 · 💻 cs.SE · cs.AI · cs.HC

Pith reviewed 2026-06-28 08:40 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.HC

keywords agentic AI coding tools · build-versus-buy decisions · configuration mechanisms · library selection · software security · disclosure accuracy · benchmark protocol · Claude Code

The pith

The paper sets out to test whether configuration mechanisms supplied to agentic AI coding tools measurably change whether the tools build functionality from scratch or import external libraries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a pre-registered experimental protocol to determine which ways of configuring agentic AI coding tools most strongly affect their build-versus-buy decisions. These decisions matter because they shape software security, licensing compliance, performance, and long-term maintainability. The protocol runs controlled tasks on staged benchmark projects that contain clear build-versus-buy points, using tools such as Claude Code and OpenAI Codex. Configurations are varied from none at all to context files with preferences or prohibitions, autonomously discoverable Skills, MCP-enabled discovery tools, and permission controls. Nine hypotheses guide the collection of data on selected libraries and the completeness and accuracy of any disclosures about them.
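One way the "selected libraries" outcome could be extracted is by comparing the imports in a project file before and after a tool run. The sketch below is an illustration only, not the authors' pipeline: it assumes the staged projects are Python, and the function names and the `BEFORE`/`AFTER` sample sources are hypothetical.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they point inside the project.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def newly_introduced(before: str, after: str) -> set[str]:
    """Modules imported after the tool run that were not imported before it."""
    return imported_top_level_modules(after) - imported_top_level_modules(before)


# Hypothetical file contents before and after an agent edits the project.
BEFORE = "import json\n\ndef load(path):\n    return json.load(open(path))\n"
AFTER = (
    "import json\n"
    "import requests\n"
    "from yaml import safe_load\n"
    "from . import helpers\n\n"
    "def load(path):\n"
    "    return json.load(open(path))\n"
)

print(sorted(newly_introduced(BEFORE, AFTER)))  # ['requests', 'yaml']
```

This sketch does not separate standard-library modules from third-party packages, and an import name (`yaml`) can differ from the package name (`pyyaml`); a real pipeline would need to handle both.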

Core claim

The authors establish a reusable benchmark and analysis pipeline that manipulates configuration mechanisms supplied to agentic AI coding tools and measures resulting changes in library selection, disclosure completeness, and disclosure accuracy across identifiable build-versus-buy decision points in staged projects.
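The summary does not say how disclosure completeness and accuracy are defined. A plausible reading, assumed here purely for illustration, treats completeness as the share of introduced libraries that the tool disclosed (recall) and accuracy as the share of disclosed libraries that were actually introduced (precision). The function name and the example library sets are hypothetical.

```python
def disclosure_scores(introduced: set[str], disclosed: set[str]) -> dict[str, float]:
    """Score a disclosure against the libraries a run actually introduced.

    Assumed definitions (not taken from the paper):
      completeness = |introduced ∩ disclosed| / |introduced|
      accuracy     = |introduced ∩ disclosed| / |disclosed|
    An empty denominator is scored as 1.0 (nothing to miss or misreport).
    """
    correct = introduced & disclosed
    completeness = len(correct) / len(introduced) if introduced else 1.0
    accuracy = len(correct) / len(disclosed) if disclosed else 1.0
    return {"completeness": completeness, "accuracy": accuracy}


# The tool introduced four libraries but disclosed only two of them.
partial = disclosure_scores(
    introduced={"requests", "pyyaml", "rich", "click"},
    disclosed={"requests", "pyyaml"},
)
print(partial)  # {'completeness': 0.5, 'accuracy': 1.0}

# The tool disclosed a library it never introduced and omitted one it did.
mixed = disclosure_scores(
    introduced={"requests", "pyyaml"},
    disclosed={"requests", "numpy"},
)
print(mixed)  # {'completeness': 0.5, 'accuracy': 0.5}
```

Keeping the two scores separate distinguishes a tool that under-reports from one that reports libraries it did not add, which a single combined score would hide.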

What carries the argument

The argument rests on the configuration mechanisms (context files, Skills, MCP-enabled discovery, and permission controls) that are supplied to the agentic AI coding tools to alter their autonomous decisions on whether to implement code or import libraries.

If this is right

Context files with explicit prohibitions will reduce unwanted library imports more than soft preferences alone.
Permission controls will produce more accurate disclosures of newly introduced libraries than discovery tools or Skills.
The released benchmark dataset will allow direct comparison of build-versus-buy behavior across additional agentic AI coding tools.
Results will identify which configuration types most reliably steer tools toward preferred library choices or custom implementations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could use the most effective configurations as defaults in future agentic tools to reduce unintended dependency risks.
The protocol's design could be adapted to test how the same configurations affect other autonomous decisions such as security scanning or test generation.
If real projects contain more ambiguous decision points than the staged benchmarks, the measured effects might be smaller outside controlled settings.

Load-bearing premise

The staged benchmark projects contain identifiable build-versus-buy decision points that represent the situations agentic AI tools will encounter in real development workflows.

What would settle it

Executing the protocol and observing no statistically significant differences in library selection rates or disclosure accuracy across the tested configuration conditions would fail to support the claim that the mechanisms alter the decisions.
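As a rough illustration of what such a comparison could look like, the sketch below computes a Pearson chi-square statistic for library-import counts under two configuration conditions. The counts are invented for the example, the choice of test is an assumption (the summary does not state the protocol's analysis plan), and the function name is hypothetical.

```python
def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts.

    Assumes every row and column total is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented counts: rows are conditions, columns are (imported a library, built from scratch).
observed = [
    [30, 20],  # no configuration
    [10, 40],  # context file with an explicit prohibition
]

stat = chi_square_statistic(observed)
CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

print(round(stat, 3))                # 16.667
print(stat > CRITICAL_DF1_ALPHA_05)  # True
```

With more than two conditions the table gains rows and the degrees of freedom change, so the critical value above would no longer apply; pairwise comparisons would also call for a multiple-comparison correction.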

Original abstract

Agentic AI coding tools write code with increasing autonomy and in doing so decide when to import a library and when to implement functionality from scratch. These decisions, whether to build functionality from scratch or buy into an external library, hereafter build-versus-buy, carry direct consequences for software security, licensing compliance, performance, and long-term maintainability. Yet no controlled experimental study has examined what governs build-versus-buy decisions in agentic AI coding tools. Configuration mechanisms, i.e., the means by which developers tailor agentic AI coding tool behavior to a project or workflow, are one of the primary means by which practitioners can influence these decisions. However, it is unclear which configuration mechanisms influence build-versus-buy decisions most effectively. We present a pre-registered protocol to study how configuration mechanisms alter build-versus-buy behavior in two popular agentic AI coding tools: Claude Code and OpenAI Codex. We will execute controlled programming tasks drawn from a benchmark of staged projects, each constructed around identifiable build-versus-buy points, and will manipulate the configuration supplied to each tool, ranging from no configuration, through context files with soft preferences and explicit prohibitions, to Skills (instructions that can be autonomously discovered), MCP-enabled library discovery tools, and permission controls, measuring which libraries the tool selects, whether it discloses newly introduced libraries, and whether those disclosures are complete and accurate. Nine pre-registered hypotheses structure the protocol. The resulting benchmark dataset and analysis pipeline will be released as a reusable artifact for evaluating build-versus-buy behavior in agentic AI coding tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a pre-registered protocol paper laying out an experiment on configuration effects in agentic coding tools, with no data or results included.

The paper describes a controlled study plan to test whether different configuration approaches change how tools like Claude Code and OpenAI Codex decide between writing code from scratch or importing libraries. It focuses on nine hypotheses around context files, Skills, MCP discovery, and permission controls, with measurements for library selection and disclosure accuracy.

The protocol stands out for its pre-registration and commitment to releasing the benchmark dataset plus analysis pipeline. That setup targets a practical gap in how developers steer agent behavior on security and compliance issues. The choice of two real tools and the range of configurations tested give the design some grounding in current practice.

The main limitation is that everything rests on the staged benchmark projects. The abstract says the tasks are built around identifiable decision points, but it gives no detail on how those points were chosen or validated against actual developer trade-offs in licensing, performance, or maintenance. If the points feel artificial, any measured effects on selection or disclosure may not carry over. The protocol also leaves open how disclosure accuracy will be scored when tools produce partial or ambiguous outputs.

This work is aimed at empirical software engineering researchers who study AI-assisted development. Someone running similar experiments or building evaluation benchmarks could use the released artifacts directly.

The paper shows clear thinking on the experimental structure and literature connections. It deserves peer review so reviewers can flag issues in task design and analysis before the study runs.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a pre-registered protocol for a controlled study examining how configuration mechanisms (context files, Skills, MCP-enabled discovery, permission controls) affect build-versus-buy decisions in agentic AI coding tools (Claude Code and OpenAI Codex). Tasks are drawn from a benchmark of staged projects constructed around identifiable decision points; outcomes include library selection, disclosure presence, and disclosure accuracy. The protocol is organized around nine pre-registered hypotheses and includes plans to release the benchmark dataset and analysis pipeline as a reusable artifact.

Significance. If executed, the study would supply the first controlled evidence on configuration effects on autonomous library decisions in AI coding agents, with implications for security, licensing, and maintainability. Pre-registration of hypotheses, explicit manipulation of configuration mechanisms, and commitment to artifact release are strengths that support reproducibility and cumulative research.

major comments (1)

[Abstract / benchmark construction] Abstract (benchmark construction paragraph): the protocol states that projects are 'constructed around identifiable build-versus-buy points' but supplies no method, criteria, or validation procedure (e.g., expert review, comparison to real project histories, or pilot testing) to confirm these points reflect authentic trade-offs in security, licensing, performance, or maintainability rather than artificial insertions. This assumption is load-bearing for the claim that measured changes in library selection and disclosure will generalize beyond the staged tasks.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the protocol. The single major comment raises a valid point about the level of detail provided on benchmark construction. We respond point-by-point below and commit to revisions that strengthen the manuscript without altering its core claims.

Referee: [Abstract / benchmark construction] Abstract (benchmark construction paragraph): the protocol states that projects are 'constructed around identifiable build-versus-buy points' but supplies no method, criteria, or validation procedure (e.g., expert review, comparison to real project histories, or pilot testing) to confirm these points reflect authentic trade-offs in security, licensing, performance, or maintainability rather than artificial insertions. This assumption is load-bearing for the claim that measured changes in library selection and disclosure will generalize beyond the staged tasks.

Authors: We agree that the abstract paragraph is brief and does not enumerate the construction method, criteria, or validation steps. The full protocol manuscript contains a methods subsection on benchmark construction that defines the decision-point identification process (drawing from recurring patterns in open-source repositories concerning security, licensing, and maintainability). However, the referee is correct that explicit validation procedures are not described. We will revise the manuscript to add a dedicated paragraph on construction criteria, pilot testing, and planned expert review; we will also insert a concise summary of these procedures into the abstract. These changes will be made in the next version.

Revision: yes

Circularity Check

0 steps flagged

No circularity: forward-looking protocol with no derivations or fitted results

This document is a pre-registered study protocol that proposes future experiments on configuration effects in agentic AI tools. It states nine hypotheses but performs no derivations, fits no parameters, makes no predictions that reduce to author-defined quantities, and invokes no self-citations as load-bearing uniqueness theorems or ansatzes. The benchmark is described as 'staged' and 'constructed around identifiable build-versus-buy points' without claiming that representativeness follows from prior author results by construction. All claims remain testable hypotheses rather than self-referential outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The protocol rests on domain assumptions about the controllability of agentic tool behavior and the representativeness of constructed tasks; no free parameters or invented entities are introduced because no quantitative model or derivation is present.

axioms (2)

Domain assumption: Build-versus-buy decisions made by agentic AI coding tools have direct consequences for software security, licensing, performance, and maintainability.
This premise justifies the entire study and is stated in the opening sentences of the abstract.

Domain assumption: The listed configuration mechanisms (context files, Skills, MCP tools, permission controls) can be supplied to the target tools in a controlled and comparable manner.
This assumption enables the experimental manipulation described in the protocol.
