The Impact of Configuring Agentic AI Coding Tools on Build-vs-Buy Decisions: A Study Protocol
Pith reviewed 2026-06-28 08:40 UTC · model grok-4.3
The pith
The protocol tests whether configuration mechanisms supplied to agentic AI coding tools measurably change whether the tools build functionality from scratch or import external libraries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a reusable benchmark and analysis pipeline that manipulates the configuration mechanisms supplied to agentic AI coding tools and measures the resulting changes in library selection, disclosure completeness, and disclosure accuracy across identifiable build-versus-buy decision points in staged projects.
What carries the argument
The configuration mechanisms (context files, Skills, MCP-enabled discovery, and permission controls) that are supplied to the agentic AI coding tools in order to alter their autonomous decisions on whether to implement code or import libraries.
If this is right
- Context files with explicit prohibitions will reduce unwanted library imports more than soft preferences alone.
- Permission controls will produce more accurate disclosures of newly introduced libraries than discovery tools or Skills.
- The released benchmark dataset will allow direct comparison of build-versus-buy behavior across additional agentic AI coding tools.
- Results will identify which configuration types most reliably steer tools toward preferred library choices or custom implementations.
Where Pith is reading between the lines
- Developers could use the most effective configurations as defaults in future agentic tools to reduce unintended dependency risks.
- The protocol's design could be adapted to test how the same configurations affect other autonomous decisions such as security scanning or test generation.
- If real projects contain more ambiguous decision points than the staged benchmarks, the measured effects might be smaller outside controlled settings.
Load-bearing premise
The staged benchmark projects contain identifiable build-versus-buy decision points that represent the situations agentic AI tools will encounter in real development workflows.
What would settle it
Executing the protocol and observing no statistically significant differences in library selection rates or disclosure accuracy across the tested configuration conditions would show that the mechanisms do not alter the decisions.
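To illustrate what "no statistically significant difference in library selection rates" could mean in practice, here is a minimal sketch of a two-sided permutation test on the import rate of two configuration conditions. The summary does not state which test the protocol pre-registers, so the permutation test is a stand-in chosen for its simplicity, and the data below are invented purely for illustration.

```python
import random


def import_rate(outcomes):
    """Share of runs in which the tool imported a library (1) rather than built (0)."""
    return sum(outcomes) / len(outcomes)


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference in import rates."""
    rng = random.Random(seed)
    observed = abs(import_rate(a) - import_rate(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(import_rate(pooled[:len(a)]) - import_rate(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Invented outcomes: 1 = imported a library, 0 = implemented from scratch.
no_config = [1] * 20
explicit_prohibition = [0] * 20
p_different = permutation_p_value(no_config, explicit_prohibition)

same_a = [1, 0] * 10
same_b = [1, 0] * 10
p_same = permutation_p_value(same_a, same_b)
print(p_different, p_same)
```

With nine hypotheses, a real analysis would also need a correction for multiple comparisons, which this sketch leaves out.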
Original abstract
Agentic AI coding tools write code with increasing autonomy and in doing so decide when to import a library and when to implement functionality from scratch. These decisions, whether to build functionality from scratch or buy into an external library, hereafter build-versus-buy, carry direct consequences for software security, licensing compliance, performance, and long-term maintainability. Yet no controlled experimental study has examined what governs build-versus-buy decisions in agentic AI coding tools. Configuration mechanisms, i.e., the means by which developers tailor agentic AI coding tool behavior to a project or workflow, are one of the primary means by which practitioners can influence these decisions. However, it is unclear which configuration mechanisms influence build-versus-buy decisions most effectively. We present a pre-registered protocol to study how configuration mechanisms alter build-versus-buy behavior in two popular agentic AI coding tools: Claude Code and OpenAI Codex. We will execute controlled programming tasks drawn from a benchmark of staged projects, each constructed around identifiable build-versus-buy points, and will manipulate the configuration supplied to each tool, ranging from no configuration, through context files with soft preferences and explicit prohibitions, to Skills (instructions that can be autonomously discovered), MCP-enabled library discovery tools, and permission controls, measuring which libraries the tool selects, whether it discloses newly introduced libraries, and whether those disclosures are complete and accurate. Nine pre-registered hypotheses structure the protocol. The resulting benchmark dataset and analysis pipeline will be released as a reusable artifact for evaluating build-versus-buy behavior in agentic AI coding tools.
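The two disclosure outcomes named in the abstract can be made concrete with set arithmetic. The sketch below assumes that completeness is the share of newly introduced libraries that the tool disclosed (recall) and that accuracy is the share of disclosed libraries that were actually introduced (precision). These definitions are an assumption made for illustration; the protocol may operationalize them differently.

```python
def disclosure_metrics(introduced: set[str], disclosed: set[str]) -> dict:
    """Compare the libraries a run introduced with those the tool disclosed."""
    correct = introduced & disclosed
    return {
        "disclosed_any": bool(disclosed),
        # Assumed definition: recall over the introduced libraries.
        "completeness": len(correct) / len(introduced) if introduced else 1.0,
        # Assumed definition: precision over the disclosed libraries.
        "accuracy": len(correct) / len(disclosed) if disclosed else 1.0,
        "undisclosed": sorted(introduced - disclosed),
        "spurious": sorted(disclosed - introduced),
    }


# Hypothetical run: three libraries introduced, three disclosed, two in common.
metrics = disclosure_metrics(
    introduced={"requests", "pyyaml", "dateutil"},
    disclosed={"requests", "pyyaml", "numpy"},
)
print(metrics)
```

Reporting the undisclosed and spurious libraries alongside the two ratios keeps individual failures inspectable instead of hiding them in an average.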
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a pre-registered protocol for a controlled study examining how configuration mechanisms (context files, Skills, MCP-enabled discovery, permission controls) affect build-versus-buy decisions in agentic AI coding tools (Claude Code and OpenAI Codex). Tasks are drawn from a benchmark of staged projects constructed around identifiable decision points; outcomes include library selection, disclosure presence, and disclosure accuracy. The protocol is organized around nine pre-registered hypotheses and includes plans to release the benchmark dataset and analysis pipeline as a reusable artifact.
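The design summarized above can be pictured as a grid of tools, configuration conditions, and tasks. The sketch below enumerates such a grid, assuming a fully crossed design with repeated runs. The condition labels paraphrase the abstract, while the task identifiers and the repetition count are hypothetical placeholders rather than values from the protocol.

```python
from dataclasses import dataclass
from itertools import product

TOOLS = ("Claude Code", "OpenAI Codex")

# Condition labels paraphrased from the abstract.
CONDITIONS = (
    "no_configuration",
    "context_file_soft_preference",
    "context_file_explicit_prohibition",
    "skills",
    "mcp_library_discovery",
    "permission_controls",
)


@dataclass(frozen=True)
class Run:
    tool: str
    condition: str
    task_id: str
    repetition: int


def build_run_plan(task_ids, repetitions):
    """Enumerate every tool x condition x task x repetition combination."""
    return [
        Run(tool, condition, task_id, rep)
        for tool, condition, task_id, rep in product(
            TOOLS, CONDITIONS, task_ids, range(repetitions)
        )
    ]


# Hypothetical task identifiers and repetition count.
plan = build_run_plan(["task-a", "task-b"], repetitions=3)
print(len(plan), plan[0])
```

Repetitions are included because agentic tools are not deterministic, so a single run per cell would say little about the rate at which a condition leads to an import.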
Significance. If executed, the study would supply the first controlled evidence on configuration effects on autonomous library decisions in AI coding agents, with implications for security, licensing, and maintainability. Pre-registration of hypotheses, explicit manipulation of configuration mechanisms, and commitment to artifact release are strengths that support reproducibility and cumulative research.
major comments (1)
- [Abstract / benchmark construction] Abstract (benchmark construction paragraph): the protocol states that projects are 'constructed around identifiable build-versus-buy points' but supplies no method, criteria, or validation procedure (e.g., expert review, comparison to real project histories, or pilot testing) to confirm these points reflect authentic trade-offs in security, licensing, performance, or maintainability rather than artificial insertions. This assumption is load-bearing for the claim that measured changes in library selection and disclosure will generalize beyond the staged tasks.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the protocol. The single major comment raises a valid point about the level of detail provided on benchmark construction. We respond point-by-point below and commit to revisions that strengthen the manuscript without altering its core claims.
- Referee: [Abstract / benchmark construction] Abstract (benchmark construction paragraph): the protocol states that projects are 'constructed around identifiable build-versus-buy points' but supplies no method, criteria, or validation procedure (e.g., expert review, comparison to real project histories, or pilot testing) to confirm these points reflect authentic trade-offs in security, licensing, performance, or maintainability rather than artificial insertions. This assumption is load-bearing for the claim that measured changes in library selection and disclosure will generalize beyond the staged tasks.
  Authors: We agree that the abstract paragraph is brief and does not enumerate the construction method, criteria, or validation steps. The full protocol manuscript contains a methods subsection on benchmark construction that defines the decision-point identification process (drawing from recurring patterns in open-source repositories concerning security, licensing, and maintainability). However, the referee is correct that explicit validation procedures are not described. We will revise the manuscript to add a dedicated paragraph on construction criteria, pilot testing, and planned expert review; we will also insert a concise summary of these procedures into the abstract. These changes will be made in the next version.
  Revision: yes
Circularity Check
No circularity: forward-looking protocol with no derivations or fitted results
This document is a pre-registered study protocol that proposes future experiments on configuration effects in agentic AI tools. It states nine hypotheses but performs no derivations, fits no parameters, makes no predictions that reduce to author-defined quantities, and invokes no self-citations as load-bearing uniqueness theorems or ansatzes. The benchmark is described as 'staged' and 'constructed around identifiable build-versus-buy points' without claiming that representativeness follows from prior author results by construction. All claims remain testable hypotheses rather than self-referential outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Build-versus-buy decisions made by agentic AI coding tools have direct consequences for software security, licensing, performance, and maintainability.
- Domain assumption: The listed configuration mechanisms (context files, Skills, MCP tools, permission controls) can be supplied to the target tools in a controlled and comparable manner.