Recognition: 2 Lean theorem links
A Dataset of Agentic AI Coding Tool Configurations
Pith reviewed 2026-05-12 01:26 UTC · model grok-4.3
The pith
This paper presents a large public dataset of configuration artifacts for agentic AI coding tools collected from thousands of open-source repositories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors have systematically identified and compiled 15,591 configuration artifacts along with the full content of 18,167 associated configuration files and 148,519 AI-co-authored commits from 4,738 open-source repositories using a pipeline of metadata filtering, GPT-based classification, and automated detection of configuration mechanisms across five AI coding tools.
What carries the argument
The systematic detection of repository-level configuration artifacts (Context Files, Skills, Rules, and Hooks) in actively maintained GitHub repositories, after filtering and classifying projects with GPT-5.2.
If this is right
- Researchers can study adoption patterns of different AI coding tools across software projects.
- The data supports analysis of context engineering practices for multi-step coding tasks.
- Insights into human-AI collaboration can be drawn from the co-authored commits linked to these configurations.
- The public availability allows replication and extension of studies on AI tool usage.
Where Pith is reading between the lines
- Developers might use the dataset to discover effective configuration strategies that improve AI tool performance.
- Tool builders could analyze common patterns to design better default configurations or interfaces.
- Future work might link specific configurations to code quality outcomes in the associated commits.
Load-bearing premise
That the combination of metadata filtering and GPT-5.2 classification reliably selects only engineered software projects and that the detection of configuration artifacts accurately captures them without major omissions or errors.
What would settle it
An independent manual review of a random sample of the 36,710 classified repositories. The central claim would be undermined if such a review found that a substantial portion are not engineered software projects, or that many configuration files were missed in the detection process; a clean audit would largely confirm it.
Original abstract
Agentic AI coding tools such as Claude Code and OpenAI Codex execute multi-step coding tasks with limited human oversight. To steer these tools, developers create repository-level configuration artifacts (e.g., Markdown files) for configuration mechanisms such as Context Files, Skills, Rules, and Hooks. There is no curated dataset yet that captures these configurations at scale. This dataset, collected from open-source GitHub repositories, fills that gap. We selected 40,585 actively maintained repositories through metadata filtering, classified them using GPT-5.2 to identify 36,710 as belonging to engineered software projects, and systematically detected configuration artifacts in these repositories. The dataset covers 4,738 repositories across five tools (Claude Code, GitHub Copilot, OpenAI Codex, Cursor, Gemini) and eight configuration mechanisms. We collected 15,591 configuration artifacts, the full content of 18,167 configuration files associated with these configuration artifacts, and 148,519 AI-co-authored commits. The dataset and the construction pipeline are publicly available on Zenodo under CC BY 4.0. An interactive website allows researchers to browse and explore the data. This data supports research on context engineering, AI tool adoption patterns, and human-AI collaboration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a dataset of configuration artifacts for agentic AI coding tools collected from open-source GitHub repositories. It describes selecting 40,585 actively maintained repositories via metadata filtering, using GPT-5.2 to classify 36,710 as engineered software projects, systematically detecting artifacts across five tools and eight mechanisms in 4,738 repositories, and releasing 15,591 artifacts, 18,167 associated configuration files, and 148,519 AI-co-authored commits. The full dataset, construction pipeline, and an interactive exploration website are made publicly available on Zenodo under CC BY 4.0.
Significance. If the collection pipeline is reliable, this would be a valuable contribution as the first large-scale curated dataset of repository-level configurations for AI coding assistants. It directly enables empirical research on context engineering, tool adoption patterns, and human-AI collaboration in software development. The public release, reproducibility of the pipeline, and interactive website are explicit strengths that lower barriers for follow-on work.
Major comments (2)
- [§3] §3 (Data Collection Pipeline): The headline counts (40,585 repositories filtered, 36,710 labeled engineered, 4,738 containing artifacts, 15,591 artifacts collected) rest on two automated steps—metadata filtering plus GPT-5.2 binary classification and heuristic-based detection of the eight configuration mechanisms—yet no precision, recall, error rates, or manual validation results are reported for either step. This is load-bearing for the central claim that the released dataset accurately represents configurations in engineered projects.
- [§4] §4 (Artifact Detection and Collection): The systematic detection logic (file-name patterns, directory heuristics, content signatures) is described at a high level but is not accompanied by any audit, inter-annotator agreement, or false-positive/false-negative estimates. Without these, it is impossible to assess whether the 15,591 artifacts and 18,167 files materially over- or under-count the true population.
Minor comments (2)
- [§3.2] Clarify the exact GPT model version and prompting strategy used for classification; 'GPT-5.2' is non-standard and should be documented with the precise prompt template and temperature settings.
- The interactive website is mentioned but its features and data export options are not described in the text; adding a short subsection or figure would improve usability for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for quantitative validation of our automated pipeline steps. We agree these metrics are important to support the dataset's claims and will add them in the revision. We address each major comment below.
Point-by-point responses
Referee: [§3] §3 (Data Collection Pipeline): The headline counts (40,585 repositories filtered, 36,710 labeled engineered, 4,738 containing artifacts, 15,591 artifacts collected) rest on two automated steps—metadata filtering plus GPT-5.2 binary classification and heuristic-based detection of the eight configuration mechanisms—yet no precision, recall, error rates, or manual validation results are reported for either step. This is load-bearing for the central claim that the released dataset accurately represents configurations in engineered projects.
Authors: We agree that validation metrics are necessary to substantiate the headline counts. The metadata filter applied established criteria (activity, size, language) drawn from prior repository-mining literature, and the GPT-5.2 prompt was engineered with explicit definitions of engineered software projects. At this scale, exhaustive manual review was not performed initially. In the revised manuscript we will add a validation section reporting precision, recall, and F1-score from a manual audit of a stratified random sample of 150 repositories for the classification step, together with an error analysis of common misclassifications. revision: yes
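The promised audit metrics are standard confusion-matrix arithmetic. A minimal sketch, with purely illustrative counts for a 150-repository sample (the paper reports no such numbers):

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall, and F1 for a binary 'engineered project' label."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical audit outcome: 120 true positives, 10 false positives,
# 20 false negatives. These counts are illustrative only.
print(classification_metrics(tp=120, fp=10, fn=20))
```

With these illustrative counts, F1 works out to exactly 8/9 ≈ 0.889; a stratified sample would additionally need per-stratum weighting before aggregating.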
Referee: [§4] §4 (Artifact Detection and Collection): The systematic detection logic (file-name patterns, directory heuristics, content signatures) is described at a high level but is not accompanied by any audit, inter-annotator agreement, or false-positive/false-negative estimates. Without these, it is impossible to assess whether the 15,591 artifacts and 18,167 files materially over- or under-count the true population.
Authors: We concur that false-positive and false-negative estimates are required to evaluate detection quality. The heuristics were derived from official tool documentation and preliminary manual inspection. The revised manuscript will include a new validation subsection describing a manual audit of 120 stratified repositories, with reported false-positive and false-negative rates for the overall detection process and inter-annotator agreement statistics for the manual labels. The detection scripts and validation annotations will be released to enable community verification. revision: yes
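The inter-annotator agreement the authors promise is commonly reported as Cohen's kappa. A minimal sketch assuming two annotators assigning binary labels (1 = artifact present):

```python
def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two annotators' binary labels on the same items."""
    assert len(a) == len(b) and a, "need two equal-length, non-empty label lists"
    n = len(a)
    # Observed agreement: fraction of items both annotators label identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label rates.
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        return 1.0  # both annotators used a constant label
    return (observed - expected) / (1 - expected)
```

Kappa of 1.0 means perfect agreement; 0.0 means agreement no better than chance, which is why it is preferred over raw percent agreement for skewed label distributions like rare artifact types.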
Circularity Check
No circularity: dataset construction is empirical collection from external GitHub sources
Full rationale
The paper presents a data collection pipeline that selects repositories via metadata filtering, applies GPT-5.2 classification to label engineered projects, detects configuration artifacts via systematic search, and reports resulting counts (40,585 repositories filtered, 36,710 labeled, 4,738 with artifacts, 15,591 artifacts collected). No equations, predictions, fitted parameters, or derivations are claimed. The numbers are direct outputs of the described process applied to external GitHub data, not reductions of inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work is self-contained as a dataset release effort with no mathematical chain that could exhibit circularity.
Axiom & Free-Parameter Ledger
Axioms (2)
- domain assumption: GitHub metadata filtering can select actively maintained repositories suitable for analysis.
- domain assumption: GPT-5.2 classification accurately identifies engineered software projects.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "We selected 40,585 actively maintained repositories through metadata filtering, classified them using GPT-5.2 to identify 36,710 as belonging to engineered software projects, and systematically detected configuration artifacts..."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "The dataset covers 4,738 repositories across five tools ... and eight configuration mechanisms."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Sebastian Baltes, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Matthias Galster. 2026. A Dataset of Agentic AI Coding Tool Configurations. doi:10.5281/zenodo.19375880
- [2] Sebastian Baltes, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Matthias Galster. 2026. A Dataset of Agentic AI Coding Tool Configurations (Pipeline). doi:10.5281/zenodo.19375429
- [3] Worawalan Chatlatanagulchai, Hao Li, Yutaro Kashiwa, Brittany Reid, Kundjanasith Thonglek, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, Bram Adams, Ahmed E. Hassan, and Hajimu Iida. 2025. Agent READMEs: An Empirical Study of Context Files for Agentic Coding. arXiv:2511.12884 [cs.SE] doi:10.48550/arXiv.2511.12884
- [4] Ozren Dabic, Emad Aghajani, and Gabriele Bavota. 2021. Sampling Projects in GitHub for MSR Studies. In 18th IEEE/ACM International Conference on Mining Software Repositories, MSR 2021, Madrid, Spain, May 17-19, 2021. IEEE, Madrid, Spain, 560–564. doi:10.1109/MSR52588.2021.00074
- [5] Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. 2024. Self-Collaboration Code Generation via ChatGPT. ACM Trans. Softw. Eng. Methodol. 33, 7 (2024), 189:1–189:38. doi:10.1145/3672459
- [6] Matthias Galster, Seyedmoein Mohsenimofidi, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Sebastian Baltes. 2026. Configuring Agentic AI Coding Tools: An Exploratory Study. arXiv:2602.14690 [cs.SE] doi:10.48550/arXiv.2602.14690 To appear at the 3rd ACM International Conference on AI-powered Software (AIware 2026)
- [7] Ahmed E. Hassan, Hao Li, Dayi Lin, Bram Adams, Tse-Hsun Chen, Yutaro Kashiwa, and Dong Qiu. 2025. Agentic Software Engineering: Foundational Pillars and a Research Roadmap. arXiv:2509.06216 [cs.SE] doi:10.48550/arXiv.2509.06216
- [8] Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. 2026. Speed at the Cost of Quality: How Cursor AI Increases Short-Term Velocity and Long-Term Complexity in Open-Source Projects. arXiv:2511.04427 [cs.SE] doi:10.48550/arXiv.2511.04427 To appear at the 23rd IEEE/ACM International Conference on Mining Software Repositories (MSR 2026)
- [9] Kosei Horikawa, Hao Li, Yutaro Kashiwa, Bram Adams, Hajimu Iida, and Ahmed E. Hassan. 2025. Agentic Refactoring: An Empirical Study of AI Coding Agents. arXiv:2511.04824 [cs.SE] doi:10.48550/arXiv.2511.04824
- [10] Shaokang Jiang and Daye Nam. 2026. Beyond the Prompt: An Empirical Study of Cursor Rules. arXiv:2512.18925 [cs.SE] doi:10.48550/arXiv.2512.18925 To appear at the 23rd IEEE/ACM International Conference on Mining Software Repositories (MSR 2026)
- [11] Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M. Germán, and Daniela E. Damian. 2014. The promises and perils of mining GitHub. In 11th Working Conference on Mining Software Repositories, MSR 2014, Proceedings, May 31 - June 1, 2014, Hyderabad, India, Premkumar T. Devanbu, Sung Kim, and Martin Pinzger (Eds.). ACM, Hyderabad, Ind...
- [12] Hao Li, Haoxiang Zhang, and Ahmed E. Hassan. 2026. AIDev: Studying AI Coding Agents on GitHub. arXiv:2602.09185 [cs.SE] doi:10.48550/arXiv.2602.09185
- [13] Seyedmoein Mohsenimofidi, Matthias Galster, Christoph Treude, and Sebastian Baltes. 2026. Context Engineering for AI Agents in Open-Source Software. arXiv:2510.21413 [cs.SE] doi:10.48550/arXiv.2510.21413 To appear at the 23rd IEEE/ACM International Conference on Mining Software Repositories (MSR 2026)
- [14] Nuthan Munaiah, Steven Kroh, Craig Cabrey, and Meiyappan Nagappan. 2017. Curating GitHub for engineered software projects. Empir. Softw. Eng. 22, 6 (2017), 3219–3253. doi:10.1007/S10664-017-9512-6
- [15] Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, and David Lo. 2019. Categorizing the Content of GitHub README Files. Empir. Softw. Eng. 24, 3 (2019), 1296–1327. doi:10.1007/S10664-018-9660-3
- [16] Romain Robbes, Théo Matricon, Thomas Degueule, André C. Hora, and Stefano Zacchiroli. 2026. Agentic Much? Adoption of Coding Agents on GitHub. arXiv:2601.18341 [cs.SE] doi:10.48550/arXiv.2601.18341
- [17] Ranjan Sapkota, Konstantinos I. Roumeliotis, and Manoj Karkee. 2025. AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges. arXiv:2505.10468 [cs.AI] doi:10.48550/arXiv.2505.10468
- [18] Stack Exchange Inc. 2025. Stack Overflow Developer Survey 2025: AI Agent out-of-the-box tools. https://survey.stackoverflow.co/2025/ai/#3-ai-agent-out-of-the-box-tools
- [19] Miku Watanabe, Hao Li, Yutaro Kashiwa, Brittany Reid, Hajimu Iida, and Ahmed E. Hassan. 2025. On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub. arXiv:2509.14745 [cs.SE] doi:10.48550/arXiv.2509.14745
- [20] Tao Xiao, Youmei Fan, Fabio Calefato, Christoph Treude, Raula Gaikovina Kula, Hideaki Hata, and Sebastian Baltes. 2025. Self-Admitted GenAI Usage in Open-Source Software. arXiv:2507.10422 [cs.SE]