AI Policy, Disclosure, and Human in the Loop: How Are Contribution Guidelines Adapting to GenAI?
Pith reviewed 2026-05-20 15:39 UTC · model grok-4.3
The pith
Most open source AI policies permit generative AI contributions when paired with disclosure and human review.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We analyzed 1,000 popular GitHub repositories and identified 118 AI policies for contributors. Our results show that 78% of the AI policies allow contributions generated with GenAI, while 22% explicitly discourage their use; 51% of the AI policies require the disclosure of AI-generated contributions; and 74% of the AI policies require a human in the loop during contribution. Overall, we find that the majority of the analyzed AI policies are positive regarding the usage of GenAI. However, AI disclosure and human in the loop are fundamental in the contribution process.
What carries the argument
Empirical extraction and classification of AI usage policies from contributor guidelines of popular open source repositories
If this is right
- Contributors using GenAI should plan to disclose that use to comply with the most common policy stance.
- Project maintainers can expect to incorporate explicit human review steps for AI-assisted submissions.
- Developers gain clearer guidance on acceptable AI practices when joining or contributing to open source projects.
- Researchers studying software development practices now have baseline data on how guidelines are evolving around AI.
Where Pith is reading between the lines
- If GenAI output quality rises, projects may shift from requiring human review to requiring only automated checks or provenance metadata.
- The permissive stance observed here could encourage closed-source teams to adopt similar disclosure norms for internal code.
- Smaller or domain-specific repositories outside the top 1,000 may show stricter or more permissive patterns than the popular ones studied.
Load-bearing premise
The 118 AI policies were correctly identified and classified from the contribution guidelines of the 1,000 selected repositories without significant selection or interpretation bias.
What would settle it
A replication that samples a different set of repositories and finds a majority of policies discouraging or banning GenAI contributions would falsify the reported distribution.
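A replication would need some sense of how much the reported shares could move through sampling noise alone. The sketch below computes a Wilson score interval for each headline percentage. It is an illustration, not part of the paper: the per-category counts are an assumption, reconstructed by rounding each reported percentage against the 118 policies, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


N_POLICIES = 118  # number of AI policies reported in the paper

# Assumption: the paper's raw counts are not given here, so each count is
# reconstructed by rounding the reported percentage times 118.
reported_shares = {
    "allow GenAI": 0.78,
    "require disclosure": 0.51,
    "require human in the loop": 0.74,
}

intervals = {}
for label, share in reported_shares.items():
    count = round(share * N_POLICIES)
    intervals[label] = wilson_interval(count, N_POLICIES)
    low, high = intervals[label]
    print(f"{label}: {count}/{N_POLICIES}, interval {low:.3f} to {high:.3f}")
```

Under these assumed counts, the interval for the permissive share stays above one half, while the interval for the disclosure share straddles it. The interval reflects sampling noise only; it says nothing about selection or classification bias, which is the concern raised under the load-bearing premise.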
Original abstract
Generative AI (GenAI) has recently transformed software development. Due to the ease of generating code, open source projects are experiencing a growth in contributions. To address the rise of GenAI, open source projects have begun implementing policies for AI usage in contributions. However, the extent to which open source specifies whether AI-assisted contributions are allowed or prohibited, along with the best practices for contributors, remains unclear. This paper provides an initial empirical study to explore how open source projects are adapting to GenAI contributions. We analyzed 1,000 popular GitHub repositories and identified 118 AI policies for contributors. Our results show that (1) 78% of the AI policies allow contributions generated with GenAI, while 22% explicitly discourage their use; (2) 51% of the AI policies require the disclosure of AI-generated contributions; and (3) 74% of the AI policies require a human in the loop during contribution. Overall, we find that the majority of the analyzed AI policies are positive regarding the usage of GenAI. However, AI disclosure and human in the loop are fundamental in the contribution process. Finally, we conclude by discussing implications for developers and researchers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical analysis of 1,000 popular GitHub repositories, from which 118 AI-related policies were identified in contribution guidelines. The key findings are that 78% of these policies permit GenAI-generated contributions (with 22% discouraging them), 51% mandate disclosure of AI use, and 74% require human involvement in the contribution process. The authors conclude that the majority of policies are positive toward GenAI usage while underscoring the roles of disclosure and human oversight.
Significance. If the classifications hold, this work supplies an early large-scale descriptive account of how open-source projects are adapting contribution guidelines to generative AI. The examination of 1,000 repositories yields concrete prevalence figures that can inform both practitioner guidelines and follow-on research on policy evolution. The explicit discussion of implications for developers and researchers is a constructive element.
major comments (2)
- [Methods] Methods section (data collection and policy identification): The criteria used to locate and select the 118 AI policies from the 1,000 repositories are not specified, including any keyword lists, file names searched (e.g., CONTRIBUTING.md), or decision rules for inclusion. Because the reported percentages rest directly on this extraction step, the absence of a reproducible protocol prevents verification of the 78% / 51% / 74% figures.
- [Results] Results section (policy categorization): No coding scheme, inter-rater agreement statistic, or procedure for resolving ambiguous statements is reported when mapping policies to the allow/discourage, disclosure, and human-in-the-loop categories. Small shifts in how borderline cases (e.g., “AI tools may be used provided the contributor reviews the output”) are classified could materially change the headline proportions and the “majority positive” claim.
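To make the second major comment concrete, here is a minimal sketch of one common inter-rater agreement statistic, Cohen's kappa, for two coders labelling the same policies. The paper does not report such a statistic, so the labels and the two example coders below are invented toy data, and `cohen_kappa` is a hypothetical helper written for illustration.

```python
from collections import Counter


def cohen_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (not from the paper): four policies, two independent coders.
coder_a = ["allow", "allow", "discourage", "allow"]
coder_b = ["allow", "discourage", "discourage", "allow"]
kappa = cohen_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because the allow category dominates and two coders would often agree on it by default.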
minor comments (2)
- [Abstract] Abstract: The percentages are presented without any mention of the underlying sample (118 policies) or the classification process; adding a brief qualifier would improve immediate readability.
- [Discussion] Discussion: The implications section could usefully reference prior empirical studies on contribution guidelines to situate the GenAI-specific findings.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments identify key areas where additional transparency can strengthen the manuscript. We address each major comment below and have revised the manuscript to improve reproducibility and clarity of the empirical process.
- Referee: [Methods] Methods section (data collection and policy identification): The criteria used to locate and select the 118 AI policies from the 1,000 repositories are not specified, including any keyword lists, file names searched (e.g., CONTRIBUTING.md), or decision rules for inclusion. Because the reported percentages rest directly on this extraction step, the absence of a reproducible protocol prevents verification of the 78% / 51% / 74% figures.
  Authors: We agree that the original submission did not provide sufficient detail on the data collection and policy identification process. In the revised manuscript we have expanded the Methods section to specify the keyword lists used (e.g., 'AI', 'generative AI', 'LLM', 'ChatGPT', 'Copilot', 'artificial intelligence'), the files examined in each repository (CONTRIBUTING.md, README.md, and other contribution guideline files), and the explicit decision rules for inclusion (any policy text that directly addresses the use of generative AI tools for code or documentation contributions). These additions make the identification of the 118 policies reproducible.
  Revision: yes
- Referee: [Results] Results section (policy categorization): No coding scheme, inter-rater agreement statistic, or procedure for resolving ambiguous statements is reported when mapping policies to the allow/discourage, disclosure, and human-in-the-loop categories. Small shifts in how borderline cases (e.g., “AI tools may be used provided the contributor reviews the output”) are classified could materially change the headline proportions and the “majority positive” claim.
  Authors: We acknowledge that the original manuscript omitted a detailed coding scheme. We have added a dedicated subsection describing the classification rules for each of the three categories, with explicit criteria and examples. Borderline statements requiring contributor review of AI output were classified as mandating human involvement. The coding was performed primarily by one author with team discussion to resolve uncertainties. We have documented this procedure and provided illustrative examples in the revision. A formal inter-rater agreement statistic was not computed because the study used a single-primary-coder workflow rather than independent multi-coder annotation; we therefore report the process qualitatively rather than quantitatively.
  Revision: partial
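The identification step described in the first response can be sketched as a keyword screen over contribution guideline files. This is an illustration under stated assumptions, not the authors' tooling: the keyword list is the example list quoted in the response, while the matching rules (case-sensitive acronyms, word boundaries, optional plural) and the names `matched_keywords` and `screen` are hypothetical choices made for this sketch.

```python
import re

# Example keywords quoted in the authors' response.
KEYWORDS = ["AI", "generative AI", "LLM", "ChatGPT", "Copilot", "artificial intelligence"]

# Assumption: acronyms are matched case-sensitively so that "AI" does not
# fire on words such as "maintain"; other terms ignore case.
CASE_SENSITIVE = {"AI", "LLM"}

PATTERNS = {
    kw: re.compile(
        rf"\b{re.escape(kw)}s?\b",
        0 if kw in CASE_SENSITIVE else re.IGNORECASE,
    )
    for kw in KEYWORDS
}


def matched_keywords(text: str) -> list[str]:
    """Return the keywords found in text, in KEYWORDS order."""
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]


def screen(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path with at least one hit to its matched keywords."""
    hits = {path: matched_keywords(content) for path, content in files.items()}
    return {path: kws for path, kws in hits.items() if kws}


# Toy input (not from the dataset): file path -> file content.
example_files = {
    "CONTRIBUTING.md": "Please disclose any use of ChatGPT or other LLM tools.",
    "README.md": "We maintain a fair review process.",
}
candidates = screen(example_files)
print(candidates)
```

A hit only marks a file as a candidate. The inclusion rule in the response (policy text that directly addresses generative AI tools for code or documentation contributions) still requires a manual read, since a project may mention AI for unrelated reasons.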
Circularity Check
No circularity: direct empirical counts from repository inspection
The paper conducts an empirical survey by selecting 1,000 popular GitHub repositories, identifying 118 AI policies from their contribution guidelines, and reporting straightforward percentages (78% allow GenAI, 22% discourage, 51% require disclosure, 74% require human-in-the-loop). These are observational tallies with no equations, fitted parameters, model predictions, or derivation steps. No self-citations are invoked as load-bearing premises, no uniqueness theorems are imported, and no ansatz or renaming of known results occurs. The results follow directly from the data collection and classification process without any reduction to inputs by construction, satisfying the self-contained empirical case.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 1,000 popular GitHub repositories form a suitable sample for observing AI policy adaptation in open source.