The Prevalence and Impact of Licenses in Open Software Projects
Pith reviewed 2026-06-26 07:23 UTC · model grok-4.3
The pith
Moving from restrictive to permissive licenses is associated with lower activity in C projects but higher activity in Python projects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Most projects contain no license. Among licensed projects, permissive licenses dominate and their share is rising over time, though restrictive licenses are retained more often. Language ecosystems differ sharply, with C strongly favoring restrictive licenses. Comparing activity levels in the year after a license change versus the year before shows that a shift from restrictive to permissive licensing is linked to reduced activity in C ecosystems and increased activity in Python.
What carries the argument
One-year activity comparison before and after license transitions, broken down by language ecosystem.
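A minimal sketch of this comparison in Python. It assumes that activity is measured as a commit count and that each project has a single dated license transition. The `LicenseChange` record, the 365-day `WINDOW`, and the use of the median are illustrative assumptions, not the paper's documented pipeline. The abstract describes windows anchored at the adoption of the initial and final licenses, which may differ from the symmetric before/after windows used here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, timedelta
from statistics import median

# Assumed one-year window on each side of the license change.
WINDOW = timedelta(days=365)


@dataclass
class LicenseChange:
    """Hypothetical record for one project that switched license type."""
    project: str
    ecosystem: str                 # e.g. "C" or "Python"
    changed_on: date               # date the final license was adopted
    commit_dates: list[date] = field(default_factory=list)


def activity_delta(change: LicenseChange) -> int:
    """Commits in the year after the change minus commits in the year before."""
    start, end = change.changed_on - WINDOW, change.changed_on + WINDOW
    before = sum(1 for d in change.commit_dates if start <= d < change.changed_on)
    after = sum(1 for d in change.commit_dates if change.changed_on <= d < end)
    return after - before


def median_delta_by_ecosystem(changes: list[LicenseChange]) -> dict[str, float]:
    """Median before/after activity difference for each language ecosystem."""
    deltas: dict[str, list[int]] = defaultdict(list)
    for change in changes:
        deltas[change.ecosystem].append(activity_delta(change))
    return {eco: median(values) for eco, values in deltas.items()}
```

Raw commit counts vary by orders of magnitude across projects. A per-project ratio or log difference may therefore be a more robust summary than the raw difference shown here.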
If this is right
- Permissive licenses are becoming more common while restrictive ones persist in certain ecosystems.
- C-language projects show lower activity after moving from restrictive to permissive licenses.
- Python projects show higher activity after moving from restrictive to permissive licenses.
- License type prevalence has shifted dramatically across time.
Where Pith is reading between the lines
- Ecosystem maintainers might offer language-specific licensing guidance to projects weighing a change of terms.
- Short-term activity metrics could be tracked after license updates to anticipate contributor response.
- Unlicensed projects may face reuse barriers that licensed ones avoid, potentially limiting their reach.
Load-bearing premise
License detection across more than 100 million projects is accurate, and the one-year activity window isolates the effect of the license change from other factors.
What would settle it
Re-running the activity comparison on the same projects and finding no measurable difference between license changers and similar non-changers, or discovering widespread errors in the automated license labels.
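One way to make this test concrete is a difference-in-differences estimate. It takes the average before/after change among license changers and subtracts the same quantity among comparable non-changers measured over the same calendar windows. A value near zero would correspond to the "no measurable difference" outcome. The sketch assumes that activity has already been reduced to `(before, after)` count pairs. The function name and input shape are hypothetical.

```python
from statistics import mean


def did_estimate(changers: list[tuple[int, int]],
                 controls: list[tuple[int, int]]) -> float:
    """Difference-in-differences on (before, after) activity pairs.

    `changers` are projects that switched license; `controls` are similar
    projects that did not, measured over the same calendar windows.
    """
    if not changers or not controls:
        raise ValueError("both groups need at least one project")
    # Average within-project change for each group.
    treated_change = mean(after - before for before, after in changers)
    control_change = mean(after - before for before, after in controls)
    # The part of the changers' shift not explained by the background trend.
    return treated_change - control_change
```

Choosing the controls is the hard part. Matching on ecosystem, age, size, and contributor count, as the referee report requests, is one option.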
Original abstract
The terms under which publicly available source code can be used are dictated by its license. The license (or its absence), in turn, affects what code the project may reuse and how its code can be (re)used. It may also affect external participation and the overall activity of the project. We aim to better understand the general state of license distribution, both overall and within language ecosystems, and to investigate whether license changes are associated with noticeable variations in project output. To accomplish that, we identify licenses and license types for over 100M software projects. We find that most do not contain any license, that permissive licenses represent the bulk of all licenses, and that permissive licensing is representing an increasing proportion of all licenses over time. Restrictive licenses, however, are more likely to be retained. There is great variation among language ecosystems, with the C-language ecosystem strongly favoring restrictive licenses. The analysis of license change impact compares activity within one year of the adoption of the initial and final licenses. It shows that the effect of a change from a restrictive to a permissive license varies with the ecosystem: C-language ecosystems show reduced activity after such a transition, while Python shows increased activity. Our results demonstrate dramatic changes in license type prevalence over time and find that license changes may have opposite effects depending on the language ecosystem.
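The prevalence and retention findings rest on two steps: mapping each detected license to a type, then tracking whether a project's type changes. The sketch below assumes SPDX-style license identifiers and a hypothetical two-way permissive/restrictive mapping. The paper's actual taxonomy is not given here, for example how it treats weak copyleft licenses such as LGPL or MPL. The `LICENSE_TYPE` table is therefore illustrative only.

```python
from collections import Counter
from typing import Optional

# Hypothetical mapping from SPDX identifiers to license types. Placing LGPL
# under "restrictive" is an assumption; the paper's own taxonomy may differ.
LICENSE_TYPE = {
    "MIT": "permissive",
    "Apache-2.0": "permissive",
    "BSD-2-Clause": "permissive",
    "BSD-3-Clause": "permissive",
    "ISC": "permissive",
    "GPL-2.0-only": "restrictive",
    "GPL-3.0-only": "restrictive",
    "AGPL-3.0-only": "restrictive",
    "LGPL-2.1-only": "restrictive",
}


def license_type(spdx_id: Optional[str]) -> str:
    """Map a detected license to a type; None means no license was found."""
    if spdx_id is None:
        return "none"
    return LICENSE_TYPE.get(spdx_id, "other")


def retention_rate(histories: list[list[Optional[str]]]) -> dict[str, float]:
    """Share of projects whose final license type equals their initial one.

    Each history lists a project's detected licenses in chronological order
    and must be non-empty.
    """
    started: Counter = Counter()
    kept: Counter = Counter()
    for history in histories:
        first, last = license_type(history[0]), license_type(history[-1])
        started[first] += 1
        if first == last:
            kept[first] += 1
    return {t: kept[t] / started[t] for t in started}
```

Retention here compares only the first and last observed type. A project that switches away and later returns would count as retained. A stricter definition could require the type to stay constant throughout the history.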
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes license distribution and changes across over 100 million open source projects. It reports that most projects lack any license, permissive licenses form the majority and are increasing over time while restrictive licenses are more often retained, with substantial variation by language ecosystem (e.g., C strongly favoring restrictive licenses). The central empirical claim is that transitions from restrictive to permissive licenses are associated with ecosystem-dependent activity changes: reduced activity in C-language projects and increased activity in Python projects, based on comparing project output one year before versus after the license change.
Significance. The scale of the study (100M+ projects) offers potentially useful descriptive data on license prevalence trends if the detection methods are validated. The ecosystem-specific impact findings, if they survive controls for confounders, would be relevant to OSS governance and license choice. However, the before-after design without matching or regression controls for project-level factors limits the strength of causal inferences about license effects.
major comments (1)
- [License change impact analysis] As described in the abstract, the one-year before/after activity comparison reports reduced activity for C ecosystems and increased activity for Python after restrictive-to-permissive transitions. However, it provides no matching, regression controls, or stratification for project age, size, contributor count, or concurrent events at the transition time. These factors plausibly differ systematically across language ecosystems and could produce the observed activity differences independently of the license change.
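A minimal sketch of the stratification requested above, in the spirit of coarsened exact matching. Projects are binned by ecosystem, age, size, and contributor count, and changers are compared only with non-changers in the same bin. The field names (`age_years`, `files`, `contributors`, `changed`, `delta`) and bin edges are hypothetical. `delta` is assumed to be the one-year after-minus-before activity difference, computed for non-changers over the same calendar windows.

```python
from collections import defaultdict
from statistics import mean
from typing import Any

Project = dict[str, Any]  # hypothetical record with the fields used below


def stratum(p: Project) -> tuple:
    """Coarsen covariates into bins; the bin edges are illustrative only."""
    age_bin = min(p["age_years"] // 2, 4)  # 0-1, 2-3, 4-5, 6-7, 8+ years
    size_bin = 0 if p["files"] < 100 else 1 if p["files"] < 1000 else 2
    team_bin = 0 if p["contributors"] == 1 else 1 if p["contributors"] <= 5 else 2
    return (p["ecosystem"], age_bin, size_bin, team_bin)


def stratified_effect_by_ecosystem(projects: list[Project]) -> dict[str, float]:
    """Changer-minus-non-changer activity difference per ecosystem.

    Only strata containing both groups contribute; each stratum is weighted
    by its number of changers.
    """
    groups: dict = defaultdict(lambda: {True: [], False: []})
    for p in projects:
        groups[stratum(p)][p["changed"]].append(p["delta"])
    totals: dict = defaultdict(float)
    weights: dict = defaultdict(int)
    for key, g in groups.items():
        if g[True] and g[False]:
            n = len(g[True])
            totals[key[0]] += n * (mean(g[True]) - mean(g[False]))
            weights[key[0]] += n
    return {eco: totals[eco] / weights[eco] for eco in weights}
```

Changers in strata without a non-changer counterpart are dropped. This is the usual trade-off of exact matching: better comparability at the cost of coverage.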
minor comments (2)
- The abstract states that licenses were identified for over 100M projects but does not specify the data source, license detection algorithm, or accuracy validation; these details are required to assess the reliability of the prevalence statistics.
- The claim that 'permissive licensing is representing an increasing proportion of all licenses over time' would benefit from explicit time-series figures or tables showing the trend with confidence intervals or sample sizes per year.
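The per-year presentation requested in the second minor comment could be produced along these lines. The approach counts licensed projects per year, takes the permissive share, and attaches a Wilson score interval and the sample size. The input shape, one `(year, license_type)` pair per licensed project, is an assumption. Using the Wilson interval instead of the simpler normal approximation is a suggested choice, because it behaves better when a share is close to 0 or 1.

```python
from collections import Counter
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion k/n (about 95% for z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)


def permissive_share_by_year(records: list[tuple[int, str]]) -> dict:
    """records: (year, license_type) pairs for licensed projects only.

    Returns {year: (share, low, high, n)} with years in ascending order.
    """
    totals: Counter = Counter()
    permissive: Counter = Counter()
    for year, ltype in records:
        totals[year] += 1
        if ltype == "permissive":
            permissive[year] += 1
    out = {}
    for year in sorted(totals):
        n, k = totals[year], permissive[year]
        low, high = wilson_interval(k, n)
        out[year] = (k / n, low, high, n)
    return out
```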
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our large-scale empirical study of licenses in open source projects. We address the major comment below and will make revisions to strengthen the manuscript accordingly.
read point-by-point responses
- Referee: License change impact analysis (as described in the abstract): the one-year before/after activity comparison reports reduced activity for C ecosystems and increased activity for Python after restrictive-to-permissive transitions, but provides no matching, regression controls, or stratification for project age, size, contributor count, or concurrent events at the transition time. These factors plausibly differ systematically across language ecosystems and could produce the observed activity differences independently of the license change.
Authors: We agree that the before-after comparison provides associations rather than causal estimates and does not include matching or regression controls for project-level factors. The manuscript frames the results using 'associated with' and 'varies with the ecosystem' to reflect its observational nature, and the key contribution is documenting the opposite directional patterns across ecosystems (reduced activity in C, increased in Python) as a descriptive finding. We will revise the discussion and limitations sections to explicitly note the absence of controls for age, size, contributor count, and concurrent events, and to state that these ecosystem differences warrant further controlled studies. Full matching or stratification was not performed due to the scale of the dataset (>100M projects) and the focus on broad prevalence trends, but we accept that adding such caveats improves the interpretation.
Revision: partial
Circularity Check
No circularity: purely observational empirical analysis
The paper reports license detection across >100M projects and before/after activity comparisons within language ecosystems. No equations, fitted parameters, predictions, ansatzes, or uniqueness theorems appear. All claims rest on direct data aggregation and simple temporal comparisons; no step reduces by construction to its own inputs or to a self-citation chain. The analysis is therefore self-contained against external benchmarks.
Forward citations
Cited by 1 Pith paper
- File-Level Copying Is an Implicit Dependency in Open Source
File-level copying acts as an implicit dependency in open source, removing provenance signals and concentrating security risks in vendored copies and license risks in direct source reuse.
Manuscript submitted to ACM · Mahmoud Jahanshahi, Bogdan Vasilescu, and Audris Mockus