Patterns of Effort Contribution and Demand and User Classification based on Participation Patterns in NPM Ecosystem

Audris Mockus; Tapajit Dey; Yuxing Ma

arXiv: 1907.06538 · v1 · pith:TBTGHZVT · submitted 2019-07-15 · 💻 cs.SE · cs.CY

Tapajit Dey, Yuxing Ma, Audris Mockus

Pith reviewed 2026-05-24 21:28 UTC · model grok-4.3

classification 💻 cs.SE cs.CY

keywords NPM ecosystem · effort contribution · supply chain · user clustering · open source participation · company affiliation · dependency analysis · issue tracking

The pith

Developers in NPM primarily contribute to and demand effort from direct dependencies rather than transitive ones

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines participation patterns across the NPM ecosystem using 1,376,946 issues and pull requests filed against 4,433 popular packages, together with the full public commit histories of the 272,142 users who created them. It finds that most contributions and demands target packages a user depends on directly, that only a tiny share reaches transitive dependencies, and that a notable share of demand falls outside the visible supply chain. Fuzzy c-means clustering yields three user groups by demand pattern and two by contribution pattern, and a Random Forest classifier predicts company affiliation from these patterns with an AUC-ROC of 0.68. These findings matter because they map where effort actually flows in a large open-source dependency network and point to where upstream visibility may be insufficient.

Core claim

Users contribute and demand effort primarily from packages that they depend on directly, with only a tiny fraction of contributions and demand going to transitive dependencies. A significant portion of demand goes into packages outside the users' respective supply chains (constructed from publicly visible version control data). Three and two different groups of users are observed based on the effort demand and effort contribution patterns, respectively. The Random Forest model used for identifying the company affiliation of the users gives an AUC-ROC value of 0.68.
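The direct / transitive / outside split can be made concrete with a small sketch. It assumes a user's supply chain is the set of packages their own projects declare (direct) plus everything reachable from those through a package-to-dependencies map (transitive); any targeted package in neither set counts as outside. The names `transitive_closure`, `classify_target`, and `dep_graph` are hypothetical, and the paper's actual construction from version-control data may differ in detail.

```python
from collections import deque


def transitive_closure(direct, dep_graph):
    """Return every package reachable from `direct` via dep_graph, excluding `direct` itself.

    dep_graph: dict mapping a package name to an iterable of its dependencies
    (hypothetical format). Breadth-first search; `seen` guards against cycles.
    """
    seen = set(direct)
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        for dep in dep_graph.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen - set(direct)


def classify_target(target, user_direct_deps, dep_graph):
    """Label the package an issue or commit targets relative to the user's supply chain."""
    if target in user_direct_deps:
        return "direct"
    if target in transitive_closure(user_direct_deps, dep_graph):
        return "transitive"
    return "outside"


# Toy dependency map for illustration only (not the packages' real, complete dependency lists)
dep_graph = {"express": ["body-parser", "debug"], "debug": ["ms"]}
direct = {"express"}
for pkg in ("express", "ms", "lodash"):
    print(pkg, classify_target(pkg, direct, dep_graph))
```

Any package missing from `dep_graph` simply has no known dependencies, which is exactly how an incomplete graph can push targets into the outside bucket.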

What carries the argument

Fuzzy c-means clustering applied to per-user effort contribution and demand vectors built from issues, pull requests, and commit activity linked to direct and transitive NPM dependencies.
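For readers unfamiliar with the method, the following is a minimal NumPy sketch of standard fuzzy c-means (Bezdek et al., 1984), not the authors' code. Each user is a row vector of participation features; the number of clusters `c` and the fuzzifier `m` are inputs chosen by the analyst, and the synthetic data below are purely illustrative.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means.

    X: (n_samples, n_features) array; c: number of clusters (chosen by the analyst);
    m > 1: fuzzifier. Returns (centers, U), where U[i, j] is the membership of
    sample i in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships, rows sum to 1
    for _ in range(max_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        # Distance from every sample to every center, floored to avoid division by zero
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Illustrative only: two synthetic groups of per-user (direct, transitive, outside) shares
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.9, 0.05, 0.05], 0.02, (50, 3)),
    rng.normal([0.2, 0.1, 0.7], 0.02, (50, 3)),
])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard assignment, if one is needed
```

Each row of `U` gives a user's graded membership in every group, which is what distinguishes fuzzy c-means from hard clustering such as k-means.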

If this is right

Most maintainer effort can focus on direct rather than transitive relationships.
A meaningful share of demand originates outside any single user's visible supply chain and may require separate support mechanisms.
User groups defined by demand versus contribution patterns may need distinct engagement or tooling approaches.
Participation patterns carry a detectable, though modest (AUC-ROC 0.68), signal for inferring commercial versus volunteer status.
Increasing upstream visibility is needed to better align demand with supply across the ecosystem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same clustering and prediction approach could be tested on other package registries to check whether direct-dependency concentration is general.
Incomplete public supply chains may systematically understate the true reach of demand, suggesting a need for hybrid public-private dependency mapping.
If company affiliation can be inferred from patterns, ecosystems might use similar signals to measure commercial involvement without self-reporting.
The existence of demand outside supply chains implies potential sustainability risks for packages that receive effort without reciprocal contribution links.

Load-bearing premise

The supply chains constructed from public version control data accurately reflect real dependencies and the public commit records of issue creators capture their full participation without substantial private or missing activity.

What would settle it

A dataset that includes private repositories and finds large fractions of contributions or demands going to transitive dependencies would falsify the concentration on direct dependencies.

Figures

Figures reproduced from arXiv: 1907.06538 by Audris Mockus, Tapajit Dey, Yuxing Ma.

**Figure 2.** The 7 predictors we selected for our final model were (in …

Original abstract

Background: Open source requires the participation of volunteer and commercial developers (users) in order to deliver functional, high-quality components. Developers both contribute effort in the form of patches and demand effort from the component maintainers to resolve issues reported against those components.

Aim: Identify and characterize patterns of effort contribution and demand throughout the open source supply chain and investigate if and how these patterns vary with developer activity; identify different groups of developers; and predict developers' company affiliation based on their participation patterns.

Method: 1,376,946 issues and pull requests created for 4433 NPM packages with over 10,000 monthly downloads, together with the full (public) commit activity data of the 272,142 issue creators, are obtained and analyzed, and the users' dependencies on NPM packages are identified. The fuzzy c-means clustering algorithm is used to find groups among the users based on their effort contribution and demand patterns, and Random Forest is used as the predictive modeling technique to identify their company affiliations.

Result: Users contribute and demand effort primarily from packages that they depend on directly, with only a tiny fraction of contributions and demand going to transitive dependencies. A significant portion of demand goes into packages outside the users' respective supply chains (constructed from publicly visible version control data). Three and two different groups of users are observed based on the effort demand and effort contribution patterns, respectively. The Random Forest model used for identifying the company affiliation of the users gives an AUC-ROC value of 0.68.

Conclusion: Our results give new insights into effort demand and supply at different parts of the supply chain of the NPM ecosystem and its users, and suggest the need to increase visibility further upstream.
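As an illustration of how per-user demand patterns might be encoded before clustering, the sketch below turns a list of issue events, each already labeled direct, transitive, or outside, into per-user share vectors. The event format and the function name `demand_shares` are assumptions; the abstract does not spell out the exact features the authors used.

```python
from collections import Counter, defaultdict

CATEGORIES = ("direct", "transitive", "outside")


def demand_shares(events):
    """Map each user to the fraction of their issues that target each category.

    events: iterable of (user, category) pairs, one per issue or pull request,
    where category is one of CATEGORIES (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for user, category in events:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[user][category] += 1
    shares = {}
    for user, c in counts.items():
        total = sum(c.values())  # > 0: every user in `counts` has at least one event
        shares[user] = tuple(c[k] / total for k in CATEGORIES)
    return shares


events = [
    ("alice", "direct"), ("alice", "direct"), ("alice", "direct"), ("alice", "outside"),
    ("bob", "transitive"), ("bob", "outside"),
]
print(demand_shares(events))
# {'alice': (0.75, 0.0, 0.25), 'bob': (0.0, 0.5, 0.5)}
```

Normalizing to shares keeps very active and occasional users on the same scale; overall activity level could be added as a separate feature if it matters for the analysis.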

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Large NPM dataset shows effort stays mostly direct with some outside leakage and a few user clusters, but supply-chain construction and 0.68 AUC prediction are the weak points.

The paper's main contribution is a large-scale measurement of contribution and demand flows across 4433 popular NPM packages, using nearly 1.4 million issues and pull requests and the commit histories of 272k users. It reports that activity concentrates on direct dependencies, with only tiny shares reaching transitive ones; that a noticeable slice of demand lands outside the visible supply chain; that fuzzy c-means finds three demand clusters and two contribution clusters; and that a Random Forest reaches 0.68 AUC-ROC on company affiliation. The scale of the data and the concrete numbers on the direct-versus-transitive split are the parts that feel new and potentially useful to people tracking open-source supply chains. Standard clustering and classification applied to fresh ecosystem data is straightforward, but it still fills a gap when the numbers are this big.

The soft spots are in the construction steps and the model performance. The outside-demand result rests on how completely the supply chains were recovered from public version-control and dependency data; any under-count of versions, ranges, or packages outside the 4433 set inflates that bucket by construction, and the abstract gives little detail on the exact matching rules. An AUC of 0.68 is modest for affiliation prediction, so that piece reads more like an exploratory add-on than a strong result. The choice of cluster count also lacks visible justification or stability checks.

This work is aimed at empirical software-engineering researchers who study OSS maintenance and ecosystems. The dataset size and the practical questions make it worth sending to referees, provided they can examine the dependency-resolution and supply-chain code in detail.
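To put the 0.68 figure in context: AUC-ROC equals the probability that a randomly chosen positive example (here, a company-affiliated user) receives a higher model score than a randomly chosen negative one, with 0.5 corresponding to chance and 1.0 to perfect ranking. The pure-Python sketch below computes it directly from that pairwise definition; the scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """AUC-ROC from its pairwise definition: the fraction of (positive, negative)
    pairs in which the positive is scored higher, counting ties as 1/2.
    O(P * N), which is fine for illustration but not for large datasets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores; label 1 = company-affiliated, 0 = not
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]
print(round(roc_auc(scores, labels), 3))  # 0.667
```

In this toy example, 6 of the 9 positive-negative pairs are ordered correctly, giving about 0.67, which shows how far such a score is from a reliable per-user prediction.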

Referee Report

2 major / 2 minor

Summary. The paper analyzes patterns of effort contribution (via commits) and demand (via issues/PRs) in the NPM ecosystem using data on 1,376,946 issues/PRs across 4433 high-download packages and full public commit histories for their 272,142 creators. It reports that both contribution and demand are overwhelmingly directed at direct dependencies (with negligible shares to transitive dependencies), that a substantial fraction of demand targets packages outside each user's inferred supply chain, that fuzzy c-means clustering identifies three distinct demand-pattern groups and two contribution-pattern groups, and that a Random Forest classifier predicts company affiliation from these patterns with AUC-ROC 0.68.

Significance. If the supply-chain inferences hold, the work supplies large-scale empirical evidence on how effort flows through real OSS dependency graphs and documents heterogeneity in developer behavior, which could inform maintenance prioritization and tooling for upstream visibility. The dataset scale (hundreds of thousands of users and over a million events) is a clear strength; the modest classifier performance and the reliance on public data alone are acknowledged limitations in the text.

major comments (2)

[Method] Method section (dependency and supply-chain construction): the headline claim that a significant portion of demand falls outside users' supply chains rests on the accuracy of inferring each of the 272k users' direct and transitive dependencies solely from public commit histories and the dependency graph among the 4433 packages. The description does not specify how exact versions or version ranges are resolved, whether packages outside the 4433-set are considered, or how private/unlisted dependencies are handled; any systematic under-approximation would inflate the 'outside' bucket by construction and directly undermine the central supply-chain result.
[Results] Results (clustering): the reported three demand groups and two contribution groups are obtained via fuzzy c-means with the number of clusters treated as a free parameter; no cluster-validity indices, stability analysis across random seeds, or sensitivity checks are described, yet these groupings are presented as a primary finding characterizing user heterogeneity.

minor comments (2)

[Abstract / Method] The abstract and Method section would benefit from one additional sentence on the exact procedure used to map issues to packages and to build per-user supply chains (e.g., whether dependency metadata was taken from package.json at the time of the issue or from the latest version).
[Results] A table or figure presenting the cluster centroids or feature distributions would make the 'three and two groups' claim easier to interpret without requiring the reader to reconstruct the patterns from the text alone.
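The first minor comment distinguishes dependency metadata taken at the time of the issue from metadata taken from the latest version. A minimal sketch of the first option, assuming each user's package.json history has already been parsed into time-stamped dependency sets (the `snapshots` format and the `deps_at` name are hypothetical):

```python
import bisect
from datetime import datetime


def deps_at(snapshots, when):
    """Return the dependency set in effect at `when`.

    snapshots: list of (commit_time, deps) pairs sorted by commit_time, where deps is
    the set of package names parsed from package.json at that commit (hypothetical).
    Uses the latest snapshot at or before `when`; returns an empty set if none exists yet.
    """
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, when)
    return snapshots[i - 1][1] if i > 0 else set()


snapshots = [
    (datetime(2017, 1, 1), {"express"}),
    (datetime(2018, 6, 1), {"express", "lodash"}),
]
print(deps_at(snapshots, datetime(2017, 12, 31)))  # {'express'}
```

Using `snapshots[-1][1]` instead would implement the latest-version alternative; the two can disagree whenever a dependency was added or dropped after the issue was filed.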

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where additional methodological detail and validation will strengthen the paper. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [Method] Method section (dependency and supply-chain construction): the headline claim that a significant portion of demand falls outside users' supply chains rests on the accuracy of inferring each of the 272k users' direct and transitive dependencies solely from public commit histories and the dependency graph among the 4433 packages. The description does not specify how exact versions or version ranges are resolved, whether packages outside the 4433-set are considered, or how private/unlisted dependencies are handled; any systematic under-approximation would inflate the 'outside' bucket by construction and directly undermine the central supply-chain result.

Authors: We agree that the current method description is insufficient to allow full assessment of the supply-chain inferences. In the revised manuscript we will add a dedicated subsection detailing the dependency extraction process: parsing of package.json files from the public commit histories of the 272,142 users, resolution of version ranges against the NPM registry snapshot available at data collection time, explicit restriction to the 4433-package dependency graph, and a clear statement of the limitation that private or unlisted dependencies cannot be observed. This expanded description will also discuss the potential for under-approximation and its implications for the 'outside supply chain' category, allowing readers to evaluate the robustness of the central result. revision: yes
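The simulated rebuttal mentions resolving version ranges against a registry snapshot. As a hedged illustration only, the sketch below handles npm-style caret ranges (`^1.2.3` means `>=1.2.3 <2.0.0`, `^0.2.3` means `>=0.2.3 <0.3.0`, and `^0.0.3` means `>=0.0.3 <0.0.4`) and picks the highest matching published version. The helper names are hypothetical; tilde ranges, x-ranges, `||` unions, and pre-release tags are deliberately not handled, so a real pipeline would rely on the node-semver library instead.

```python
def parse(version):
    """'1.4.2' -> (1, 4, 2). Pre-release/build suffixes are not supported in this sketch."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper(base):
    """Exclusive upper bound of ^base: increment the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def resolve_caret(spec, published):
    """Return the highest version in `published` satisfying a caret spec, or None."""
    if not spec.startswith("^"):
        raise ValueError("this sketch only handles caret ranges such as '^1.2.3'")
    low = parse(spec[1:])
    high = caret_upper(low)
    matches = [v for v in published if low <= parse(v) < high]
    return max(matches, key=parse, default=None)


# Hypothetical registry snapshot for a single package
published = ["0.2.5", "0.3.0", "1.1.0", "1.2.3", "1.9.0", "2.0.0"]
print(resolve_caret("^1.2.3", published))  # 1.9.0
print(resolve_caret("^0.2.0", published))  # 0.2.5
```

Resolving against a snapshot taken at collection time, rather than at the time of each issue, is one of the under-approximation sources the referee raises.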
Referee: [Results] Results (clustering): the reported three demand groups and two contribution groups are obtained via fuzzy c-means with the number of clusters treated as a free parameter; no cluster-validity indices, stability analysis across random seeds, or sensitivity checks are described, yet these groupings are presented as a primary finding characterizing user heterogeneity.

Authors: We accept that the clustering results require additional validation to be presented as robust characterizations of user heterogeneity. The revised version will include (i) computation of standard fuzzy cluster validity indices (Xie-Beni and fuzzy silhouette) to support the choice of three demand clusters and two contribution clusters, (ii) stability analysis by repeating fuzzy c-means across multiple random seeds and reporting variation in cluster assignments, and (iii) sensitivity checks on the fuzziness parameter m. These additions will be placed in the results section alongside the existing group descriptions. revision: yes
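One of the indices the rebuttal promises, the Xie-Beni index, can be computed from the data, the cluster centers, and the membership matrix returned by any fuzzy c-means run. The sketch below uses the common generalized form with memberships raised to the fuzzifier `m` (the original index fixes m = 2); lower values indicate compact, well-separated clusters. The function name and the toy data are illustrative.

```python
import numpy as np


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni validity index (lower is better).

    Compactness sum_i sum_j u_ij^m * ||x_i - v_j||^2 divided by
    n * min_{j != k} ||v_j - v_k||^2 (separation between the closest pair of centers).
    """
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, c)
    compactness = ((U ** m) * sq_dist).sum()
    center_sq = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).astype(float)
    np.fill_diagonal(center_sq, np.inf)  # exclude j == k from the minimum
    return compactness / (n * center_sq.min())


# Toy data: two tight pairs of points with a crisp partition
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers = np.array([[0.0, 0.5], [10.0, 0.5]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(xie_beni(X, centers, U))  # 0.0025
```

To choose c, one would compute the index for each candidate number of clusters, and for several random seeds (which doubles as a stability check), and prefer values that are consistently low.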

Circularity Check

0 steps flagged

No circularity in empirical data analysis

The paper collects public NPM issue/PR and commit data for 4433 packages and 272k users, constructs supply chains from visible dependencies, applies fuzzy c-means clustering to identify user groups, and trains a Random Forest to predict company affiliation (AUC-ROC 0.68). No equations, derivations, or self-citations reduce any claim to its inputs by construction. All results are direct empirical outputs of standard statistical and ML methods applied to observed data; supply-chain construction is a one-time preprocessing step whose outputs are then measured, not redefined. This is self-contained observational analysis with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The study relies on empirical data collection and standard ML techniques; the main assumptions are about data representativeness.

free parameters (2)

Number of clusters (demand) = 3
Supplied as an input to fuzzy c-means on the effort demand patterns; the selection criterion is not reported.
Number of clusters (contribution) = 2
Supplied as an input to fuzzy c-means on the effort contribution patterns; the selection criterion is not reported.
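Since both cluster counts are free parameters, one lightweight way to make the choice visible is to report classical validity indices for each candidate c. Bezdek's partition coefficient (PC) and partition entropy (PE) need only the membership matrix; the sketch below is illustrative and not a claim about what the authors computed.

```python
import numpy as np


def partition_coefficient(U):
    """Bezdek's partition coefficient: (1/n) * sum of squared memberships.
    Ranges from 1/c (uniform memberships) to 1 (crisp partition)."""
    U = np.asarray(U, dtype=float)
    return float((U ** 2).sum() / U.shape[0])


def partition_entropy(U, eps=1e-12):
    """Partition entropy: -(1/n) * sum u_ij * log(u_ij).
    Ranges from 0 (crisp) to log(c) (uniform); zeros are clipped so 0 * log 0 -> 0."""
    U = np.asarray(U, dtype=float)
    return float(-(U * np.log(np.clip(U, eps, 1.0))).sum() / U.shape[0])


crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
uniform = np.full((4, 3), 1.0 / 3.0)
print(partition_coefficient(crisp), partition_coefficient(uniform))
```

Both indices tend to favor smaller c, which is one reason they are usually reported alongside a separation-aware index such as Xie-Beni rather than on their own.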

axioms (1)

domain assumption: Public commit activity data accurately represents user participation and supply-chain dependencies.
Used to construct supply chains and analyze the 272,142 issue creators.

