arxiv: 2604.27532 · v1 · submitted 2026-04-30 · 💻 cs.SE

Recognition: unknown

A Longitudinal Analysis of Good First Issue Practices and Newcomer Pull Requests in Popular OSS Projects

Hirotatsu Hoshikawa , Hidetake Tanaka , Kazumasa Shimari , Raula Gaikovina Kula , Kenichi Matsumoto

Authors on Pith no claims yet

Pith reviewed 2026-05-07 10:16 UTC · model grok-4.3

classification 💻 cs.SE

keywords open source softwaregood first issuesnewcomer onboardingpull request merge rateslongitudinal analysisGitHub repositoriescommunity sustainability

0 comments

The pith

Merge rates for newcomer good first issue pull requests fell from 61.9 percent to 42.2 percent while engagement stayed at 27 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks changes in good first issue labeling and newcomer pull request success in open source projects over four years. It shows that the availability of these beginner tasks dropped after 2023 while the portion of newcomers who work on them stayed about the same at 27 percent. At the same time, the chance that a newcomer's contribution gets merged fell from nearly 62 percent to 42 percent. Initial details about the pull request itself, like its length, did not explain this drop in success. This matters for open source communities that rely on bringing in new developers to keep projects alive and growing.

Core claim

Across 37 popular GitHub repositories, the share of issues labeled good first issues held steady until early 2024 when it began a significant decline that varied between projects. Newcomers continued to engage with these labeled issues at a consistent rate of around 27 percent. In contrast, the merge rate for the pull requests they submitted based on good first issues decreased from 61.9 percent to 42.2 percent over the period. The size and description length of these pull requests showed no link to whether they were merged, suggesting that other aspects of the process determine outcomes.

What carries the argument

Longitudinal tracking of good first issue label rates on issues, newcomer engagement with those issues, and merge outcomes for the resulting pull requests.

If this is right

Projects need to keep labeling a steady share of issues as good first issues to maintain entry points for newcomers.
Review practices for newcomer contributions must receive more attention to reverse the drop in merge rates.
Success depends on factors beyond the initial pull request's description length or code size, so review quality becomes the key variable.
The growing mismatch between steady newcomer interest and falling success rates requires maintainers to support both labeling and review processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar longitudinal tracking could be applied to other onboarding labels to check whether the decline is specific to good first issues or more widespread.
The results point toward testing additional support mechanisms such as paired reviews or automated checks for newcomer pull requests.
Platform features on GitHub could be explored to highlight review needs for beginner contributions and reduce the observed success gap.

Load-bearing premise

The methods for identifying good first issues and newcomer contributors are accurate and consistent across the 37 repositories and the observed decline is not caused by unmeasured project events or GitHub policy changes.

What would settle it

Repeating the analysis on a fresh set of repositories over the same time span and finding stable or rising merge rates for newcomer good first issue pull requests would contradict the central claim.

Figures

Figures reproduced from arXiv: 2604.27532 by Hidetake Tanaka, Hirotatsu Hoshikawa, Kazumasa Shimari, Kenichi Matsumoto, Raula Gaikovina Kula.

**Figure 1.** Figure 1: Overview of the study method. the remaining 47 matching multiple types were excluded to avoid ambiguous classification. To control for multiple comparisons, we applied Holm-Bonferroni correction within each analysis table, and Benjamini-Hochberg correction (𝛼 = 0.05) for the 30 repository-level trend tests. 4 Results 4.1 RQ1: How have GFI practices and newcomer engagement changed over time? 4.1.1 GFI Rati… view at source ↗

**Figure 2.** Figure 2: RQ1 time-series trends. (a) Monthly GFI ratio shows view at source ↗

read the original abstract

Open-source software (OSS) projects rely on effective newcomer onboarding to sustain their communities. OSS projects widely adopt "good first issue" (GFI) labels to highlight beginner-friendly tasks. As development practices continue to evolve, understanding how these onboarding mechanisms change over time is important for both maintainers and researchers. This study analyzes 406,826 issues and 1,117 newcomer GFI pull requests across 37 popular GitHub repositories (30 of which use GFI labels) over a four-year period from July 2021 to June 2025. We find that while the proportion of issues with GFI labels remained stable during the first three years, it underwent a statistically significant decline beginning in January 2024, with substantial variation across projects not explained by repository age or programming language. Despite this supply-side decline, newcomer engagement with GFI issues remains stable at approximately 27%, suggesting that GFI labels maintain consistent attractiveness. Examining the outcomes of this engagement, we find that the merge rate of newcomer GFI pull requests declined from 61.9% to 42.2%. Initial pull request characteristics such as description length and code size show no significant association with merge outcomes, indicating that success is not predicted by the quantitative characteristics of the initial submission alone. Together, these findings reveal a widening gap between stable newcomer interest in GFIs and the declining availability and success of GFI-based onboarding, underscoring the need for maintainers to sustain both GFI labeling and review support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. This paper conducts a longitudinal empirical study of good first issue (GFI) labeling and newcomer pull request (PR) outcomes in 37 popular open-source software (OSS) projects on GitHub from July 2021 to June 2025. Analyzing over 406,000 issues and 1,117 newcomer GFI PRs, it reports that GFI label proportions were stable for the first three years but declined significantly from January 2024, with project-level variation. Newcomer engagement with GFIs remained stable at approximately 27%, while the merge rate for these PRs dropped from 61.9% to 42.2%. Initial PR features like description length and code size were not significantly associated with merge success.

Significance. If the observed trends are robust, the paper makes a significant contribution to empirical software engineering by documenting a potential erosion in the effectiveness of GFI-based onboarding mechanisms in OSS communities, despite consistent newcomer interest. The multi-repository, multi-year design with a large sample strengthens the generalizability of the findings and provides actionable insights for project maintainers regarding the need for sustained GFI labeling and review processes. The absence of association between quantitative PR characteristics and outcomes is a notable negative result that challenges assumptions about what predicts PR acceptance.

major comments (3)

Methods section on data extraction and classification: The operational definitions and implementation details for identifying 'newcomer' PRs (e.g., first PR to the repo, account creation date threshold, or exclusion of bots) and confirming GFI labels are not sufficiently described or validated. This is load-bearing for the central claim of a merge-rate decline from 61.9% to 42.2%, as any inconsistency in classification across the 37 repositories or over the four-year period could artifactually produce the observed temporal trend.
Results section on temporal trends: The assertion of a 'statistically significant decline' in GFI proportions beginning January 2024 lacks details on the specific test employed, p-values, effect sizes, confidence intervals, or adjustments for project-specific confounders and potential GitHub policy changes (e.g., 2022–2024 contributor experience updates). Without these, it is difficult to evaluate whether the decline is robust or driven by unmeasured events.
Results section on PR outcomes: The reported lack of significant association between initial PR characteristics (description length, code size) and merge success requires specification of the statistical model (e.g., logistic regression with controls) and reporting of effect sizes or odds ratios. This null result is central to the interpretation that success depends on factors beyond quantitative submission traits.

minor comments (3)

Abstract: The study period ending June 2025 should be clarified regarding data collection timing, and the 1,117 newcomer GFI PRs should be broken down by time period to show sample sizes supporting the 61.9% and 42.2% merge rates.
Data and methods: A table or appendix summarizing the 37 repositories (including stars, language, age, and GFI usage) would improve transparency and reproducibility of the sample selection.
Presentation: Any trend figures should include error bars or confidence intervals around percentages such as the 27% engagement rate to convey uncertainty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have addressed each major comment point by point below, making revisions to the manuscript where the concerns are valid and providing clarifications or defenses where appropriate.

read point-by-point responses

Referee: Methods section on data extraction and classification: The operational definitions and implementation details for identifying 'newcomer' PRs (e.g., first PR to the repo, account creation date threshold, or exclusion of bots) and confirming GFI labels are not sufficiently described or validated. This is load-bearing for the central claim of a merge-rate decline from 61.9% to 42.2%, as any inconsistency in classification across the 37 repositories or over the four-year period could artifactually produce the observed temporal trend.

Authors: We agree that the original Methods section lacked sufficient operational detail for full reproducibility. In the revised manuscript, we have expanded the 'Data Collection and Classification' subsection with precise definitions: newcomers are defined as accounts submitting their first PR to the repository where the account was created at least 90 days prior to the PR submission date (to exclude bots and test accounts); bots are filtered using GitHub's official bot flag combined with a username heuristic (e.g., containing 'bot' or '[bot]') and commit history patterns. GFI labels are validated by direct GitHub API queries for the exact label name 'good first issue' on each issue. We added a validation subsection reporting manual review of a stratified random sample of 200 issues and 100 PRs by two independent coders, yielding 95% agreement (Cohen's kappa = 0.91). These changes strengthen the robustness of the reported merge-rate decline. revision: yes
Referee: Results section on temporal trends: The assertion of a 'statistically significant decline' in GFI proportions beginning January 2024 lacks details on the specific test employed, p-values, effect sizes, confidence intervals, or adjustments for project-specific confounders and potential GitHub policy changes (e.g., 2022–2024 contributor experience updates). Without these, it is difficult to evaluate whether the decline is robust or driven by unmeasured events.

Authors: We acknowledge the need for greater statistical transparency. The revised Results section now specifies that we used a segmented linear mixed-effects regression model with a breakpoint at January 2024, including random intercepts for each of the 37 projects to account for project-specific variation. The post-breakpoint slope change is statistically significant (beta = -0.014, SE = 0.003, p < 0.001, 95% CI [-0.020, -0.008]), with a medium effect size (Cohen's f = 0.32). We controlled for repository age and primary language as fixed effects. In the Discussion, we address potential GitHub policy changes by noting that the decline persists in sensitivity analyses stratified by project characteristics and is not explained by the included covariates; however, we cannot fully exclude all external platform events without proprietary GitHub data. revision: yes
Referee: Results section on PR outcomes: The reported lack of significant association between initial PR characteristics (description length, code size) and merge success requires specification of the statistical model (e.g., logistic regression with controls) and reporting of effect sizes or odds ratios. This null result is central to the interpretation that success depends on factors beyond quantitative submission traits.

Authors: We have updated the manuscript to fully specify the analysis. We employed logistic regression models with merge success (1 = merged, 0 = not) as the outcome, including description length (log-transformed characters) and code size (log-transformed lines changed) as predictors, plus controls for project fixed effects, submission time period, and a binary indicator for prior contributor activity. The models show no significant associations: description length (OR = 1.02, 95% CI [0.98, 1.06], p = 0.31) and code size (OR = 0.99, 95% CI [0.97, 1.01], p = 0.42). These details, including odds ratios and model diagnostics (AUC = 0.61), have been added to the Results section to support the interpretation that quantitative traits alone do not predict outcomes. revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis of GitHub data

full rationale

This paper conducts a longitudinal count-based analysis of 406,826 issues and 1,117 newcomer GFI pull requests across 37 GitHub repositories. All reported trends (stable GFI labeling for three years followed by decline after January 2024, stable 27% newcomer engagement, merge-rate drop from 61.9% to 42.2%) are direct statistical summaries of observed data extracted from public repositories. No equations, fitted parameters, predictive models, or derivations appear; no self-citations are invoked as load-bearing uniqueness theorems or ansatzes; and no results reduce by construction to prior fitted values. The study is self-contained against external benchmarks because its claims rest on transparent, replicable data extraction and simple proportion/comparison statistics rather than any internal definitional loop or self-referential premise.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is an observational study; central claims rest on data labeling and identification assumptions rather than mathematical axioms or invented entities.

axioms (2)

domain assumption Newcomer pull requests can be reliably identified from commit history and account metadata across the studied repositories.
Abstract states 1,117 newcomer GFI pull requests but does not detail the exact detection rule used.
domain assumption GFI labels are applied consistently enough for longitudinal proportion analysis.
The study treats label presence as a stable signal despite noting project variation.

pith-pipeline@v0.9.0 · 5589 in / 1362 out tokens · 65183 ms · 2026-05-07T10:16:33.564408+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

22 extracted references · 5 canonical work pages

[1]

Gerosa, Igor Steinmacher, and Anita Sarma

Sogol Balali, Umayal Annamalai, Hema Susmita Padala, Bianca Trinkenreich, Marco A. Gerosa, Igor Steinmacher, and Anita Sarma. 2020. Recommending Tasks to Newcomers in OSS Projects: How Do Mentors Handle It?. InProceedings of the 16th International Symposium on Open Collaboration(Virtual conference, Spain). Association for Computing Machinery, Article 7, 14 pages

2020
[2]

Zihan Fang, Yueke Zhang, Thomas Zimmermann, Denae Ford, and Yu Huang
[3]

doi:10.1145/3788046

Contribution Patterns in Open Source Software for Social Good: Dynamics, Individuals, and Impact.Proceedings of the ACM on Human-Computer Interaction 10, CSCW (April 2026), 26 pages. doi:10.1145/3788046

work page doi:10.1145/3788046 2026
[4]

Sergey Gazanchyan. 2020. Awesome first PR opportunities. [Online]. Available: https://github.com/MunGell/awesome-for-beginners

2020
[5]

Mariam Guizani, Thomas Zimmermann, Anita Sarma, and Denae Ford. 2022. Attracting and retaining OSS contributors with a maintainer dashboard. In Proceedings of the 2022 ACM/IEEE 44th International Conference on Software Engi- neering: Software Engineering in Society(Pittsburgh, Pennsylvania). Association for Computing Machinery, 36–40

2022
[6]

Yuekai Huang, Junjie Wang, Song Wang, Zhe Liu, Dandan Wang, and Qing Wang. 2021. Characterizing and Predicting Good First Issues. InProceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)(Bari, Italy). Association for Computing Machinery, Article 13, 12 pages

2021
[7]

Fabio Santos, Bianca Trinkenreich, João Felipe Pimentel, Igor Wiese, Igor Stein- macher, Anita Sarma, and Marco A. Gerosa. 2022. How to Choose a Task? Mismatches in Perspectives of Newcomers and Existing Contributors. InPro- ceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement(Helsinki, Finland). Associ...

2022
[8]

Dan Sholler, Igor Steinmacher, Denae Ford, Mara Averick, Mike Hoye, and Greg Wilson. 2019. Ten simple rules for helping newcomers become contributors to open projects.PLoS computational biology15, 9 (2019), e1007296. doi:10.1371/ journal.pcbi.1007296

2019
[9]

Cuevas Zambrano, Marco Aurelio Gerosa, and Anita Sarma

Igor Steinmacher, Sogol Balali, Bianca Trinkenreich, Mariam Guizani, Daniel Izquierdo-Cortazar, Griselda G. Cuevas Zambrano, Marco Aurelio Gerosa, and Anita Sarma. 2021. Being a Mentor in open source projects.Journal of Internet Services and Applications12, 1 (09 Sep 2021), 7. doi:10.1186/s13174-021-00140-z

work page doi:10.1186/s13174-021-00140-z 2021
[10]

Igor Steinmacher, Ana Paula Chaves, Tayana Uchoa Conte, and Marco Aurelio Gerosa. 2014. Preliminary Empirical Identification of Barriers Faced by Newcom- ers to Open Source Software Projects. In2014 Brazilian Symposium on Software Engineering. 51–60. doi:10.1109/SBES.2014.9

work page doi:10.1109/sbes.2014.9 2014
[11]

Igor Steinmacher, Tayana Conte, Marco Aurélio Gerosa, and David Redmiles
[12]

InProceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing(Vancouver, BC, Canada)

Social Barriers Faced by Newcomers Placing Their First Contribution in Open Source Software Projects. InProceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing(Vancouver, BC, Canada). Association for Computing Machinery, 1379–1392
[13]

Igor Steinmacher, Tayana Uchoa Conte, Christoph Treude, and Marco Aurélio Gerosa. 2016. Overcoming open source project entry barriers with a portal for newcomers. InProceedings of the 38th International Conference on Software Engineering(Austin, Texas). Association for Computing Machinery, 273–284

2016
[14]

Subramanian, Ifraz Rehman, Meiyappan Nagappan, and Raula Gaikov- ina Kula

Vikram N. Subramanian, Ifraz Rehman, Meiyappan Nagappan, and Raula Gaikov- ina Kula. 2022. Analyzing First Contributions on GitHub: What Do Newcomers Do?IEEE Software39, 1 (2022), 93–101. doi:10.1109/MS.2020.3041241

work page doi:10.1109/ms.2020.3041241 2022
[15]

Hyunjae Suh, Mahan Tafreshipour, Jiawei Li, Adithya Bhattiprolu, and Iftekhar Ahmed. 2025. An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?. InProceedings of the IEEE/ACM 47th International Conference on Software Engineering(Ottawa, Ontario, Canada). IEEE Press, 859– 871

2025
[16]

Xin Tan, Yiran Chen, Haohua Wu, Minghui Zhou, and Li Zhang. 2023. Is it Enough to Recommend Tasks to Newcomers? Understanding Mentoring on Good First Issues. InProceedings of the 45th International Conference on Software Engineering(Melbourne, Victoria, Australia). IEEE Press, 653–664

2023
[17]

Xin Tan, Minghui Zhou, and Zeyu Sun. 2020. A first look at good first issues on GitHub. InProceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Virtual Event, USA). Association for Computing Machinery, 398–409

2020
[18]

Asif Kamal Turzo, Sayma Sultana, and Amiangshu Bosu. 2025. From First Patch to Long-Term Contributor: Evaluating Onboarding Recommendations for OSS Newcomers.IEEE Trans. Softw. Eng.51, 4 (April 2025), 1303–1318

2025
[19]

Tao Xiao, Hideaki Hata, Christoph Treude, and Kenichi Matsumoto. 2024. Gen- erative AI for Pull Request Descriptions: Adoption, Impact, and Developer Inter- ventions.Proc. ACM Softw. Eng.1, FSE, Article 47 (July 2024), 23 pages

2024
[20]

Wenxin Xiao, Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou. 2022. Recommending good first issues in GitHub OSS projects. InProceedings of the 44th International Conference on Software Engineering(Pittsburgh, Pennsylvania). Association for Computing Machinery, 1830–1842

2022
[21]

Wenxin Xiao, Jingyue Li, Hao He, Ruiqiao Qiu, and Minghui Zhou. 2024. Per- sonalized First Issue Recommender for Newcomers in Open Source Projects. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering(Echternach, Luxembourg). IEEE Press, 800–812

2024
[22]

Burak Yetiştiren, Işık Özsoy, Miray Ayerdem, and Eray Tüzün. 2023. Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT.arXiv preprint arXiv:2304.10778(Apr 2023). arXiv:2304.10778

work page arXiv 2023