Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Giuseppe Destefanis; Matteo Vaccargiu; Roberto Tonelli; Ronnie de Souza Santos; Sabrina Aufiero; Silvia Bartolucci

arxiv: 2603.24501 · v3 · pith:2BSX2Z77new · submitted 2026-03-25 · 💻 cs.SE


Pith reviewed 2026-05-22 10:08 UTC · model grok-4.3

classification 💻 cs.SE

keywords label-diff congruence · pull request reviews · contributor experience · Kubernetes · GitHub labels · code review collaboration · open source development · discussion characteristics


The pith

In Kubernetes pull requests, alignment between labels and code changes leads to fewer review participants for experienced developers but more for newcomers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how labels on pull requests align with the actual files modified in the Kubernetes project. It introduces label-diff congruence as the match between assigned labels and code diffs, then tracks its frequency and effects across more than 18,000 pull requests. The work finds that this alignment is common, stable over a decade, and often corrected during review, yet it does not change merge times. Instead, statistical models show it shapes discussion volume in opposite ways depending on contributor experience: core developers see quieter reviews while one-time contributors see greater engagement. The authors conclude that maintaining label accuracy can support efficient collaboration for experts and greater visibility for newcomers in similar projects.

Core claim

Label-diff congruence, the alignment between pull request labels and modified files, occurs with perfect match in 46.6 percent of cases, remains stable from 2014 to 2025, and is maintained through corrections in 9.2 percent of pull requests. Quantile regression and negative binomial models stratified by experience level show no link to time-to-merge, but higher congruence predicts 18 percent fewer review participants among core developers (81 percent of the sample) and 28 percent more participants among one-time contributors. The result is that label-diff congruence influences how collaboration unfolds, supporting efficiency for experienced developers and visibility for newcomers.

What carries the argument

Label-diff congruence, the measure of alignment between the area labels on a pull request and the files appearing in its diff.
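
The abstract does not give the exact scoring rule, so the following is only a minimal sketch of one way such a measure could be computed: a Jaccard overlap between the area labels on a pull request and the areas implied by its modified file paths. The `PREFIX_TO_AREA` mapping, the function names, and the choice of Jaccard overlap are all assumptions made for illustration, not the authors' definition.

```python
# Hypothetical mapping from path prefixes to area labels. The real
# Kubernetes mapping and the paper's scoring rule are not given in the
# abstract; these entries are illustrative assumptions only.
PREFIX_TO_AREA = {
    "pkg/kubelet": "area/kubelet",
    "pkg/kubectl": "area/kubectl",
    "test": "area/test",
}


def areas_from_files(paths):
    """Return the set of areas implied by the modified file paths."""
    areas = set()
    for path in paths:
        for prefix, area in PREFIX_TO_AREA.items():
            if path == prefix or path.startswith(prefix + "/"):
                areas.add(area)
    return areas


def congruence(labels, paths):
    """Jaccard overlap between assigned area labels and diff-implied areas.

    Returns a float in [0, 1], or None when neither side carries any
    area information (the overlap is undefined in that case).
    """
    label_areas = {label for label in labels if label.startswith("area/")}
    diff_areas = areas_from_files(paths)
    union = label_areas | diff_areas
    if not union:
        return None
    return len(label_areas & diff_areas) / len(union)
```

Under this sketch a score of 1.0 would correspond to a perfect match, and any extra or missing area label lowers the score. Other overlap measures (precision or recall of labels against files, for instance) would be equally plausible readings.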

If this is right

Label-diff congruence does not affect how quickly pull requests merge.
Higher congruence is linked to quieter reviews with 18 percent fewer participants among core developers.
Higher congruence is linked to more engaged reviews with 28 percent more participants among one-time contributors.
Label corrections occur routinely during review to maintain alignment.
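
Percentages like these are what one typically reads off an incidence rate ratio (IRR) in a count model. With a log link, a coefficient β gives IRR = exp(β), and the percent change in the expected count is (IRR − 1) × 100. Assuming the 18 percent and 28 percent figures are meant this way (Figure 3 reports IRRs), the conversion can be sketched as below. The function names are illustrative and do not come from the paper.

```python
import math


def coef_to_irr(beta):
    """Convert a log-link regression coefficient to an incidence rate ratio."""
    return math.exp(beta)


def irr_to_percent_change(irr):
    """Percent change in the expected count implied by an IRR."""
    return (irr - 1.0) * 100.0


def percent_change_to_irr(pct):
    """Inverse of irr_to_percent_change."""
    return 1.0 + pct / 100.0


# Under the assumption above, "18 percent fewer" corresponds to an IRR of
# about 0.82 and "28 percent more" to an IRR of about 1.28.
irr_core = percent_change_to_irr(-18)
irr_one_time = percent_change_to_irr(28)
```

An IRR below 1 means fewer expected participants as congruence rises, and an IRR above 1 means more. An IRR of exactly 1 is the no-effect reference.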

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Other open-source projects that rely on area labels for triage may observe similar experience-dependent patterns in review participation.
Automated checks that flag low label-diff congruence could help maintainers reduce coordination issues before reviews begin.
The opposite effects on discussion volume suggest that label systems may serve different coordination roles for different contributor groups.

Load-bearing premise

That statistical models stratified by contributor experience can isolate the effect of label-diff congruence on discussion characteristics from other unmeasured factors in the review process.

What would settle it

A re-analysis that adds controls for pull request size, number of files changed, or technical complexity and finds the associations between congruence and participant count disappear.
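
A quick first look at this question, well short of the regression adjustment described above, is to compare participant counts within strata of pull request size. The sketch below swaps in simple stratification for covariate adjustment. The field names (`files_changed`, `participants`, `high_congruence`) and the bin edges are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def stratified_rate_ratios(prs, size_bins=(10, 100)):
    """Ratio of mean participants (high- vs low-congruence PRs) per size stratum.

    prs: iterable of dicts with hypothetical keys 'files_changed' (int),
         'participants' (int) and 'high_congruence' (bool).
    size_bins: upper bounds (inclusive) of the size strata; PRs above the
         last bound fall into one extra stratum.
    Strata lacking either group, or with a zero low-congruence mean, are skipped.
    """
    def stratum(n_files):
        for index, upper in enumerate(size_bins):
            if n_files <= upper:
                return index
        return len(size_bins)

    groups = defaultdict(lambda: {True: [], False: []})
    for pr in prs:
        key = stratum(pr["files_changed"])
        groups[key][bool(pr["high_congruence"])].append(pr["participants"])

    ratios = {}
    for key, group in sorted(groups.items()):
        if group[True] and group[False] and mean(group[False]) > 0:
            ratios[key] = mean(group[True]) / mean(group[False])
    return ratios
```

If the ratios sit close to 1 inside every size stratum while the pooled ratio does not, that would point toward pull request size, rather than congruence, carrying the association. A full re-analysis would still need the count models with the covariates included.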

Figures

Figures reproduced from arXiv: 2603.24501 by Giuseppe Destefanis, Matteo Vaccargiu, Roberto Tonelli, Ronnie de Souza Santos, Sabrina Aufiero, Silvia Bartolucci.

**Figure 1.** Overview of the methodology. Data from the Kubernetes repository is processed to construct label–diff congruence...

**Figure 2.** Quarterly median congruence with Theil–Sen ro...

**Figure 3.** Congruence effects by contributor tier (IRRs with 95% CIs). Left: comments. Right: participants. Gray dashed line...
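
The caption of Figure 2 mentions Theil–Sen, a trend estimator that takes the median of all pairwise slopes and is therefore resistant to outlying points. A minimal pure-Python sketch of the textbook estimator follows. It is not the authors' code, and the intercept rule used here (median of the residual offsets) is one common convention among several.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Return (slope, intercept) of the Theil–Sen line through the points.

    slope: median of the slopes over all point pairs with distinct x.
    intercept: median of y - slope * x over all points.
    """
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept
```

For a quarterly series, `xs` would be the quarter index and `ys` the median congruence in that quarter. A slope near zero would be consistent with the stability the paper reports.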

read the original abstract

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contributor experience levels. We present a case study of the Kubernetes project, introducing label-diff congruence - the alignment between pull request labels and modified files - and examining its prevalence, stability, behavioral validation, and relationship to collaboration outcomes across contributor tiers. We analyse 18,020 pull requests (2014–2025) with area labels and complete file diffs, validate alignment through analysis of over one million review comments and label corrections, and test associations with time-to-merge and discussion characteristics using quantile regression and negative binomial models stratified by contributor experience. Congruence is prevalent (46.6% perfect alignment), stable over years, and routinely maintained (9.2% of PRs corrected during review). It does not predict merge speed but shapes discussion: among core developers (81% of the sample), higher congruence predicts quieter reviews (18% fewer participants), whereas among one-time contributors it predicts more engagement (28% more participants). Label-diff congruence influences how collaboration unfolds during review, supporting efficiency for experienced developers and visibility for newcomers. For projects with similar labeling conventions, monitoring alignment can help detect coordination friction and provide guidance when labels and code diverge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines label-diff congruence and reports it shapes review participation differently for core vs. one-time Kubernetes contributors, but the observational models leave room for unmeasured PR traits to drive the results.


The main takeaway is that better label-code alignment in Kubernetes PRs is linked to fewer review participants among core developers but more among one-timers. The authors introduce label-diff congruence as the measure and test its ties to discussion volume and merge time using a large sample stratified by experience level.

The work does a solid job on scale and validation. Eighteen thousand PRs over a decade, a check for stability across years, and cross-referencing against a million review comments plus label corrections give the descriptive claims real grounding. That validation step shows the metric tracks something reviewers actually act on, which is more than many alignment studies manage. The modeling uses quantile regression and negative binomial regressions split by contributor tier, which is a reasonable way to look for differential patterns.

The soft spot is the lack of detail on controls. The abstract does not mention adjusting for PR size, file count, or the label assignment process itself. If one-time contributors disproportionately touch files outside standard label areas, that could produce the higher participant counts without congruence being the active factor. The same issue could operate inside the core-developer group. Because the design is observational and limited to one project, the reported associations stay vulnerable to those kinds of correlated traits.

This is the sort of paper that would interest empirical software engineering researchers who study coordination in large open-source projects or people building review tools. A reader focused on newcomer onboarding or label-based triage could extract practical signals from the experience-stratified results.

I would send it to peer review. The data volume and validation work give it enough substance for referees to engage with, even if the authors need to tighten the controls and discuss generalizability in revisions.

Referee Report

1 major / 1 minor

Summary. The paper conducts a case study on the Kubernetes project, introducing the concept of label-diff congruence as the alignment between pull request labels and the files modified in the PR. Analyzing 18,020 PRs from 2014-2025, it finds that congruence is prevalent (46.6% perfect alignment), stable over time, and frequently maintained through corrections during review (9.2% of PRs). Using quantile regression and negative binomial models stratified by contributor experience, the study shows that congruence does not affect time-to-merge but influences discussion characteristics: higher congruence is associated with 18% fewer participants in reviews for core developers, while for one-time contributors it predicts 28% more participants. The authors conclude that label-diff congruence supports efficiency for experienced developers and visibility for newcomers.

Significance. If the reported associations are robust to potential confounders, this work offers important insights into coordination mechanisms in large open-source projects. The large sample size spanning over a decade, combined with validation through analysis of over one million review comments and label corrections, provides strong descriptive evidence for the prevalence and stability of label-diff congruence. The stratification by contributor tiers and the differential effects observed highlight practical implications for how labeling practices can be monitored to improve collaboration dynamics.

major comments (1)

[Modeling approach (as described for quantile regression and negative binomial models)] The quantile regression and negative binomial models used to test associations between label-diff congruence and discussion characteristics (described in the abstract) are stratified by contributor experience (core vs. one-time) but provide no indication of controls for PR-level covariates such as size, number of files changed, or file entropy. This raises the possibility that the reported effects (18% fewer participants for high-congruence core PRs; 28% more for one-time contributors) are driven by correlated PR traits rather than congruence per se, particularly since one-time contributors may submit PRs whose file sets naturally diverge from existing labels. This is load-bearing for the central claim that congruence shapes collaboration outcomes.

minor comments (1)

[Introduction/Methods] The operational definition of label-diff congruence would benefit from an explicit formula or pseudocode example early in the manuscript to support exact replication of the 46.6% perfect-alignment figure and the congruence measure used in the regressions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we respond point-by-point to the major comment and describe the revisions we will undertake.


Referee: The quantile regression and negative binomial models used to test associations between label-diff congruence and discussion characteristics (described in the abstract) are stratified by contributor experience (core vs. one-time) but provide no indication of controls for PR-level covariates such as size, number of files changed, or file entropy. This raises the possibility that the reported effects (18% fewer participants for high-congruence core PRs; 28% more for one-time contributors) are driven by correlated PR traits rather than congruence per se, particularly since one-time contributors may submit PRs whose file sets naturally diverge from existing labels. This is load-bearing for the central claim that congruence shapes collaboration outcomes.

Authors: We appreciate the referee's concern about potential confounding. Our models are stratified by contributor experience to capture heterogeneity in collaboration patterns, which is central to our research question. However, we acknowledge that the current specifications do not include explicit controls for PR-level covariates such as the number of files changed, PR size, or file entropy. In the revised version we will augment both the quantile regression and negative binomial models with these covariates (along with any other relevant PR characteristics available in our dataset). We will present the updated coefficient estimates and discuss whether the reported associations with review participation remain robust. This addition will directly address the possibility that the observed effects are driven by correlated PR traits rather than label-diff congruence itself.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis with no derivations or self-referential reductions


The paper is a case study performing statistical analysis (quantile regression and negative binomial models) on 18,020 observed Kubernetes pull requests to report associations between label-diff congruence and discussion characteristics, stratified by contributor experience. No derivation chain exists that reduces predictions or results to fitted inputs by construction, no self-definitional steps, and no load-bearing self-citations or uniqueness theorems are invoked to justify the central claims. The reported effects (e.g., 18% fewer participants for high-congruence core PRs) are direct outputs of the applied models on external data, not tautological renamings or ansatzes smuggled via prior work. The analysis is self-contained against the observed dataset and does not rely on any internal equivalence that would constitute circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the empirical construction of label-diff congruence from PR labels and file diffs, plus the validity of experience-based stratification and regression models. Limited information available from abstract only.

axioms (2)

domain assumption: Analysis restricted to PRs that have area labels and complete file diffs.
Basis: Abstract states the sample is limited to such PRs (18,020 total).

domain assumption: Contributor experience can be meaningfully stratified into tiers such as core developers and one-time contributors.
Basis: Models are described as stratified by contributor experience.

invented entities (1)

label-diff congruence (no independent evidence)
Purpose: Quantify alignment between pull request labels and modified files.
Note: Newly introduced metric whose prevalence, stability, and behavioral effects are the focus of the study.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: "We introduce label–diff congruence... test associations with time-to-merge and discussion characteristics using quantile regression and negative binomial models stratified by contributor experience."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "Congruence is prevalent (46.6% perfect alignment), stable over years... higher congruence predicts quieter reviews (18% fewer participants) for core developers."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages

[1]

Amirali Alami, Kasia MacLean, Emad Shihab, and Foutse Khomh. 2025. The Role of Intrinsic Drivers and the Impact of LLMs in Code Review: Examining Accountability for Code Quality.ACM Transactions on Software Engineering and Methodology34, 8, Article 233 (2025). doi:10.1145/3721127

work page doi:10.1145/3721127 2025
[2]

John Anvik, Lyndon Hiew, and Gail C Murphy. 2006. Who should fix this bug?. In Proceedings of the 28th international conference on Software engineering. 361–370

work page 2006
[3]

Sabrina Aufiero, Matteo Vaccargiu, Silvia Bartolucci, Fabio Caccioli, and Giuseppe Destefanis. 2026. Coordination at Scale in Large Distributed Development: The Case of Kubernetes. In23rd International Conference on Mining Software Reposito- ries (MSR ’26). 1–12. doi:10.1145/3793302.3793342

work page doi:10.1145/3793302.3793342 2026
[4]

Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and chal- lenges of modern code review. InProceedings of the 2013 International Conference on Software Engineering (ICSE ’13). 712–721. doi:10.1109/ICSE.2013.6606617

work page doi:10.1109/icse.2013.6606617 2013
[5]

Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Dis- covery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological)57, 1 (1995), 289–300. arXiv:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517- 6161.1995.tb02031.x doi:10.1111/j.2517-6161.1995.tb02031.x

work page doi:10.1111/j.2517- 1995
[6]

Amiangshu Bosu, Michaela Greiler, and Christian Bird. 2015. Characteristics of Useful Code Reviews: An Empirical Study at Microsoft. In2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 146–156. doi:10.1109/MSR. 2015.21

work page doi:10.1109/msr 2015
[7]

Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, and Belén Rolandi

work page
[8]

In2015 IEEE 22nd International Conference on Software Analy- sis, Evolution, and Reengineering (SANER)

Exploring the Use of Labels to Categorize Issues in Open-Source Soft- ware Projects. In2015 IEEE 22nd International Conference on Software Analy- sis, Evolution, and Reengineering (SANER). IEEE, Montréal, Canada, 550–554. doi:10.1109/SANER.2015.7081846

work page doi:10.1109/saner.2015.7081846 2015
[9]

2013.Regression analysis of count data

Adrian Colin Cameron and Pravin K Trivedi. 2013.Regression analysis of count data. Number 53. Cambridge university press

work page 2013
[10]

Marcelo Cataldo, James D Herbsleb, and Kathleen M Carley. 2008. Socio-technical congruence: a framework for assessing the impact of technical and work de- pendencies on software development productivity. InProceedings of the Second ACM-IEEE international symposium on Empirical software engineering and mea- surement. 2–11

work page 2008
[11]

Kenneth Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography.Computational linguistics16, 1 (1990), 22–29

work page 1990
[12]

Paolo Ciancarini, Giancarlo Succi, Artem Kruglov, Evgeny Kovrigin, Witold Pedrycz, and Manuel Mazzara. 2023. Do social interactions affect code review in modern code development?Frontiers in Computer Science5, Article 1178040 (2023). doi:10.3389/fcomp.2023.1178040

work page doi:10.3389/fcomp.2023.1178040 2023
[13]

1999.Mathematical methods of statistics

Harald Cramér. 1999.Mathematical methods of statistics. Vol. 9. Princeton university press

work page 1999
[14]

Giuseppe Destefanis, Silvia Bartolucci, and Daniel Feitosa. 2026. Mining Kuber- netes Repositories: The Cloud was Not Built in a Day. InProceedings of the 23rd International Conference on Mining Software Repositories(Rio de Janeiro, Brazil) (MSR ’26). ACM, New York, NY, USA. doi:10.1145/3793302.3793322

work page doi:10.1145/3793302.3793322 2026
[15]

Nargis Fatima, Sumaira Nazir, and Suriayati Chuprat. 2019. Individual, Social and Personnel Factors Influencing Modern Code Review Process. In2019 IEEE Conference on Open Systems (ICOS). 40–45. doi:10.1109/ICOS47562.2019.8975708

work page doi:10.1109/icos47562.2019.8975708 2019
[16]

Shepherd, Igor Wiese, Christoph Treude, Marco Au- rélio Gerosa, and Igor Steinmacher

Felipe Fronchetti, David C. Shepherd, Igor Wiese, Christoph Treude, Marco Au- rélio Gerosa, and Igor Steinmacher. 2023. Do CONTRIBUTING Files Pro- vide Information about OSS Newcomers’ Onboarding Barriers?. InProceed- ings of the 31st ACM Joint European Software Engineering Conference and Sym- posium on the Foundations of Software Engineering (ESEC/FSE 20...

work page doi:10.1145/3611643.3616288 2023
[17]

Georgios Gousios, Andy Zaidman, Margaret-Anne Storey, and Arie Van Deursen

work page
[18]

In2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol

Work practices and challenges in pull-based development: The integrator’s perspective. In2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. IEEE, 358–368

work page
[19]

Herbsleb and Audris Mockus

James D. Herbsleb and Audris Mockus. 2003. An empirical study of speed and communication in globally distributed software development.IEEE Transactions on software engineering29, 6 (2003), 481–494

work page 2003
[20]

Kim Herzig, Sascha Just, and Andreas Zeller. 2013. It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In2013 35th international conference on software engineering (ICSE). IEEE, 392–401

work page 2013
[21]

2013.Applied logistic regression

David W Hosmer Jr, Stanley Lemeshow, and Rodney X Sturdivant. 2013.Applied logistic regression. John Wiley & Sons

work page 2013
[22]

Claus Hunsen, Janet Siegmund, and Sven Apel. 2020. On the fulfillment of coordination requirements in open-source software projects: An exploratory study.Empirical Software Engineering25, 6 (2020), 4379–4426

work page 2020
[23]

Gaeul Jeong, Sunghun Kim, and Thomas Zimmermann. 2009. Improving bug triage with bug tossing graphs. InProceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. 111–120

work page 2009
[24]

Joselito Jr., Lidia P. G. Nascimento, Alcemir Santos, and Ivan Machado. 2024. Issue Labeling Dynamics in Open-Source Projects: A Comprehensive Analysis. InXVIII Brazilian Symposium on Software Components, Architectures, and Reuse (SBCARS). Curitiba, PR, Brazil

work page 2024
[25]

Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. 2014. The promises and perils of mining github. In Proceedings of the 11th working conference on mining software repositories. 92–101

work page 2014
[26]

Rafael Kallis, Andrea Di Sorbo, Gerardo Canfora, and Sebastiano Panichella

work page
[27]

In2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Ticket Tagger: Machine Learning Driven Issue Classification. In2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, Cleveland, OH, USA, 406–409. doi:10.1109/ICSME.2019.00070

work page doi:10.1109/icsme.2019.00070 2019
[28]

Maurice G Kendall. 1938. A new measure of rank correlation.Biometrika30, 1-2 (1938), 81–93

work page 1938
[29]

SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, and Emad Shihab. 2024. Predicting the First Response Latency of Maintainers and Contrib- utors in Pull Requests.IEEE Transactions on Software Engineering50, 10 (2024), 2529–2543. doi:10.1109/TSE.2024.3443741

work page doi:10.1109/tse.2024.3443741 2024
[30]

Jindae Kim and Seonah Lee. 2021. An Empirical Study on Using Multi-Labels for Issues in GitHub.IEEE Access9 (2021), 134984–134997. doi:10.1109/ACCESS. 2021.3116061

work page doi:10.1109/access 2021
[31]

Roger Koenker and Gilbert Bassett. 1978. Regression Quantiles.Econometrica46, 1 (1978), 33–50. http://www.jstor.org/stable/1913643

work page arXiv 1978
[32]

Kruskal and W

William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One- Criterion Variance Analysis.J. Amer. Statist. Assoc.47, 260 (1952), 583–621. arXiv:https://doi.org/10.1080/01621459.1952.10483441 doi:10.1080/01621459.1952. 10483441

work page doi:10.1080/01621459.1952.10483441 1952
[33]

H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other.The Annals of Mathematical Statistics18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101

work page arXiv 1947
[34]

Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan. 2016. An empirical study of the impact of modern code review practices on software quality.Empirical Software Engineering21 (2016), 2146–2189. doi:10.1007/s10664- 015-9381-9

work page doi:10.1007/s10664- 2016
[35]

Audris Mockus, Roy T Fielding, and James D Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla.ACM Transactions on Software Engineering and Methodology (TOSEM)11, 3 (2002), 309–346

work page 2002
[36]

Gustavo Pinto, Igor Steinmacher, and Marco Aurélio Gerosa. 2016. Understanding the Impressions, Motivations, and Barriers of One-Time Code Contributors to FLOSS Projects: A Survey.IEEE Software33, 2 (2016), 187–194. doi:10.1109/MS. 2016.36

work page doi:10.1109/ms 2016
[37]

Shade Ruangwan, Patanamon Thongtanunam, Akinori Ihara, and Kenichi Mat- sumoto. 2019. The impact of human factors on the participation decision of reviewers in modern code review.Empirical Software Engineering24 (2019), 973–1016. doi:10.1007/s10664-018-9646-1

work page doi:10.1007/s10664-018-9646-1 2019
[38]

Fernanda Santos, Joseph Vargovich, Bianca Trinkenreich, et al. 2023. Tag that issue: applying API-domain labels in issue tracking systems.Empirical Software Engineering28 (2023), 116. doi:10.1007/s10664-023-10329-4

work page doi:10.1007/s10664-023-10329-4 2023
[39]

Italo Santos, Katia Romero Felizardo, Igor Steinmacher, and Marco A. Gerosa

work page
[40]

doi:10.1016/j.infsof.2024.107568

Software solutions for newcomers’ onboarding in software projects: A systematic literature review.Information and Software Technology177 (2025), 107568. doi:10.1016/j.infsof.2024.107568

work page doi:10.1016/j.infsof.2024.107568 2025
[41]

German, Igor Steinmacher, and Marco A

Italo Santos, Katia Romero Felizardo, Bianca Trinkenreich, Daniel M. German, Igor Steinmacher, and Marco A. Gerosa. 2025. Exploring the Untapped: Student Perceptions and Participation in OSS. InProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering(Clarion Hotel Trondheim, Trondheim, Norway)(FSE Companion ’25). Ass...

work page doi:10.1145/3696630.3727243 2025
[42]

Italo Santos, João Felipe Pimentel, Igor Wiese, Igor Steinmacher, Anita Sarma, and Marco A. Gerosa. 2023. Designing for Cognitive Diversity: Improving the GitHub Experience for Newcomers. In2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). 1–12. doi:10. 1109/ICSE-SEIS58686.2023.00007

work page arXiv 2023
[43]

Italo Santos, Igor Wiese, Igor Steinmacher, Anita Sarma, and Marco A. Gerosa

work page
[44]

In: Proceedings of the 29th Edition of the IEEE International Conference on Software Analysis, Evolu- tion and Reengineering, pp

Hits and Misses: Newcomers’ ability to identify Skills needed for OSS tasks. In2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 174–183. doi:10.1109/SANER53432.2022.00032

work page doi:10.1109/saner53432.2022.00032 2022
[45]

Pranab Kumar Sen. 1968. Estimates of the Regression Coefficient Based on Kendall’s Tau.J. Amer. Statist. Assoc.63, 324 (1968), 1379–1389. arXiv:https://www.tandfonline.com/doi/pdf/10.1080/01621459.1968.10480934 doi:10.1080/01621459.1968.10480934

work page doi:10.1080/01621459.1968.10480934 1968
[46]

Think local, retweet global: Retweeting by the geographically-vulnerable during hurricane sandy,

Igor Steinmacher, Tayana Uchôa Conte, Marco Aurélio Gerosa, and David F. Red- miles. 2015. Social Barriers Faced by Newcomers Placing Their First Contribution in Open Source Software Projects. InProceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15). 1379–1392. doi:10.1145/2675133.2675215

work page doi:10.1145/2675133.2675215 2015
[47]

Devanbu, Christoph Treude, and Michael Pradel

M. Vaccargiu, S. Aufiero, C. Ba, S. Bartolucci, R. Clegg, D. Graziotin, R. Neykova, R. Tonelli, and G. Destefanis. 2025. Mining a Decade of Event Impacts on Contributor Dynamics in Ethereum: A Longitudinal Study. In2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). 552–563. doi:10.1109/MSR66628.2025.00088 EASE 2026, 9–12 Ju...

work page doi:10.1109/msr66628.2025.00088 2025
[48]

Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci, Rumyana Neykova, Roberto Tonelli, and Giuseppe Destefanis. 2026. Developer engagement in open-source software’s green transition.Communications Sustainability1 (2026), 41. doi:10. 1038/s44458-026-00050-w

work page 2026
[49]

Matteo Vaccargiu, Silvia Bartolucci, Nicole Novielli, Marco Ortu, Roberto Tonelli, and Giuseppe Destefanis. 2026. Emotional expression in open- source: How project function shapes communication.Information and Software Technology 191 (2026), 108003. doi:10.1016/j.infsof.2025.108003

work page doi:10.1016/j.infsof.2025.108003 2026
[50]

Matteo Vaccargiu, Riccardo Lai, Maria Ilaria Lunesu, Andrea Pinna, and Giuseppe Destefanis. 2026. Patterns of Bot Participation and Emotional Influence in Open- Source Development. In7th International Workshop on Bots and Agents in Software Engineering (BoatSE ’26). 1–7. doi:10.1145/3786161.3788455

work page doi:10.1145/3786161.3788455 2026
[51]

Matteo Vaccargiu, Rumyana Neykova, Nicole Novielli, Marco Ortu, and Giuseppe Destefanis. 2025. More Than Code: Technical and Emotional Dynamics in Solidity’s Development. In2025 IEEE/ACM 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE). 260–271. doi:10.1109/CHASE66643.2025.00036

work page doi:10.1109/chase66643.2025.00036 2025
[52]

Roel Wieringa and Maya Daneva. 2015. Six strategies for generalizing software engineering theories.Science of Computer Programming101 (2015), 136–152. doi:10.1016/j.scico.2014.11.013 Towards general theories of software engineering

work page doi:10.1016/j.scico.2014.11.013 2015
[53]

Edwin B. Wilson. 1927. Probable Inference, the Law of Succession, and Statistical Inference.J. Amer. Statist. Assoc.22, 158 (1927), 209–212. arXiv:https://www.tandfonline.com/doi/pdf/10.1080/01621459.1927.10502953 doi:10.1080/01621459.1927.10502953

work page doi:10.1080/01621459.1927.10502953 1927
[54]

Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Vol. 236. Springer
[55]

Yue Yu, Huaimin Wang, Vladimir Filkov, Premkumar Devanbu, and Bogdan Vasilescu. 2015. Wait For It: Determinants of Pull Request Evaluation Latency on GitHub. In 2015 12th Working Conference on Mining Software Repositories (MSR). IEEE, 367–371. doi:10.1109/MSR.2015.42
[56]

Guoliang Zhao, Daniel Alencar da Costa, and Ying Zou. 2019. Improving the pull requests review process using learning-to-rank algorithms. Empirical Software Engineering 24 (2019), 2140–2170. doi:10.1007/s10664-019-09696-8

[1]

Amirali Alami, Kasia MacLean, Emad Shihab, and Foutse Khomh. 2025. The Role of Intrinsic Drivers and the Impact of LLMs in Code Review: Examining Accountability for Code Quality. ACM Transactions on Software Engineering and Methodology 34, 8, Article 233 (2025). doi:10.1145/3721127

[2]

John Anvik, Lyndon Hiew, and Gail C Murphy. 2006. Who should fix this bug? In Proceedings of the 28th international conference on Software engineering. 361–370

[3]

Sabrina Aufiero, Matteo Vaccargiu, Silvia Bartolucci, Fabio Caccioli, and Giuseppe Destefanis. 2026. Coordination at Scale in Large Distributed Development: The Case of Kubernetes. In 23rd International Conference on Mining Software Repositories (MSR ’26). 1–12. doi:10.1145/3793302.3793342

[4]

Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). 712–721. doi:10.1109/ICSE.2013.6606617

[5]

Yoav Benjamini and Yosef Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 1 (1995), 289–300. arXiv:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517-6161.1995.tb02031.x doi:10.1111/j.2517-6161.1995.tb02031.x

[6]

Amiangshu Bosu, Michaela Greiler, and Christian Bird. 2015. Characteristics of Useful Code Reviews: An Empirical Study at Microsoft. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 146–156. doi:10.1109/MSR.2015.21

[7]

Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, and Belén Rolandi. 2015. Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, Montréal, Canada, 550–554. doi:10.1109/SANER.2015.7081846

[9]

Adrian Colin Cameron and Pravin K Trivedi. 2013. Regression analysis of count data. Number 53. Cambridge University Press

[10]

Marcelo Cataldo, James D Herbsleb, and Kathleen M Carley. 2008. Socio-technical congruence: a framework for assessing the impact of technical and work dependencies on software development productivity. In Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement. 2–11

[11]

Kenneth Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational linguistics 16, 1 (1990), 22–29

[12]

Paolo Ciancarini, Giancarlo Succi, Artem Kruglov, Evgeny Kovrigin, Witold Pedrycz, and Manuel Mazzara. 2023. Do social interactions affect code review in modern code development? Frontiers in Computer Science 5, Article 1178040 (2023). doi:10.3389/fcomp.2023.1178040

[13]

Harald Cramér. 1999. Mathematical methods of statistics. Vol. 9. Princeton University Press

[14]

Giuseppe Destefanis, Silvia Bartolucci, and Daniel Feitosa. 2026. Mining Kubernetes Repositories: The Cloud was Not Built in a Day. In Proceedings of the 23rd International Conference on Mining Software Repositories (Rio de Janeiro, Brazil) (MSR ’26). ACM, New York, NY, USA. doi:10.1145/3793302.3793322

[15]

Nargis Fatima, Sumaira Nazir, and Suriayati Chuprat. 2019. Individual, Social and Personnel Factors Influencing Modern Code Review Process. In 2019 IEEE Conference on Open Systems (ICOS). 40–45. doi:10.1109/ICOS47562.2019.8975708

[16]

Felipe Fronchetti, David C. Shepherd, Igor Wiese, Christoph Treude, Marco Aurélio Gerosa, and Igor Steinmacher. 2023. Do CONTRIBUTING Files Provide Information about OSS Newcomers’ Onboarding Barriers? In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 20... doi:10.1145/3611643.3616288

[17]

Georgios Gousios, Andy Zaidman, Margaret-Anne Storey, and Arie Van Deursen. 2015. Work practices and challenges in pull-based development: The integrator’s perspective. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1. IEEE, 358–368

[19]

James D. Herbsleb and Audris Mockus. 2003. An empirical study of speed and communication in globally distributed software development. IEEE Transactions on software engineering 29, 6 (2003), 481–494

[20]

Kim Herzig, Sascha Just, and Andreas Zeller. 2013. It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In 2013 35th international conference on software engineering (ICSE). IEEE, 392–401

[21]

David W Hosmer Jr, Stanley Lemeshow, and Rodney X Sturdivant. 2013. Applied logistic regression. John Wiley & Sons

[22]

Claus Hunsen, Janet Siegmund, and Sven Apel. 2020. On the fulfillment of coordination requirements in open-source software projects: An exploratory study. Empirical Software Engineering 25, 6 (2020), 4379–4426

[23]

Gaeul Jeong, Sunghun Kim, and Thomas Zimmermann. 2009. Improving bug triage with bug tossing graphs. In Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. 111–120

[24]

Joselito Jr., Lidia P. G. Nascimento, Alcemir Santos, and Ivan Machado. 2024. Issue Labeling Dynamics in Open-Source Projects: A Comprehensive Analysis. In XVIII Brazilian Symposium on Software Components, Architectures, and Reuse (SBCARS). Curitiba, PR, Brazil

[25]

Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel M German, and Daniela Damian. 2014. The promises and perils of mining GitHub. In Proceedings of the 11th working conference on mining software repositories. 92–101

[26]

Rafael Kallis, Andrea Di Sorbo, Gerardo Canfora, and Sebastiano Panichella. 2019. Ticket Tagger: Machine Learning Driven Issue Classification. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, Cleveland, OH, USA, 406–409. doi:10.1109/ICSME.2019.00070

[28]

Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1-2 (1938), 81–93

[29]

SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, and Emad Shihab. 2024. Predicting the First Response Latency of Maintainers and Contributors in Pull Requests. IEEE Transactions on Software Engineering 50, 10 (2024), 2529–2543. doi:10.1109/TSE.2024.3443741

[30]

Jindae Kim and Seonah Lee. 2021. An Empirical Study on Using Multi-Labels for Issues in GitHub. IEEE Access 9 (2021), 134984–134997. doi:10.1109/ACCESS.2021.3116061

[31]

Roger Koenker and Gilbert Bassett. 1978. Regression Quantiles. Econometrica 46, 1 (1978), 33–50. http://www.jstor.org/stable/1913643

[32]

William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One-Criterion Variance Analysis. J. Amer. Statist. Assoc. 47, 260 (1952), 583–621. arXiv:https://doi.org/10.1080/01621459.1952.10483441 doi:10.1080/01621459.1952.10483441

[33]

H. B. Mann and D. R. Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics 18, 1 (1947), 50–60. http://www.jstor.org/stable/2236101

[34]

Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan. 2016. An empirical study of the impact of modern code review practices on software quality. Empirical Software Engineering 21 (2016), 2146–2189. doi:10.1007/s10664-015-9381-9

[35]

Audris Mockus, Roy T Fielding, and James D Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology (TOSEM) 11, 3 (2002), 309–346

[36]

Gustavo Pinto, Igor Steinmacher, and Marco Aurélio Gerosa. 2016. Understanding the Impressions, Motivations, and Barriers of One-Time Code Contributors to FLOSS Projects: A Survey. IEEE Software 33, 2 (2016), 187–194. doi:10.1109/MS.2016.36

[37]

Shade Ruangwan, Patanamon Thongtanunam, Akinori Ihara, and Kenichi Matsumoto. 2019. The impact of human factors on the participation decision of reviewers in modern code review. Empirical Software Engineering 24 (2019), 973–1016. doi:10.1007/s10664-018-9646-1

[38]

Fernanda Santos, Joseph Vargovich, Bianca Trinkenreich, et al. 2023. Tag that issue: applying API-domain labels in issue tracking systems. Empirical Software Engineering 28 (2023), 116. doi:10.1007/s10664-023-10329-4

[39]

Italo Santos, Katia Romero Felizardo, Igor Steinmacher, and Marco A. Gerosa. 2025. Software solutions for newcomers’ onboarding in software projects: A systematic literature review. Information and Software Technology 177 (2025), 107568. doi:10.1016/j.infsof.2024.107568

[41]

Italo Santos, Katia Romero Felizardo, Bianca Trinkenreich, Daniel M. German, Igor Steinmacher, and Marco A. Gerosa. 2025. Exploring the Untapped: Student Perceptions and Participation in OSS. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (Clarion Hotel Trondheim, Trondheim, Norway) (FSE Companion ’25). Ass... doi:10.1145/3696630.3727243

[42]

Italo Santos, João Felipe Pimentel, Igor Wiese, Igor Steinmacher, Anita Sarma, and Marco A. Gerosa. 2023. Designing for Cognitive Diversity: Improving the GitHub Experience for Newcomers. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). 1–12. doi:10.1109/ICSE-SEIS58686.2023.00007

[43]

Italo Santos, Igor Wiese, Igor Steinmacher, Anita Sarma, and Marco A. Gerosa. 2022. Hits and Misses: Newcomers’ ability to identify Skills needed for OSS tasks. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 174–183. doi:10.1109/SANER53432.2022.00032

[45]

Pranab Kumar Sen. 1968. Estimates of the Regression Coefficient Based on Kendall’s Tau. J. Amer. Statist. Assoc. 63, 324 (1968), 1379–1389. arXiv:https://www.tandfonline.com/doi/pdf/10.1080/01621459.1968.10480934 doi:10.1080/01621459.1968.10480934

[46]

Igor Steinmacher, Tayana Uchôa Conte, Marco Aurélio Gerosa, and David F. Redmiles. 2015. Social Barriers Faced by Newcomers Placing Their First Contribution in Open Source Software Projects. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15). 1379–1392. doi:10.1145/2675133.2675215

[47]

M. Vaccargiu, S. Aufiero, C. Ba, S. Bartolucci, R. Clegg, D. Graziotin, R. Neykova, R. Tonelli, and G. Destefanis. 2025. Mining a Decade of Event Impacts on Contributor Dynamics in Ethereum: A Longitudinal Study. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). 552–563. doi:10.1109/MSR66628.2025.00088 EASE 2026, 9–12 Ju...
