arxiv: 2306.09665 · v1 · submitted 2023-06-16 · 💻 cs.SE

State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development

Pith reviewed 2026-05-24 08:35 UTC · model grok-4.3

classification 💻 cs.SE
keywords: quality assurance, open source software, Java, GitHub, software testing, code review, static analysis, continuous integration

The pith

Popular Java open source projects typically do not combine quality assurance practices at high intensity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how testing, modern code review, automated static analysis, and build automation are used together across 1,454 popular Java projects on GitHub. It establishes that projects rarely apply all these practices with high intensity simultaneously and that correlations between the practices remain weak. More mature projects increase intensity in automated static analysis and code review but show little change in continuous integration usage. This mapping of current usage patterns fills a gap left by prior studies that examined each practice in isolation.

Core claim

In 1,454 popular open source Java projects on GitHub, quality assurance approaches are not typically followed all together with high intensity; only weak correlations appear among some practices. More mature projects apply the practices more intensely, with greater focus on automated static analysis tool usage and code reviewing, yet show no strong change in continuous integration usage.

What carries the argument

Intensity metrics and proxies for each quality assurance practice (testing, code review, automated static analysis tools (ASATs), and continuous integration (CI)), measured across the full sample of GitHub repositories and contrasted by project maturity.
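To make the idea of an artifact-based proxy concrete, here is a minimal Python sketch that scans a repository's file listing for signs of each practice. The marker file names, the `detect_proxies` function, and the test-file heuristic are illustrative assumptions; they are not the detection rules used in the paper.

```python
from pathlib import PurePosixPath

# Hypothetical marker files. These names are assumptions chosen for
# illustration, not the paper's actual detection rules.
ASAT_MARKERS = {"checkstyle.xml", "pmd.xml", "spotbugs-exclude.xml",
                "sonar-project.properties"}
CI_MARKERS = {".travis.yml", "Jenkinsfile"}
CI_WORKFLOW_DIR = (".github", "workflows")


def detect_proxies(paths):
    """Return coarse presence proxies from repository-relative POSIX paths."""
    found = {"asat": False, "ci": False, "test_files": 0}
    for raw in paths:
        path = PurePosixPath(raw)
        # ASAT proxy: a known tool configuration file exists anywhere.
        if path.name in ASAT_MARKERS:
            found["asat"] = True
        # CI proxy: a known CI config file, or any file under .github/workflows.
        if path.name in CI_MARKERS or path.parts[:2] == CI_WORKFLOW_DIR:
            found["ci"] = True
        # Testing proxy: Java files named *Test.java under a "test" directory.
        if (path.suffix == ".java" and "test" in path.parts
                and path.stem.endswith("Test")):
            found["test_files"] += 1
    return found
```

A proxy of this kind only records that an artifact exists. It says nothing about how often the tool runs or whether its output is acted on, which is why the gap between proxy and real intensity matters for the argument.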

If this is right

  • Projects tend to adopt quality assurance practices selectively rather than comprehensively.
  • Project maturity drives increased use of automated static analysis and code review.
  • Continuous integration adoption stays roughly constant as projects mature.
  • The weak observed correlations imply that practices are often implemented independently.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tooling that lowers the cost of combining currently weakly correlated practices could raise overall quality assurance intensity.
  • Maturity-related gains appear limited to certain practices, suggesting targeted interventions might be more effective than general process upgrades.
  • The findings could guide empirical comparisons with closed-source or non-Java open source projects.

Load-bearing premise

The selected metrics and proxies accurately reflect the real intensity of each quality assurance practice, and the 1,454 popular GitHub Java projects represent broader Java-based open source development.

What would settle it

A new sample of Java projects in which multiple quality assurance practices show both high intensity and strong correlations with one another.

Figures

Figures reproduced from arXiv: 2306.09665 by Ali Khatami, Andy Zaidman.

Figure 1: An overview of data collection steps.
… (last commit after August 2021). We have settled on one programming language to keep our data analysis pipeline simple, more specifically, we have chosen Java. We have also selected projects to minimally have 10 contributors, as our line of reasoning is that you need to have a sufficient number of developers to make optimal use of quality assurance practices like cod…
Figure 2: Status checks on a commit in the main page of a repository: 1. The combined…
Figure 3: Summary of local build results based on build systems.
From this graph, we observe that ∼63% of projects could be built successfully out of the box. We do observe that despite Gradle being a newer and more popular build system [57], among our selection of projects, it exhibits a relatively lower successful build percentage (56%) compared to Maven (66.6%). We analysed a selection of failed builds to better understand why builds fail. To this end, we first …
Figure 4: Counts of status checks among projects following CI practice.
Figure 5: Summary of CI usage in projects based on their local build results.
Figure 6: Summary of CI build results based on local builds.
Figure 7: Distribution of count of comments in projects’ last 20 merged PRs.
Figure 8: Distribution of count of reviews in projects’ last 20 merged PRs.
Figure 9: Overview of studied projects in building and testing.
Figure 10: Branch coverage of projects with complete and partial coverage result.
Figure 11: Code review rates before and after removing outliers.
Figure 12: Relation between Testing and Code Reviewing with respect to usage of CI and…
Original abstract

To ensure the quality of software systems, software engineers can make use of a variety of quality assurance approaches, such as software testing, modern code review, automated static analysis, and build automation. Each of these quality assurance practices has been studied in depth in isolation, but there is a clear knowledge gap when it comes to our understanding of how these approaches are being used in conjunction or not. In our study, we broadly investigate whether and how these quality assurance approaches are being used in conjunction in the development of 1,454 popular open source software projects on GitHub. Our study indicates that typically projects do not follow all quality assurance practices together with high intensity. In fact, we only observe weak correlation among some quality assurance practices. In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development. Besides, we specifically zoomed in on the more mature projects in our dataset, and generally, we observe that more mature projects are more intense in their application of the quality assurance practices, with more focus on their ASAT usage and code reviewing, but no strong change in their CI usage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports an empirical study of quality assurance (QA) practices—software testing, modern code review, automated static analysis tools (ASATs), and continuous integration (CI)—across 1,454 popular Java-based open source projects on GitHub. It claims that projects do not typically apply all practices together at high intensity, supported by observations of only weak correlations among the practices. It further reports that more mature projects apply QA practices more intensely overall, with increased focus on ASAT usage and code reviewing but no strong change in CI usage.

Significance. If the intensity metrics prove valid, the work addresses a genuine gap by examining the joint application of multiple QA practices rather than isolated studies, providing a broad observational snapshot of state-of-the-practice in Java OSS. The sample scale (1,454 projects) is a clear strength for descriptive claims. The maturity stratification adds a useful dimension. However, the absence of proxy validation directly limits how much weight the correlation and maturity findings can carry for the field.

major comments (2)
  1. [Section 3] Section 3 (Data Collection and Metric Definition): The intensity proxies for each QA practice (test file counts or coverage for testing, review comments per PR for code review, presence/absence of tool configuration files for ASATs, and workflow file presence for CI) are defined without reported validation against ground-truth usage (e.g., developer surveys, manual audits of a subsample, or trace data). Because the correlation matrix and maturity comparisons in Section 5 are computed directly from these proxies, any systematic mismatch between proxy and actual intensity renders the weak-correlation claim and the differential maturity effects uninterpretable.
  2. [Section 4.2] Section 4.2 (Maturity Stratification): The operationalization of 'maturity' (likely commit history, age, or contributor count) and the statistical test used to compare intensity across maturity strata are not accompanied by effect-size reporting or controls for confounding variables such as project size or domain. This makes the claim of 'more focus on ASAT usage and code reviewing, but no strong change in CI usage' difficult to evaluate as load-bearing evidence.
minor comments (2)
  1. [Abstract] Abstract: The abstract omits any description of the concrete metrics, statistical tests, or sample-selection criteria, forcing readers to reach the full text before assessing the headline claims.
  2. [Results tables] Tables reporting correlations: Include exact Pearson/Spearman coefficients, p-values, and sample sizes per cell so readers can judge the 'weak' characterization quantitatively rather than qualitatively.
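As an illustration of the per-cell reporting requested in the second minor comment, the following Python sketch computes Spearman's rank correlation together with a permutation-based p-value and the sample size. It uses only the standard library; the function names and the choice of a permutation test are assumptions made for this example and do not describe the paper's analysis.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_report(x, y, permutations=2000, seed=0):
    """Return rho, a two-sided permutation p-value, and the sample size."""
    rho = spearman(x, y)
    rng = random.Random(seed)
    shuffled = list(y)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return {"rho": rho, "p": (extreme + 1) / (permutations + 1), "n": len(x)}
```

Calling `spearman_report(test_intensity, review_intensity)` on two per-project metric lists (hypothetical variable names) yields the three numbers a reader needs to judge whether a correlation is weak in a quantitative sense.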

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating where revisions will be incorporated.

Point-by-point responses
  1. Referee: [Section 3] Section 3 (Data Collection and Metric Definition): The intensity proxies for each QA practice (test file counts or coverage for testing, review comments per PR for code review, presence/absence of tool configuration files for ASATs, and workflow file presence for CI) are defined without reported validation against ground-truth usage (e.g., developer surveys, manual audits of a subsample, or trace data). Because the correlation matrix and maturity comparisons in Section 5 are computed directly from these proxies, any systematic mismatch between proxy and actual intensity renders the weak-correlation claim and the differential maturity effects uninterpretable.

    Authors: We agree that the proxies lack direct ground-truth validation such as surveys or manual audits, which is a limitation for interpreting the correlations and maturity effects. The proxies were selected as observable GitHub artifacts following conventions in prior large-scale empirical studies on OSS. We will revise the manuscript to expand the rationale for each proxy with additional literature citations, add an explicit limitations subsection discussing potential mismatches, and note the scale constraints that precluded subsample validation. This is a partial revision as full validation is not feasible post-hoc at this dataset size.

    Revision: partial

  2. Referee: [Section 4.2] Section 4.2 (Maturity Stratification): The operationalization of 'maturity' (likely commit history, age, or contributor count) and the statistical test used to compare intensity across maturity strata are not accompanied by effect-size reporting or controls for confounding variables such as project size or domain. This makes the claim of 'more focus on ASAT usage and code reviewing, but no strong change in CI usage' difficult to evaluate as load-bearing evidence.

    Authors: We will clarify the precise definition and measurement of maturity (based on project age and commit history) in Section 4.2. The revised version will include effect sizes for all stratum comparisons and a discussion of potential confounders such as project size and domain, with additional analysis or controls where data permit. This directly addresses the evaluability concern.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical study

Full rationale

This paper performs an observational analysis of 1,454 GitHub Java projects by measuring the presence and intensity of QA practices (testing, code review, ASATs, CI) via direct repository artifacts. No equations, derivations, fitted parameters, or predictions are claimed; the reported correlations and maturity stratifications are computed directly from the collected metrics. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core results. The study is self-contained against its own data collection protocol with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review limits visibility into any additional parameters or assumptions in the full methods section; the primary assumption is the validity of the empirical setup.

axioms (1)
  • Domain assumption: The 1,454 selected projects and the chosen indicators for QA practice intensity provide a valid representation of the state-of-the-practice in Java OSS development.
    This assumption underpins the generalizability of the observed patterns and correlations.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 1 internal anchor

  1. [1] J. Patel, Software is still eating the world, https://techcrunch.com/2016/06/07/software-is-eating-the-world-5-years-later/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAADIqx8LBuU1uKI03errh0RlYZjGsX_ZK76KVXqy3KqGkv3xyyXVrxi-46rFMEmZaBV4Na7Cm2lYLUC_QcKfhx0-njTwVR8XKsjkrDvNC9CoaHj4L9SLucX6hkJUcxl-rhBjsxcATrgy0yFSp...
  2. [2] M. Jazayeri, The education of a software engineer, in: Proc. International Conference on Automated Software Engineering (ASE), IEEE, USA, 2004.
  3. [3] A. J. Ko, B. Dosono, N. Duriseti, Thirty years of software problems in the news, in: Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), ACM, 2014, pp. 32–39.
  4. [4] M. Aniche, Effective Software Testing: A Developer’s Guide, Manning Publications, 2022.
  5. [5] Y. Kamei, E. Shihab, B. Adams, A. E. Hassan, A. Mockus, A. Sinha, N. Ubayashi, A large-scale empirical study of just-in-time quality assurance, IEEE Trans. Software Eng. 39 (6) (2013) 757–773.
  6. [6] A. Bacchelli, C. Bird, Expectations, outcomes, and challenges of modern code review, in: 35th International Conference on Software Engineering (ICSE), IEEE, 2013, pp. 712–721.
  7. [7] M. Beller, R. Bholanath, S. McIntosh, A. Zaidman, Analyzing the state of static analysis: A large-scale evaluation in open source software, in: IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), IEEE, 2016, pp. 470–481.
  8. [8] C. Vassallo, S. Panichella, F. Palomba, S. Proksch, H. C. Gall, A. Zaidman, How developers engage with static analysis tools in different contexts, Empir. Softw. Eng. 25 (2) (2020) 1419–1457.
  9. [9] M. Beller, G. Gousios, A. Zaidman, Oops, my tests broke the build: an explorative analysis of Travis CI with GitHub, in: Proceedings of the International Conference on Mining Software Repositories (MSR), IEEE, 2017, pp. 356–367.
  10. [10] A. Rahman, A. Partho, D. Meder, L. Williams, Which factors influence practitioners’ usage of build automation tools?, in: International Workshop on Rapid Continuous Software Engineering (RCoSE), 2017, pp. 20–26.
  11. [11] T. Rausch, W. Hummer, P. Leitner, S. Schulte, An empirical analysis of build failures in the continuous integration workflows of java-based open-source software, in: Proceedings International Conference on Mining Software Repositories (MSR), IEEE, 2017, pp. 345–355.
  12. [12] M. Beller, A. Bacchelli, A. Zaidman, E. Jürgens, Modern code reviews in open-source projects: which problems do they fix?, in: 11th Working Conference on Mining Software Repositories (MSR), ACM, 2014, pp. 202–211.
  13. [13] P. C. Rigby, D. M. Germán, M. D. Storey, Open source software peer review practices: a case study of the apache server, in: International Conference on Software Engineering (ICSE), ACM, 2008, pp. 541–550.
  14. [14] M. Hilton, T. Tunnell, K. Huang, D. Marinov, D. Dig, Usage, costs, and benefits of continuous integration in open-source projects, in: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), ACM, 2016, pp. 426–437.
  15. [15] N. Cassee, B. Vasilescu, A. Serebrenik, The silent helper: The impact of continuous integration on code reviews, in: Int’l Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2020, pp. 423–434.
  16. [16] Y. Zhao, A. Serebrenik, Y. Zhou, V. Filkov, B. Vasilescu, The impact of continuous integration on other software development practices: a large-scale empirical study, in: Proceedings of the International Conference on Automated Software Engineering (ASE), IEEE, 2017, pp. 60–71.
  17. [17] F. Zampetti, G. Bavota, G. Canfora, M. D. Penta, A study on the interplay between pull request review and continuous integration builds, in: 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2019, pp. 38–48.
  18. [18] S. Panichella, V. Arnaoudova, M. D. Penta, G. Antoniol, Would static analysis tools help developers with code reviews?, in: 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), IEEE, 2015, pp. 161–170.
  19. [19] G. S. Nery, D. A. da Costa, U. Kulesza, An empirical study of the relationship between continuous integration and test code evolution, in: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, 2019, pp. 426–436.
  20. [20] M. V. Mäntylä, C. Lassenius, What types of defects are really discovered in code reviews?, IEEE Transactions on Software Engineering 35 (3) (2009) 430–448.
  21. [21] H. Borges, M. Tulio Valente, What’s in a GitHub star? understanding repository starring practices in a social coding platform, Journal of Systems and Software 146 (2018) 112–129.
  22. [22] A. Khatami, A. Zaidman, “State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development” Replication Package (Dec. 2022). doi:10.5281/zenodo.7404903. URL https://doi.org/10.5281/zenodo.7404903
  23. [23] M. Tufano, F. Palomba, G. Bavota, M. D. Penta, R. Oliveto, A. D. Lucia, D. Poshyvanyk, There and back again: Can you compile that snapshot?, J. Softw. Evol. Process. 29 (4) (2017).
  24. [24] M. Maes-Bermejo, M. Gallego, F. Gortázar, G. Robles, J. M. González-Barahona, Revisiting the building of past snapshots - a replication and reproduction study, Empir. Softw. Eng. 27 (3) (2022) 65. doi:10.1007/s10664-022-10117-6.
  25. [25] F. Hassan, S. Mostafa, E. S. L. Lam, X. Wang, Automatic building of java projects in software repositories: A study on feasibility and challenges, in: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), IEEE, 2017, pp. 38–47.
  26. [26] F. Hassan, X. Wang, Change-aware build prediction model for stall avoidance in continuous integration, in: International Symposium on Empirical Software Engineering and Measurement (ESEM), IEEE, 2017, pp. 157–162.
  27. [27] M. Beller, G. Gousios, A. Zaidman, Travistorrent: synthesizing Travis CI and GitHub for full-stack research on continuous integration, in: Proceedings of the 14th International Conference on Mining Software Repositories (MSR), IEEE, 2017, pp. 447–450.
  28. [28] X. Jin, F. Servant, A cost-efficient approach to building in continuous integration, in: Proc. International Conference on Software Engineering (ICSE), ACM, 2020, pp. 13–25.
  29. [29] Y. Lou, Z. Chen, Y. Cao, D. Hao, L. Zhang, Understanding build issue resolution in practice: symptoms and fix patterns, in: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), ACM, 2020, pp. 617–628.
  30. [30] S. McIntosh, Y. Kamei, B. Adams, A. E. Hassan, The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and ITK projects, in: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), ACM, 2014, pp. 192–201.
  31. [31] M. M. Rahman, C. K. Roy, Impact of continuous integration on code reviews, in: Proc. International Conference on Mining Software Repositories (MSR), IEEE, 2017, pp. 499–502.
  32. [32] M. E. Fagan, Design and code inspections to reduce errors in program development, IBM Syst. J. 15 (3) (1976) 182–211.
  33. [33] O. Kononenko, O. Baysal, M. W. Godfrey, Code review quality: how developers see it, in: Proceedings of the International Conference on Software Engineering (ICSE), ACM, 2016, pp. 1028–1038.
  34. [34] R. Gopinath, C. Jensen, A. Groce, Code coverage for suite evaluation by developers, in: 36th International Conference on Software Engineering (ICSE), ACM, 2014, pp. 72–82. doi:10.1145/2568225.2568278.
  35. [35] M. Hilton, J. Bell, D. Marinov, A large-scale study of test coverage evolution, in: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), ACM, 2018, pp. 53–63.
  36. [36] S. G. Elbaum, D. Gable, G. Rothermel, The impact of software evolution on code coverage information, in: International Conference on Software Maintenance (ICSM), IEEE, 2001, pp. 170–179.
  37. [37] A. Zaidman, B. Van Rompaey, A. van Deursen, S. Demeyer, Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining, Empir. Softw. Eng. 16 (3) (2011) 325–364.
  38. [38] G. Gousios, A. Zaidman, M. D. Storey, A. van Deursen, Work practices and challenges in pull-based development: The integrator’s perspective, in: 37th IEEE/ACM International Conference on Software Engineering (ICSE), IEEE, 2015, pp. 358–368.
  39. [39] M. Beller, G. Gousios, A. Panichella, S. Proksch, S. Amann, A. Zaidman, Developer testing in the IDE: patterns, beliefs, and behavior, IEEE Trans. Software Eng. 45 (3) (2019) 261–284.
  40. [40] P. S. Kochhar, D. Lo, J. Lawall, N. Nagappan, Code coverage and postrelease defects: A large-scale study on open source projects, IEEE Trans. Reliab. 66 (4) (2017) 1213–1228. doi:10.1109/TR.2017.2727062.
  41. [41] D. Athanasiou, A. Nugroho, J. Visser, A. Zaidman, Test code quality and its relation to issue handling performance, IEEE Trans. Software Eng. 40 (11) (2014) 1100–1125.
  42. [42] B. Vasilescu, Y. Yu, H. Wang, P. T. Devanbu, V. Filkov, Quality and productivity outcomes relating to continuous integration in GitHub, in: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), ACM, 2015, pp. 805–816.
  43. [43] Sonarqube, https://www.sonarqube.org, last visited May 20th, 2022.
  44. [44] C. Vassallo, F. Palomba, A. Bacchelli, H. C. Gall, Continuous code quality: are we (really) doing that?, in: Proceedings of the International Conference on Automated Software Engineering (ASE), ACM, 2018, pp. 790–795.
  45. [45] M. Hilton, N. Nelson, T. Tunnell, D. Marinov, D. Dig, Trade-offs in continuous integration: assurance, security, and flexibility, in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), ACM, 2017, pp. 197–207. doi:10.1145/3106237.3106270.
  46. [46] A. Gautam, S. Vishwasrao, F. Servant, An empirical study of activity, popularity, size, testing, and stability in continuous integration, in: Proceedings of the 14th International Conference on Mining Software Repositories (MSR), IEEE Computer Society, 2017, pp. 495–498. doi:10.1109/MSR.2017.38.
  47. [47] G. Gousios, M. Pinzger, A. van Deursen, An exploratory study of the pull-based software development model, in: 36th International Conference on Software Engineering (ICSE), ACM, 2014, pp. 345–355.
  48. [48] G. Gousios, A. Zaidman, A dataset for pull-based development research, in: Working Conf. on Mining Software Repositories (MSR), ACM, 2014, pp. 368–371.
  49. [49] X. Zhang, A. Rastogi, Y. Yu, On the shoulders of giants: A new dataset for pull-based development research, in: MSR ’20: 17th International Conference on Mining Software Repositories, ACM, 2020, pp. 543–547.
  50. [50] T. Kinsman, M. S. Wessel, M. A. Gerosa, C. Treude, How do software developers use GitHub actions to automate their workflows?, in: International Conference on Mining Software Repositories (MSR), IEEE, 2021, pp. 420–431.
  51. [51] O. Dabic, E. Aghajani, G. Bavota, Sampling projects in GitHub for MSR studies, in: International Conference on Mining Software Repositories (MSR), IEEE, 2021, pp. 560–564.
  52. [52] M. Sulír, J. Porubän, A quantitative study of java software buildability, CoRR abs/1712.01024 (2017). arXiv:1712.01024.
  53. [53] K. Gallaba, M. Lamothe, S. McIntosh, Lessons from eight years of operational data from a continuous integration service (2022).
  54. [54] M. Wessel, A. Serebrenik, I. Wiese, I. Steinmacher, M. A. Gerosa, Quality gatekeepers: Investigating the effects of code review bots on pull request activities, Empirical Software Engineering, to appear.
  55. [55] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. Germán, D. E. Damian, The promises and perils of mining github, in: P. T. Devanbu, S. Kim, M. Pinzger (Eds.), 11th Working Conference on Mining Software Repositories (MSR), ACM, 2014, pp. 92–101. doi:10.1145/2597073.2597074.
  56. [56] F. Horváth, T. Gergely, Á. Beszédes, D. Tengeri, G. Balogh, T. Gyimóthy, Code coverage differences of java bytecode and source code instrumentation tools, Softw. Qual. J. 27 (1) (2019) 79–123. doi:10.1007/s11219-017-9389-z.
  57. [57] S. McIntosh, M. Nagappan, B. Adams, A. Mockus, A. E. Hassan, A large-scale empirical study of the relationship between build technology and build maintenance, Empir. Softw. Eng. 20 (6) (2015) 1587–1633.
  58. [58] P. Thongtanunam, S. McIntosh, A. E. Hassan, H. Iida, Review participation in modern code review: An empirical study of the android, qt, and openstack projects (journal-first abstract), in: 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE Computer Society, 2018, p. 475. doi:10.1109/SANER.2018.8330241.
  59. [59] T. L. Alves, C. Ypma, J. Visser, Deriving metric thresholds from benchmark data, in: 26th IEEE International Conference on Software Maintenance (ICSM), IEEE, 2010, pp. 1–10.
  60. [60] H. Zhu, P. A. V. Hall, J. H. May, Software unit test coverage and adequacy, ACM Computing Surveys 29 (4) (1997).
  61. [61] I. Heitlager, T. Kuipers, J. Visser, A practical model for measuring maintainability, in: Quality of Information and Communications Technology, 6th International Conference on the Quality of Information and Communications Technology (QUATIC), IEEE Computer Society, 2007, pp. 30–39. doi:10.1109/QUATIC.2007.8.
  62. [62] F. M. Dekking, C. Kraaikamp, H. P. Lopuhaä, L. E. Meester, A Modern Introduction to Probability and Statistics: Understanding why and how, Vol. 488, Springer, 2005.
  63. [63] M. Sulír, M. Bačíková, M. Madeja, S. Chodarev, J. Juhár, Large-scale dataset of local java software build results, Data 5 (3) (2020) 86. doi:10.3390/data5030086.
  64. [64] F. Hassan, Tackling build failures in continuous integration, in: 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2019, pp. 1242–1245. doi:10.1109/ASE.2019.00150.
  65. [65] C. Macho, S. McIntosh, M. Pinzger, Automatically repairing dependency-related build breakage, in: 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE Computer Society, 2018, pp. 106–117. doi:10.1109/SANER.2018.8330201.
  66. [66] A. Barrak, E. E. Eghan, B. Adams, F. Khomh, Why do builds fail? - A conceptual replication study, J. Syst. Softw. 177 (2021) 110939. doi:10.1016/j.jss.2021.110939.
  67. [67] O. Kononenko, O. Baysal, L. Guerrouj, Y. Cao, M. W. Godfrey, Investigating code review quality: Do people and participation matter?, in: 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE Computer Society, 2015, pp. 111–120. doi:10.1109/ICSM.2015.7332457.
  68. [68] C. Vassallo, S. Panichella, F. Palomba, S. Proksch, A. Zaidman, H. C. Gall, Context is king: The developer perspective on the usage of static analysis tools, in: 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE Computer Society, 2018, pp. 38–49. doi:10.1109/SANER.2018.8330195.
  69. [69] F. Zampetti, S. Scalabrino, R. Oliveto, G. Canfora, M. D. Penta, How open source projects use static code analysis tools in continuous integration pipelines, in: Proceedings of the 14th International Conference on Mining Software Repositories (MSR), IEEE Computer Society, 2017, pp. 334–344. doi:10.1109/MSR.2017.2.
  70. [70] M. Golzadeh, A. Decan, T. Mens, On the rise and fall of CI services in GitHub, in: 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE, 2022.
  71. [71] J. H. Bernardo, D. A. da Costa, U. Kulesza, Studying the impact of adopting continuous integration on the delivery time of pull requests, in: Proceedings of the 15th International Conference on Mining Software Repositories (MSR), ACM, 2018, pp. 131–141.
  72. [72] A. Rahman, A. Agrawal, R. Krishna, A. Sobran, Characterizing the influence of continuous integration: empirical results from 250+ open source and proprietary projects, in: Proceedings of the 4th ACM SIGSOFT International Workshop on Software Analytics, SWAN@ESEC/SIGSOFT FSE, ACM, 2018, pp. 8–14. doi:10.1145/3278142.3278149.
  73. [73] A. Bosu, M. Greiler, C. Bird, Characteristics of useful code reviews: An empirical study at microsoft, in: 12th IEEE/ACM Working Conference on Mining Software Repositories (MSR), IEEE, 2015, pp. 146–156.
  74. [74] L. N. Q. Do, J. R. Wright, K. Ali, Why do software developers use static analysis tools? A user-centered study of developer needs and motivations, IEEE Trans. Software Eng. 48 (3) (2022) 835–847. doi:10.1109/TSE.2020.3004525.
  75. [75] Y. Wang, M. V. Mäntylä, Z. Liu, J. Markkula, Test automation maturity improves product quality - quantitative study of open source projects using continuous integration, J. Syst. Softw. 188 (2022) 111259. doi:10.1016/j.jss.2022.111259. URL https://doi.org/10.1016/j.jss.2022.111259