State-Of-The-Practice in Quality Assurance in Java-Based Open Source Software Development
Pith reviewed 2026-05-24 08:35 UTC · model grok-4.3
The pith
Popular Java open source projects typically do not combine quality assurance practices at high intensity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 1,454 popular open source Java projects on GitHub, quality assurance approaches are typically not applied all together at high intensity; only weak correlations appear among some practices. More mature projects apply the practices more intensely, with greater focus on automated static analysis tool (ASAT) usage and code review, yet show no strong change in continuous integration (CI) usage.
What carries the argument
Intensity metrics and proxies for each quality assurance practice (testing, code review, ASAT, CI) measured across the full sample of GitHub repositories and contrasted by project maturity.
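To make the idea of artifact-based proxies concrete, the sketch below reads three of them from a local repository checkout: CI presence, ASAT configuration presence, and a test-file ratio. The marker file lists, the `*Test.java` naming rule, and the function `detect_proxies` are hypothetical choices for illustration. They are not the paper's detection rules. Code review intensity is omitted because it requires pull request data from the GitHub API rather than repository files.

```python
from pathlib import Path

# Hypothetical marker files; the study's actual detection rules may differ.
CI_MARKERS = (".travis.yml", ".gitlab-ci.yml", "Jenkinsfile", ".circleci/config.yml")
ASAT_MARKERS = ("checkstyle.xml", "pmd.xml", "spotbugs-exclude.xml", "sonar-project.properties")


def detect_proxies(repo: Path) -> dict:
    """Compute simple presence/count QA proxies for one checked-out repository."""
    # CI: any GitHub Actions workflow file, or one of the marker files at a fixed path.
    workflows = repo / ".github" / "workflows"
    has_gha = workflows.is_dir() and any(
        p.suffix in (".yml", ".yaml") for p in workflows.iterdir()
    )
    has_ci = has_gha or any((repo / m).is_file() for m in CI_MARKERS)

    # ASAT: a known configuration file name anywhere in the tree.
    file_names = {p.name for p in repo.rglob("*") if p.is_file()}
    has_asat = any(m in file_names for m in ASAT_MARKERS)

    # Testing: Java files following a common JUnit naming convention count as tests.
    java_files = [p for p in repo.rglob("*.java") if p.is_file()]
    tests = [
        p for p in java_files
        if p.stem.endswith(("Test", "Tests")) or p.stem.startswith("Test")
    ]
    production = len(java_files) - len(tests)
    test_ratio = len(tests) / production if production else 0.0

    return {"ci": has_ci, "asat": has_asat, "test_files": len(tests), "test_ratio": test_ratio}
```

Presence flags like these capture whether a practice is configured, not how intensely it is used, which is exactly the gap the referee report raises about proxy validity.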
If this is right
- Projects tend to adopt quality assurance practices selectively rather than comprehensively.
- Project maturity drives increased use of automated static analysis and code review.
- Continuous integration adoption stays roughly constant as projects mature.
- The weak observed correlations imply that practices are often implemented independently.
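Selective versus comprehensive adoption can be summarized with a simple profile: for each project, count how many practices reach a high-intensity threshold. The sketch assumes every project already has a normalized score in [0, 1] per practice; the practice keys, the 0.5 threshold, and both function names are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence

# Hypothetical practice keys; scores are assumed to be normalized to [0, 1].
PRACTICES = ("testing", "code_review", "asat", "ci")


def high_intensity_profile(
    projects: Sequence[Mapping[str, float]], threshold: float = 0.5
) -> Counter:
    """Map 'number of practices at or above threshold' -> number of projects."""
    return Counter(
        sum(p[name] >= threshold for name in PRACTICES) for p in projects
    )


def all_high_share(
    projects: Sequence[Mapping[str, float]], threshold: float = 0.5
) -> float:
    """Fraction of projects that apply every practice at high intensity."""
    if not projects:
        return 0.0
    # Counter returns 0 for a missing key, so no project meeting all four gives 0.0.
    return high_intensity_profile(projects, threshold)[len(PRACTICES)] / len(projects)
```

A profile concentrated below four practices, together with a small `all_high_share`, would be consistent with selective adoption. The result depends on the threshold, so it is worth reporting for more than one cut-off.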
Where Pith is reading between the lines
- Tooling that lowers the cost of combining currently weakly correlated practices could raise overall quality assurance intensity.
- Maturity-related gains appear limited to certain practices, suggesting targeted interventions might be more effective than general process upgrades.
- The findings could guide empirical comparisons with closed-source or non-Java open source projects.
Load-bearing premise
The selected metrics and proxies accurately reflect the real intensity of each quality assurance practice, and the 1,454 popular GitHub Java projects represent broader Java-based open source development.
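One way to probe the first half of this premise is to manually audit a random subsample of projects and measure chance-corrected agreement between the proxy label and the audited label, for example with Cohen's kappa. The sketch assumes binary labels (practice used or not); `draw_audit_sample`, the fixed seed, and the convention for the degenerate case are hypothetical choices.

```python
import random
from typing import Sequence


def draw_audit_sample(project_ids: Sequence[str], k: int, seed: int = 42) -> list:
    """Draw a reproducible simple random sample of projects for manual audit."""
    rng = random.Random(seed)
    return rng.sample(list(project_ids), k)


def cohens_kappa(proxy: Sequence[bool], audit: Sequence[bool]) -> float:
    """Chance-corrected agreement between proxy labels and manually audited labels."""
    if len(proxy) != len(audit) or not proxy:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(proxy)
    observed = sum(p == a for p, a in zip(proxy, audit)) / n
    p_yes, a_yes = sum(proxy) / n, sum(audit) / n
    # Agreement expected by chance if the two labelings were independent.
    expected = p_yes * a_yes + (1 - p_yes) * (1 - a_yes)
    if expected == 1.0:
        # Both labelings are constant and identical: kappa is undefined, report 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A high kappa on an audited subsample would support treating the proxy as a stand-in for actual use; a low one would suggest the correlations are partly measuring the proxy rather than the practice.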
What would settle it
A new sample of Java projects in which multiple quality assurance practices show both high intensity and strong correlations with one another.
Original abstract
To ensure the quality of software systems, software engineers can make use of a variety of quality assurance approaches, such as software testing, modern code review, automated static analysis, and build automation. Each of these quality assurance practices has been studied in depth in isolation, but there is a clear knowledge gap when it comes to our understanding of how these approaches are being used in conjunction or not. In our study, we broadly investigate whether and how these quality assurance approaches are being used in conjunction in the development of 1,454 popular open source software projects on GitHub. Our study indicates that typically projects do not follow all quality assurance practices together with high intensity. In fact, we only observe weak correlation among some quality assurance practices. In general, our study provides a deeper understanding of how existing quality assurance approaches are currently being used in Java-based open source software development. Besides, we specifically zoomed in on the more mature projects in our dataset, and generally, we observe that more mature projects are more intense in their application of the quality assurance practices, with more focus on their ASAT usage and code reviewing, but no strong change in their CI usage.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical study of quality assurance (QA) practices—software testing, modern code review, automated static analysis tools (ASATs), and continuous integration (CI)—across 1,454 popular Java-based open source projects on GitHub. It claims that projects do not typically apply all practices together at high intensity, supported by observations of only weak correlations among the practices. It further reports that more mature projects apply QA practices more intensely overall, with increased focus on ASAT usage and code reviewing but no strong change in CI usage.
Significance. If the intensity metrics prove valid, the work addresses a genuine gap by examining the joint application of multiple QA practices rather than isolated studies, providing a broad observational snapshot of state-of-the-practice in Java OSS. The sample scale (1,454 projects) is a clear strength for descriptive claims. The maturity stratification adds a useful dimension. However, the absence of proxy validation directly limits how much weight the correlation and maturity findings can carry for the field.
major comments (2)
- [Section 3] Section 3 (Data Collection and Metric Definition): The intensity proxies for each QA practice (test file counts or coverage for testing, review comments per PR for code review, presence/absence of tool configuration files for ASATs, and workflow file presence for CI) are defined without reported validation against ground-truth usage (e.g., developer surveys, manual audits of a subsample, or trace data). Because the correlation matrix and maturity comparisons in Section 5 are computed directly from these proxies, any systematic mismatch between proxy and actual intensity renders the weak-correlation claim and the differential maturity effects uninterpretable.
- [Section 4.2] Section 4.2 (Maturity Stratification): The operationalization of 'maturity' (likely commit history, age, or contributor count) and the statistical test used to compare intensity across maturity strata are not accompanied by effect-size reporting or controls for confounding variables such as project size or domain. This makes the claim of 'more focus on ASAT usage and code reviewing, but no strong change in CI usage' difficult to evaluate as load-bearing evidence.
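For the effect-size reporting requested here, a non-parametric measure such as Cliff's delta suits skewed or ordinal intensity scores. This is a sketch rather than the paper's analysis: the function names are hypothetical, and the magnitude cut-offs (0.147, 0.33, 0.474) are conventional thresholds widely used in empirical software engineering.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta = P(X > Y) - P(X < Y), ranging from -1 to 1.

    Quadratic in the sample sizes, which is fine for a dataset of this size.
    """
    if not xs or not ys:
        raise ValueError("both strata must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(delta: float) -> str:
    """Label |delta| using conventional thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

For example, passing ASAT intensity scores of the mature stratum as `xs` and those of the remaining projects as `ys` gives a signed, scale-free effect size that can be reported next to the p-value of a rank-based test such as Mann-Whitney U.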
minor comments (2)
- [Abstract] Abstract: The abstract omits any description of the concrete metrics, statistical tests, or sample-selection criteria, forcing readers to reach the full text before assessing the headline claims.
- [Results tables] Tables reporting correlations: Include exact Pearson/Spearman coefficients, p-values, and sample sizes per cell so readers can judge the 'weak' characterization quantitatively rather than qualitatively.
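Spearman's rank correlation is a natural fit for skewed, count-like intensity metrics, and a per-cell coefficient is what this comment asks for. Below is a dependency-free sketch that handles ties with average ranks (Pearson correlation computed on the ranks). The bands in `strength` are one common convention and an assumption, not necessarily the cut-offs the paper used to call a correlation "weak".

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (vx * vy) ** 0.5


def strength(rho: float) -> str:
    """Hypothetical banding of |rho|; the paper's own cut-offs may differ."""
    r = abs(rho)
    if r < 0.1:
        return "negligible"
    if r < 0.3:
        return "weak"
    if r < 0.5:
        return "moderate"
    return "strong"
```

The sketch omits significance testing; `scipy.stats.spearmanr` returns both the coefficient and a p-value and could supply the per-cell p-values as well.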
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating where revisions will be incorporated.
Point-by-point responses
Referee: [Section 3] Section 3 (Data Collection and Metric Definition): The intensity proxies for each QA practice (test file counts or coverage for testing, review comments per PR for code review, presence/absence of tool configuration files for ASATs, and workflow file presence for CI) are defined without reported validation against ground-truth usage (e.g., developer surveys, manual audits of a subsample, or trace data). Because the correlation matrix and maturity comparisons in Section 5 are computed directly from these proxies, any systematic mismatch between proxy and actual intensity renders the weak-correlation claim and the differential maturity effects uninterpretable.
Authors: We agree that the proxies lack direct ground-truth validation such as surveys or manual audits, which is a limitation for interpreting the correlations and maturity effects. The proxies were selected as observable GitHub artifacts following conventions in prior large-scale empirical studies on OSS. We will revise the manuscript to expand the rationale for each proxy with additional literature citations, add an explicit limitations subsection discussing potential mismatches, and note the scale constraints that precluded subsample validation. This is a partial revision as full validation is not feasible post-hoc at this dataset size. revision: partial
Referee: [Section 4.2] Section 4.2 (Maturity Stratification): The operationalization of 'maturity' (likely commit history, age, or contributor count) and the statistical test used to compare intensity across maturity strata are not accompanied by effect-size reporting or controls for confounding variables such as project size or domain. This makes the claim of 'more focus on ASAT usage and code reviewing, but no strong change in CI usage' difficult to evaluate as load-bearing evidence.
Authors: We will clarify the precise definition and measurement of maturity (based on project age and commit history) in Section 4.2. The revised version will include effect sizes for all stratum comparisons and a discussion of potential confounders such as project size and domain, with additional analysis or controls where data permit. This directly addresses the evaluability concern. revision: yes
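The authors state that maturity will be defined from project age and commit history. A minimal sketch of such a rule follows; requiring both criteria, the 5-year and 1,000-commit cut-offs, and the `Project` record are hypothetical and only illustrate how the strata could be formed before comparing intensity across them.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

# Hypothetical cut-offs for illustration; the paper's definition may differ.
MIN_AGE_YEARS = 5.0
MIN_COMMITS = 1000


@dataclass(frozen=True)
class Project:
    name: str
    created: date  # repository creation date
    commits: int   # total commits on the default branch


def is_mature(p: Project, today: date) -> bool:
    """Mature if the project is both old enough and has enough commit history."""
    age_years = (today - p.created).days / 365.25
    return age_years >= MIN_AGE_YEARS and p.commits >= MIN_COMMITS


def stratify(
    projects: Sequence[Project], today: date
) -> Tuple[List[Project], List[Project]]:
    """Split projects into (mature, other) strata."""
    mature: List[Project] = []
    other: List[Project] = []
    for p in projects:
        (mature if is_mature(p, today) else other).append(p)
    return mature, other
```

Because the cut-offs are arbitrary, it is worth checking whether the stratum comparisons stay stable across a few alternative thresholds.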
Circularity Check
No circularity: purely observational empirical study
This paper performs an observational analysis of 1,454 GitHub Java projects by measuring the presence and intensity of QA practices (testing, code review, ASATs, CI) via direct repository artifacts. No equations, derivations, fitted parameters, or predictions are claimed; the reported correlations and maturity stratifications are computed directly from the collected metrics. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core results. The study is self-contained against its own data collection protocol with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 1,454 selected projects and the chosen indicators for QA practice intensity provide a valid representation of the state-of-the-practice in Java OSS development.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our study indicates that typically projects do not follow all quality assurance practices together with high intensity. In fact, we only observe weak correlation among some quality assurance practices."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "more mature projects are more intense in their application of the quality assurance practices, with more focus on their ASAT usage and code reviewing, but no strong change in their CI usage"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.