Method-level Change-proneness: A Better Metric for Black-box Test Suite Minimization
Pith reviewed 2026-05-19 17:13 UTC · model grok-4.3
The pith
Method-level change-proneness provides a stronger guide than class-level metrics for shrinking black-box test suites.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that computing change-proneness for each method from version-control metadata, linking test cases to methods via test-code call-graph analysis, and scoring the associations with statistical measures such as average and geometric mean produces reduced test suites that retain higher accuracy and fault-detection capability than either class-level change-proneness or similarity-based selection.
What carries the argument
The MCTM process that ranks test cases by their statistical association with change-prone methods identified through call-graph dependencies.
If this is right
- Black-box test-suite reduction becomes feasible at scale without inspecting production source code.
- Test cases tied to change-prone methods are retained in preference to others.
- Fault detection remains high while the number of executed tests decreases.
- The method runs more efficiently than similarity-based alternatives on the evaluated projects.
Where Pith is reading between the lines
- The same call-graph linking step could be reused in other black-box reduction techniques that already collect execution traces.
- Projects with frequent small commits might see even stronger gains because method-level change signals would be more precise than class-level ones.
- Teams could combine the method scores with simple execution-time data to further trim suites without additional static analysis.
Load-bearing premise
The test-code call-graph accurately captures which methods each test case actually depends on or exercises.
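To make this premise concrete, here is a minimal sketch, assuming the test-code call graph is already available as a mapping from each caller to the signatures it calls. The function name `reachable_targets`, the `call_graph` layout, and the sample signatures are hypothetical and do not come from the paper. The sketch follows calls through test helper methods and collects the non-test signatures each test case reaches. Any call the graph misses (for example through reflection) simply drops out of the result, which is why the premise is load-bearing.

```python
def reachable_targets(test, call_graph, test_methods):
    """Collect the non-test method signatures reachable from one test case."""
    seen = set()
    targets = set()
    stack = [test]
    while stack:
        caller = stack.pop()
        if caller in seen:
            continue
        seen.add(caller)
        for callee in call_graph.get(caller, ()):
            if callee in test_methods:
                # Test helper: keep walking, its callees count for this test.
                stack.append(callee)
            else:
                # Signature referenced from test code but not defined in it.
                targets.add(callee)
    return targets


# Hypothetical call graph extracted from test sources only.
call_graph = {
    "FooTest.testAdd": ["FooTest.makeFoo", "Foo.add(int)"],
    "FooTest.makeFoo": ["Foo.<init>()"],
    "FooTest.testEmpty": ["Foo.isEmpty()"],
}
test_methods = set(call_graph)

deps = {
    test: reachable_targets(test, call_graph, test_methods)
    for test in ("FooTest.testAdd", "FooTest.testEmpty")
}
```

In this toy graph, `FooTest.testAdd` is linked to the constructor only through its helper `FooTest.makeFoo`, so a linker that ignored helpers would under-report its dependencies.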
What would settle it
Running MCTM on a fresh collection of projects with documented buggy versions and measuring whether the average fault-detection rate drops substantially below the levels reported for the original fifteen projects.
Original abstract
Test Suite Minimization (TSM) reduces the size of test suites while preserving their fault detection capability. In black-box TSM, reduction is performed without analyzing production code. While several black-box TSM approaches have explored metrics like test logs or test similarity, these often suffer from scalability and efficiency issues. On the other hand, change-proneness (CP), which recently emerged as an efficient and scalable alternative metric, has only been applied at the class level. To accurately identify fault-revealing test cases, we propose CP at the finer-grained method level and implement Method-level Change-proneness based Test-suite Minimization (MCTM). MCTM first calculates CP for each method from version control metadata, then determines the dependency between test cases and methods by analyzing the test-code call-graph. Next, it scores the association between test cases and their invoked methods using statistical measures such as Average, Geometric Mean, etc. Finally, test cases with the highest scores are selected to form the reduced suite. Evaluation on 15 open-source Java projects with 635 buggy versions shows MCTM achieves 0.93 accuracy and 0.94 fault detection rate on average, significantly outperforming class-level CP and similarity-based approaches while maintaining superior efficiency.
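The scoring and selection steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `minimize`, `test_to_methods`, `method_cp`, and `budget` are hypothetical, and the abstract does not say how ties, zero scores, or methods without a CP value are handled, so the choices below (score 0 for unknown methods, alphabetical tie-breaking) are assumptions.

```python
import math


def average(values):
    """Arithmetic mean of a non-empty list of CP values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean; assumed to be 0 when any CP value is 0."""
    if any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def minimize(test_to_methods, method_cp, budget, aggregate=average):
    """Score each test by aggregating the CP of its methods, keep the top `budget`."""
    scores = {}
    for test, methods in test_to_methods.items():
        # Assumption: a method with no recorded CP contributes 0.
        cps = [method_cp.get(m, 0.0) for m in methods]
        scores[test] = aggregate(cps) if cps else 0.0
    # Highest score first; ties broken by test name for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:budget], scores


# Hypothetical inputs.
method_cp = {"A.f": 0.8, "A.g": 0.2, "B.h": 0.5}
test_to_methods = {"t1": ["A.f", "A.g"], "t2": ["B.h"], "t3": ["A.g"]}

selected_avg, scores_avg = minimize(test_to_methods, method_cp, budget=2)
selected_geo, scores_geo = minimize(
    test_to_methods, method_cp, budget=2, aggregate=geometric_mean
)
```

With these toy values the two aggregates rank `t1` and `t2` differently: the average scores both at 0.5, while the geometric mean penalises `t1` for touching the low-CP method `A.g`. That is one reason the choice of statistical measure matters for replication.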
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Method-level Change-proneness based Test-suite Minimization (MCTM) as an improvement over class-level change-proneness for black-box test suite minimization. MCTM computes method-level change-proneness from version-control metadata, links test cases to methods via analysis of the test-code call-graph, scores test-method associations with statistical measures (Average, Geometric Mean, etc.), and selects the highest-scoring tests. Evaluation on 15 open-source Java projects comprising 635 buggy versions reports average accuracy of 0.93 and fault detection rate of 0.94, with claims of outperforming class-level CP and similarity-based baselines while offering superior efficiency.
Significance. If the black-box property can be rigorously established and the reported performance gains hold under transparent methodology, MCTM would represent a practical advance in scalable test suite minimization for projects with version history, potentially reducing test execution costs without sacrificing fault detection. The use of real-world projects and a large number of buggy versions strengthens the empirical grounding relative to synthetic evaluations.
major comments (2)
- Abstract: The central premise that MCTM performs black-box TSM 'without analyzing production code' is placed in tension by the description of determining 'the dependency between test cases and methods by analyzing the test-code call-graph.' Static or dynamic construction of a call-graph that resolves production method signatures typically requires either source/bytecode access to the methods under test or execution traces exposing those signatures; this risks rendering the approach gray-box rather than black-box, which directly affects the claimed efficiency and scalability advantages over similarity-based methods that also consume execution data.
- Evaluation description (implied in abstract): The reported averages of 0.93 accuracy and 0.94 fault detection rate are presented without any indication of how these quantities are defined, how baselines were re-implemented, what data exclusions or parameter choices were applied, or whether statistical significance testing was performed across the 635 versions. Because these metrics are load-bearing for the superiority claim, their computation must be specified before the results can be assessed.
minor comments (2)
- The statistical measures used for scoring (Average, Geometric Mean, etc.) should be given explicit formulas or references in the method section to allow replication.
- Clarify whether the call-graph analysis is performed statically on test source only or requires dynamic execution, and state any assumptions about test-code structure.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which help us improve the clarity and rigor of the manuscript. We address each major comment below and will incorporate revisions to strengthen the presentation of our black-box claims and evaluation methodology.
-
Referee: Abstract: The central premise that MCTM performs black-box TSM 'without analyzing production code' is placed in tension by the description of determining 'the dependency between test cases and methods by analyzing the test-code call-graph.' Static or dynamic construction of a call-graph that resolves production method signatures typically requires either source/bytecode access to the methods under test or execution traces exposing those signatures; this risks rendering the approach gray-box rather than black-box, which directly affects the claimed efficiency and scalability advantages over similarity-based methods that also consume execution data.
Authors: We appreciate the referee highlighting this important distinction. MCTM derives change-proneness exclusively from version-control metadata (commit history) without any static or dynamic inspection of production code internals. The test-code call-graph analysis is performed only on the test sources to extract call sites and the method signatures they reference; no production code is loaded, parsed, or executed for the purpose of minimization. This is distinct from gray-box approaches that profile production execution or analyze production dependencies. To eliminate ambiguity, we will revise the abstract and Section 3 to include a precise definition of the black-box property in this context and explicitly state the boundaries of the call-graph analysis.
Revision: yes
-
Referee: Evaluation description (implied in abstract): The reported averages of 0.93 accuracy and 0.94 fault detection rate are presented without any indication of how these quantities are defined, how baselines were re-implemented, what data exclusions or parameter choices were applied, or whether statistical significance testing was performed across the 635 versions. Because these metrics are load-bearing for the superiority claim, their computation must be specified before the results can be assessed.
Authors: We agree that the abstract alone does not convey these details and that the evaluation section would benefit from greater transparency. In the full manuscript, accuracy is defined as the fraction of test cases correctly classified as fault-revealing or non-fault-revealing, and fault detection rate is the proportion of known bugs still detected by the minimized suite. Baselines were re-implemented following the original papers' descriptions, using the same 15 projects and 635 buggy versions; we applied no data exclusions beyond requiring sufficient version history for change-proneness computation. We will add a new subsection (e.g., 4.3) that formally defines both metrics, documents re-implementation choices and parameter settings, lists any filtering criteria, and reports statistical significance results (Wilcoxon signed-rank test with p-values) comparing MCTM against class-level CP and similarity baselines across all versions.
Revision: yes
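The two metric definitions given in the response above can be written down directly. The sketch below is one possible reading, not the paper's evaluation code: the function names and data layout are hypothetical, and it assumes that a selected test counts as "classified fault-revealing" and that a bug is detected when at least one of its failing tests is selected.

```python
def fault_detection_rate(selected, failing_tests_per_bug):
    """Proportion of known bugs with at least one failing test in the minimized suite."""
    detected = sum(
        1 for failing in failing_tests_per_bug.values() if selected & failing
    )
    return detected / len(failing_tests_per_bug)


def accuracy(selected, all_tests, fault_revealing):
    """Fraction of tests whose selected/unselected status matches their true label."""
    correct = sum(
        1 for t in all_tests if (t in selected) == (t in fault_revealing)
    )
    return correct / len(all_tests)


# Hypothetical data: five tests, two bugs, a minimized suite of two tests.
all_tests = {"t1", "t2", "t3", "t4", "t5"}
selected = {"t1", "t2"}
failing_tests_per_bug = {"bug1": {"t1"}, "bug2": {"t4"}}
fault_revealing = {"t1", "t4"}

fdr = fault_detection_rate(selected, failing_tests_per_bug)
acc = accuracy(selected, all_tests, fault_revealing)
```

Here `bug2` is missed because `t4` was not selected, so the fault detection rate is 0.5, while three of the five tests are labelled correctly, giving an accuracy of 0.6.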
Circularity Check
No significant circularity in derivation chain
The paper describes an empirical pipeline for MCTM: CP scores are computed directly from external version-control metadata, dependencies are extracted via test-code call-graph analysis, and association scores are computed with standard statistical aggregates (Average, Geometric Mean) before selecting top-ranked tests. No equations, fitted parameters, or self-referential definitions appear in which an output is forced to equal an input by construction. Evaluation is performed on 15 independent open-source projects with real buggy versions, rendering the central claims externally falsifiable rather than tautological.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Test-code call-graph analysis accurately identifies the methods invoked by each test case.
- domain assumption: Method-level change-proneness derived from version control metadata serves as a reliable proxy for fault-revealing potential.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
MCTM first calculates CP for each method from version control metadata, then determines the dependency between test cases and methods by analyzing the test-code call-graph. Next, it scores the association between test cases and their invoked methods using statistical measures such as Average, Geometric Mean etc.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.
$\mathrm{ChgFreq}_M = \dfrac{\text{Number of Changes}_M}{\text{Total Commits}_M}$; $\mathrm{ChgExt}_M = \dfrac{\text{CodeChurn}_M}{\text{Total Commits}_M}$
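The two ratios in the quoted passage (change frequency and change extent per method M) can be computed from commit history roughly as sketched below. This is a sketch under assumptions: the input layout (one dictionary per commit, mapping each touched method to its churn in lines added plus deleted) is hypothetical, and "Total Commits" is read here as the number of commits in the analysed history window, which the quoted passage does not spell out.

```python
from collections import defaultdict


def change_proneness(commits):
    """Return per-method ChgFreq and ChgExt from a list of per-commit churn maps."""
    total_commits = len(commits)
    changes = defaultdict(int)  # number of commits touching each method
    churn = defaultdict(int)    # lines added + deleted per method
    for touched in commits:
        for method, lines in touched.items():
            changes[method] += 1
            churn[method] += lines
    return {
        method: {
            "chg_freq": changes[method] / total_commits,
            "chg_ext": churn[method] / total_commits,
        }
        for method in changes
    }


# Hypothetical history of four commits; the third touches no tracked method.
commits = [
    {"Foo.add(int)": 4},
    {"Foo.add(int)": 2, "Foo.isEmpty()": 1},
    {},
    {"Foo.add(int)": 6},
]
cp = change_proneness(commits)
```

In this example `Foo.add(int)` changes in three of four commits with 12 churned lines in total, so it gets a change frequency of 0.75 and a change extent of 3.0, against 0.25 and 0.25 for `Foo.isEmpty()`.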
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.