arxiv: 2605.17611 · v1 · pith:IK4XATJ4new · submitted 2026-05-17 · 💻 cs.SE · cs.LG

A Feature-Driven Framework for Software Fault Prediction

Pith reviewed 2026-05-19 22:12 UTC · model grok-4.3

classification 💻 cs.SE cs.LG
keywords software fault prediction · feature selection · hyperparameter tuning · machine learning · random forest · correlation-based feature selection · genetic algorithm · software quality

The pith

Combining correlation-based feature selection with genetic algorithm tuning reaches 88.4 percent accuracy for software fault prediction with random forest.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how feature selection methods such as correlation-based feature selection interact with hyperparameter tuning methods like genetic algorithms when applied to machine learning models for predicting faults in software modules. It reports that the strongest combination produces 88.40 percent accuracy on random forest, an 18 percent lift over untuned baselines, while also lowering the number of input features and keeping cross-validation results stable. A sympathetic reader would care because catching faults earlier can cut later maintenance costs and raise overall software quality. The work shows that attributes such as weighted methods per class and coupling between objects stand out after selection and tuning.

Core claim

The authors report that combining correlation-based feature selection with genetic algorithm tuning on random forest delivers the highest accuracy, 88.40 percent, for software fault prediction, an 18 percent gain over models without selection or tuning. Feature selection shrinks the feature space and surfaces key attributes including weighted methods per class and coupling between objects, while the tuning step aligns model settings to those reduced sets. The resulting models vary by only plus or minus 1.0 percent across cross-validation, and training times shorten, especially with simpler selection methods.
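The "18 percent" figure can be read as either an absolute or a relative gain. As a rough check, the sketch below assumes the untuned baseline is the 70.21% random forest testing accuracy quoted in the Figure 4 excerpt further down this page. That pairing is an assumption; the abstract does not state the baseline value.

```python
# Assumption: the baseline is the RF testing accuracy quoted with Figure 4.
baseline_acc = 70.21  # percent, untuned RF testing accuracy (Figure 4 excerpt)
tuned_acc = 88.40     # percent, CFS + GA + RF (abstract)

absolute_gain = tuned_acc - baseline_acc            # in percentage points
relative_gain = 100 * absolute_gain / baseline_acc  # as a percent of the baseline

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1f}% of the baseline")
```

Under that assumption the reported 18 percent lines up with an absolute gain in percentage points (about 18.19) rather than a relative improvement (about 25.9%).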

What carries the argument

The pairing of correlation-based feature selection to shrink and rank input features with genetic algorithm tuning to optimize classifier parameters for random forest, logistic regression, and support vector machine models.
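To make the selection half of that pairing concrete, here is a minimal sketch of correlation-based feature selection. It assumes the classic merit heuristic (mean feature-to-class correlation rewarded, mean feature-to-feature correlation penalized) with Pearson correlation and a greedy forward search. The page does not say which CFS variant the authors used, and the feature values below are made-up toy numbers, not data from the paper.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(features, target, subset):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), using mean absolute correlations."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features, target):
    """Greedy forward search: add the feature that most improves merit, stop when none does."""
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(features, target, selected + [f]), f)
                  for f in features if f not in selected]
        if not scored:
            break
        merit, feat = max(scored)
        if merit <= best:
            break
        selected.append(feat)
        best = merit
    return selected, best

# Toy data (hypothetical): eight modules, label 1 = faulty.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "wmc": [1, 2, 3, 4, 5, 6, 7, 8],
    "cbo": [1, 3, 1, 3, 4, 6, 4, 6],
    "noise": [1, -1, 1, -1, 1, -1, 1, -1],
}
selected, merit = cfs_forward(features, target)
print(selected, round(merit, 3))
```

On this toy input the search keeps the two informative metrics and rejects the uncorrelated column, which is the behaviour the summary attributes to the selection step.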

If this is right

  • Key code metrics such as weighted methods per class and coupling between objects emerge as reliable predictors once feature selection is applied.
  • Accuracy rises by up to 18 percent while cross-validation spread stays within plus or minus 1.0 percent.
  • Training time shortens notably when simpler selection methods like L1 regularization are used.
  • Early fault detection becomes more dependable, supporting better software quality and reduced maintenance effort.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selection-plus-tuning steps could be tried on related tasks such as estimating defect density if comparable code metrics are available.
  • Checking the framework on larger industrial codebases would show whether the accuracy gains persist outside the studied settings.
  • Teams might add the highlighted features to coding standards to avoid fault-prone modules from the start.

Load-bearing premise

The performance improvements from these feature selection and tuning steps will hold for software projects and datasets outside the ones examined here.

What would settle it

Apply the same correlation-based feature selection and genetic algorithm process to an independent collection of software modules from different projects and measure whether accuracy remains near 88 percent or drops back to baseline levels.

Figures

Figures reproduced from arXiv: 2605.17611 by Ahmad Nauman Ghazi, Ashir Javeed, Fahed Alkhabbas, Khalid AlKharabsheh, Nagajyothi Devarapalli, Sadi Alawadi.

Figure 1
Figure 1: Working of the proposed framework.
Adjacent paper text (D. Machine learning model, 1) Random Forest): An effective method for determining whether or not a particular software module is defective is Random Forest (RF). For tasks involving binary classification, it is especially helpful [20]. Because it uses an ensemble of several decision trees, it can handle high-dimensional data and resists overfitting, which is one of its main …
Figure 4
Figure 4: Over-fitting analysis of ML models.
Adjacent paper text: … of three machine learning models, namely RF, LR, and SVM. For all models, the training accuracy is higher than the corresponding testing accuracy. The highest training accuracy (81.52%) and testing accuracy (70.21%) are achieved by RF, followed by SVM (75.24% training, 63.05% testing) and LR (73.67% training, 60.47% testing). This comparison highlights the difference in m…
Figure 2
Figure 2: ML models performance comparison based on F1 score
Figure 3
Figure 3: Optimized ML models performance comparison based on accuracy
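The over-fitting comparison quoted with Figure 4 above can be reduced to one number per model, the gap between training and testing accuracy. The short sketch below only restates the quoted figures; it assumes nothing beyond them.

```python
# Accuracies in percent as quoted with Figure 4: (training, testing).
reported = {"RF": (81.52, 70.21), "SVM": (75.24, 63.05), "LR": (73.67, 60.47)}

# Train-test gap in percentage points; a larger gap suggests more over-fitting.
gaps = {model: round(train - test, 2) for model, (train, test) in reported.items()}

for model, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{model}: train-test gap = {gap:.2f} percentage points")
```

RF shows the smallest gap (11.31 points) and LR the largest (13.20 points), so the model with the best testing accuracy is also the one that generalizes least badly in this comparison.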
Original abstract

Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faults in modules to improve software quality and reduce maintenance costs. This research investigates the combined effects of feature selection and parameter tuning on the performance of machine learning (ML) models for SFP. This study evaluates the interaction between feature selection methods, including correlation-based feature selection (CFS), recursive feature elimination (RFE), mutual information (MI), and L1 regularization, where hyperparameter tuning techniques such as grid search, randomized search, and genetic algorithm (GA) are used for optimization of ML algorithms, including random forest (RF), logistic regression (LR), and support vector machines (SVM) for optimized fault prediction performance. The combined application of CFS and GA yielded the highest accuracy, achieving 88.40% with RF, representing an improvement of 18% over baseline models without feature selection or tuning. Feature selection reduced dimensionality and identified critical attributes such as weighted methods per class (WMC) and coupling between objects (CBO), while iterative parameter tuning optimized model alignment to these feature sets. Notably, the proposed methods demonstrated robustness, with minimal cross-validation variability (±1.0%), and efficiency, reducing training times in univariate methods such as L1 regularization.
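The abstract does not describe how the genetic algorithm is configured. As an illustration only, the sketch below shows one common shape of GA-based hyperparameter search: truncation selection, uniform crossover, random-resample mutation, and elitism. These operators, the search space `SPACE`, and the function `toy_fitness` are all assumptions made here; in a real pipeline the fitness would be a cross-validated score of the classifier rather than a closed-form stand-in.

```python
import random

# Hypothetical integer search space; the names mirror common random forest settings.
SPACE = {"n_estimators": (10, 300), "max_depth": (2, 30), "min_samples_leaf": (1, 20)}

def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}

def mutate(ind, rng, rate=0.2):
    # Resample each gene within its bounds with probability `rate`.
    return {k: (rng.randint(*SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def genetic_search(fitness, generations=20, pop_size=12, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = []
        for _ in range(pop_size - elite):
            mum, dad = rng.sample(parents, 2)
            children.append(mutate(crossover(mum, dad, rng), rng))
        pop = pop[:elite] + children    # elitism: the best individuals survive unchanged
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

def toy_fitness(p):
    # Stand-in for cross-validated accuracy (assumption): peaks at an arbitrary point.
    return -((p["n_estimators"] - 150) ** 2 / 1e4
             + (p["max_depth"] - 12) ** 2 / 1e2
             + (p["min_samples_leaf"] - 3) ** 2 / 1e2)

best, history = genetic_search(toy_fitness)
print(best, round(history[-1], 4))
```

Because the elite individuals are carried over unchanged, the best fitness recorded per generation never decreases, which is one reason elitism is a frequent choice when each fitness evaluation is expensive.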

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a feature-driven framework for software fault prediction that integrates feature selection methods (CFS, RFE, MI, L1 regularization) with hyperparameter tuning techniques (grid search, randomized search, GA) across ML models (RF, LR, SVM). It reports that CFS combined with GA yields the highest accuracy of 88.40% using RF, an 18% improvement over baseline models without feature selection or tuning, while also noting dimensionality reduction, key features like WMC and CBO, low cross-validation variability, and efficiency gains.

Significance. If the reported performance gains are reproducible on representative public datasets with properly specified baselines and statistical validation, the work could provide actionable evidence for combining correlation-based selection with genetic algorithm tuning in SFP pipelines, potentially improving early fault detection in software engineering practice.

major comments (2)
  1. [Abstract] Abstract: the headline claim of 88.40% accuracy and an 18% improvement over baselines is presented without any dataset identities, sizes, class imbalance ratios, or public availability statements, rendering the central empirical result unverifiable and preventing assessment of whether the gain would hold under standard SFP benchmarks.
  2. [Abstract] Abstract and results: the baseline comparison ('models without feature selection or tuning') is not specified with exact configurations, default parameter values, or the raw baseline accuracy figures, so it is impossible to determine whether the 18% delta reflects a fair control or an untuned starting point that would be improved by routine tuning alone.
minor comments (1)
  1. [Abstract] The abstract states 'minimal cross-validation variability (±1.0%)' without indicating the number of folds, the precise variability metric, or which models and feature-selection combinations were tested.
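The minor comment asks for the fold count and the exact variability metric. A minimal sketch of what such a report could contain is given below, assuming stratified k-fold splitting and the sample standard deviation across folds. Both choices are assumptions, since the paper's protocol is not stated on this page, and the labels are synthetic.

```python
import random
from statistics import mean, stdev

def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of indices, each with roughly the same class ratio as `labels`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds

def summarize(fold_scores):
    """Mean and sample standard deviation across folds ('mean ± std')."""
    return mean(fold_scores), stdev(fold_scores)

# Synthetic labels (made-up ratio): 20 faulty modules out of 100.
labels = [1] * 20 + [0] * 80
folds = stratified_kfold(labels, k=5)

# Per-fold accuracy of a majority-class baseline that predicts "not faulty" everywhere.
fold_scores = [sum(1 for i in f if labels[i] == 0) / len(f) for f in folds]
avg, spread = summarize(fold_scores)
print(f"{len(folds)}-fold accuracy: {avg:.3f} ± {spread:.3f} (sample std)")
```

Stating the three ingredients together (number of folds, how they were formed, and which spread statistic follows the ± sign) is what lets a reader compare a ±1.0% claim across studies.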

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity and verifiability in the abstract and results. We address each point below and have planned revisions to strengthen the presentation of our empirical findings.

  1. Referee: [Abstract] Abstract: the headline claim of 88.40% accuracy and an 18% improvement over baselines is presented without any dataset identities, sizes, class imbalance ratios, or public availability statements, rendering the central empirical result unverifiable and preventing assessment of whether the gain would hold under standard SFP benchmarks.

    Authors: We agree that the abstract would be strengthened by including dataset context to support verifiability. The full manuscript details experiments on publicly available standard SFP benchmark datasets, with sizes, imbalance ratios, and availability statements provided in the Experimental Setup section. We will revise the abstract to incorporate these elements, enabling readers to assess the results against established SFP benchmarks. revision: yes

  2. Referee: [Abstract] Abstract and results: the baseline comparison ('models without feature selection or tuning') is not specified with exact configurations, default parameter values, or the raw baseline accuracy figures, so it is impossible to determine whether the 18% delta reflects a fair control or an untuned starting point that would be improved by routine tuning alone.

    Authors: We acknowledge the importance of clearly defining the baselines for a transparent comparison. The baselines refer to the ML models trained on the complete feature set using default hyperparameter values from the implementation libraries. We will revise the abstract and results sections to specify these default configurations and report the raw baseline accuracy values, clarifying the nature of the 18% improvement. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical ML evaluation

The paper is a standard empirical comparison study that runs feature selection methods (CFS, RFE, MI, L1) and hyperparameter optimizers (grid search, random search, GA) on classifiers (RF, LR, SVM) for software fault prediction, then reports measured accuracies such as 88.40% for CFS+GA+RF. No mathematical derivation chain, first-principles result, or self-referential definition exists. The reported 18% lift is a direct experimental delta between the tuned/feature-selected runs and the untuned/no-FS baseline runs on the same data splits; it is not a quantity that is fitted or defined in terms of itself. No uniqueness theorems, ansatzes smuggled via self-citation, or renaming of known results are invoked. The work is therefore self-contained as an experimental report whose central numbers are produced by the described pipeline rather than presupposed by it.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the (unspecified) software datasets and the assumption that standard ML metrics like accuracy adequately capture fault prediction utility.

axioms (1)
  • domain assumption: The chosen software project datasets are representative of real-world modules and the accuracy metric reflects practical fault prediction value.
    Implicit in reporting 88.40% accuracy and 18% improvement without dataset details.
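Why the accuracy half of that assumption matters can be shown with a small synthetic example. The class ratio and predictions below are invented for illustration and are not taken from the paper; the point is only that on imbalanced fault data a model can score high accuracy while finding no faults at all.

```python
def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = faulty)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Synthetic labels (made-up ratio): 10 faulty modules out of 100.
y_true = [1] * 10 + [0] * 90

# Model A never predicts a fault.
always_clean = scores(y_true, [0] * 100)

# Model B finds 6 of the 10 faults at the cost of 8 false alarms.
detector = scores(y_true, [1] * 6 + [0] * 4 + [1] * 8 + [0] * 82)

print("always clean:", always_clean)
print("detector:    ", detector)
```

Here the model that never flags a fault has the higher accuracy (0.90 versus 0.88) but an F1 of zero, while the detector reaches an F1 of 0.5. This is why the referee's request for class imbalance ratios bears directly on how the 88.40% figure should be read.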

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Software defect association mining and defect correction effort prediction,

    Q. Song, M. Shepperd, M. Cartwright, and C. Mair, “Software defect association mining and defect correction effort prediction,” IEEE Transactions on Software Engineering, vol. 32, no. 2, pp. 69–82, 2006

  2. [2]

    Experimental study on software fault prediction using machine learning model,

    T. M. P. Ha, D. H. Tran, L. T. M. Hanh, and N. T. Binh, “Experimental study on software fault prediction using machine learning model,” in 2019 11th international conference on knowledge and systems engineering (KSE). IEEE, 2019, pp. 1–5

  3. [3]

    Fault prediction for large scale projects using deep learning techniques,

    R. T. Selvi and P. Patchaiammal, “Fault prediction for large scale projects using deep learning techniques,” in 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS). IEEE, 2022, pp. 482–489

  4. [4]

    An empirical study of some software fault prediction techniques for the number of faults prediction,

    S. S. Rathore and S. Kumar, “An empirical study of some software fault prediction techniques for the number of faults prediction,” Soft Computing, vol. 21, no. 24, pp. 7417–7434, 2017

  5. [5]

    A promethee based evaluation of software defect predictors,

    R. Jimoh, A. Balogun, A. Bajeh, and S. Ajayi, “A promethee based evaluation of software defect predictors,” Journal of Computer Science and Its Application, vol. 25, no. 1, pp. 106–119, 2018

  6. [6]

    Predicting depression in older adults: A novel feature selection and neural network framework,

    A. Javeed, P. Anderberg, A. N. Ghazi, M. A. Saleem, and J. Sanmartin Berglund, “Predicting depression in older adults: A novel feature selection and neural network framework,” Neural Processing Letters, vol. 57, no. 3, pp. 1–21, 2025

  7. [7]

    Performance analysis of feature selection methods in software defect prediction: a search method approach,

    A. O. Balogun, S. Basri, S. J. Abdulkadir, and A. S. Hashim, “Performance analysis of feature selection methods in software defect prediction: a search method approach,” Applied Sciences, vol. 9, no. 13, p. 2764, 2019

  8. [8]

    Predicting dementia risk factors based on feature selection and neural networks,

    A. Javeed, A. L. Dallora, J. Sanmartin Berglund, A. Ali, P. Anderberg, and L. Ali, “Predicting dementia risk factors based on feature selection and neural networks,” Computers, Materials and Continua, vol. 75, no. 2, pp. 2491–2508, 2023

  9. [9]

    Hyper-parameter optimization of classifiers, using an artificial immune network and its application to software bug prediction,

    F. Khan, S. Kanwal, S. Alamri, and B. Mumtaz, “Hyper-parameter optimization of classifiers, using an artificial immune network and its application to software bug prediction,” IEEE Access, vol. 8, pp. 20 954–20 964, 2020

  10. [10]

    The impact of automated parameter optimization on defect prediction models,

    C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “The impact of automated parameter optimization on defect prediction models,” IEEE Transactions on Software Engineering, vol. 45, no. 7, pp. 683–711, 2018

  11. [11]

    Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier,

    H. Alsghaier and M. Akour, “Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier,” Software: Practice and Experience, vol. 50, no. 4, pp. 407–427, 2020

  12. [12]

    Estimate the performance of applying machine learning algorithms to predict defects in software using weka,

    F. AlShaikh and W. Elmedany, “Estimate the performance of applying machine learning algorithms to predict defects in software using weka,” in 4th Smart Cities Symposium (SCS 2021), vol. 2021. IET, 2021, pp. 189–194

  13. [13]

    A feature selection-based k-nn model for fast software defect prediction,

    J. B. Awotunde, S. Misra, A. E. Adeniyi, M. K. Abiodun, M. Kaushik, and M. O. Lawrence, “A feature selection-based k-nn model for fast software defect prediction,” in International Conference on Computational Science and Its Applications. Springer, 2022, pp. 49–61

  14. [14]

    A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of god class,

    K. Alkharabsheh, S. Alawadi, V. R. Kebande, Y. Crespo, M. Fernández-Delgado, and J. A. Taboada, “A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of god class,” Information and Software Technology, vol. 143, p. 106736, 2022. [Online]. Available: https://www.sciencedirect.com/science/ar...

  15. [15]

    Software defect prediction using machine learning,

    S. A. Ali, N. R. Roy, and D. Raj, “Software defect prediction using machine learning,” in 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2023, pp. 639–642

  16. [16]

    Analysis of feature selection methods in software defect prediction models,

    M. Ali, T. Mazhar, T. Shahzad, Y. Y. Ghadi, S. M. Mohsin, S. M. A. Akber, and M. Ali, “Analysis of feature selection methods in software defect prediction models,” IEEE Access, vol. 11, pp. 145 954–145 974, 2023

  17. [17]

    Comparative performance of supervised learning models for software defect detection,

    T. Nandeesh and A. Mehta, “Comparative performance of supervised learning models for software defect detection,” in 2024 International Conference on Information Science and Communications Technologies (ICISCT). IEEE, 2024, pp. 19–24

  18. [18]

    The state of machine learning methodology in software fault prediction,

    T. Hall and D. Bowes, “The state of machine learning methodology in software fault prediction,” in 2012 11th international conference on machine learning and applications, vol. 2. IEEE, 2012, pp. 308–313

  19. [19]

    Cross-validation: what does it estimate and how well does it do it?

    S. Bates, T. Hastie, and R. Tibshirani, “Cross-validation: what does it estimate and how well does it do it?” Journal of the American Statistical Association, vol. 119, no. 546, pp. 1434–1445, 2024

  20. [20]

    Ensemble feature ranking approach for software fault prediction,

    B. Agrawalla, A. K. Shukla, D. Tripathi, K. K. Singh, and B. Ramachandra Reddy, “Ensemble feature ranking approach for software fault prediction,” Journal of Intelligent & Fuzzy Systems, pp. JIFS–219 431, 2024

  21. [21]

    Unveiling cancer: A data-driven approach for early identification and prediction using f-rus-rf model,

    A. Javeed, P. Anderberg, M. A. Saleem, A. N. Ghazi, and J. Sanmartin Berglund, “Unveiling cancer: A data-driven approach for early identification and prediction using f-rus-rf model,” International Journal of Imaging Systems and Technology, vol. 34, no. 6, p. e23221, 2024

  22. [22]

    Software metrics for fault prediction using machine learning approaches: A literature review with promise repository dataset,

    S. Karim, H. L. H. S. Warnars, F. L. Gaol, E. Abdurachman, B. Soewito et al., “Software metrics for fault prediction using machine learning approaches: A literature review with promise repository dataset,” in 2017 IEEE international conference on cybernetics and computational intelligence (CyberneticsCom). IEEE, 2017, pp. 19–23

  23. [23]

    Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study,

    S. S. Rathore and S. Kumar, “Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study,” Applied Intelligence, vol. 51, no. 12, pp. 8945–8960, 2021

  24. [24]

    Prediction of software failures through logistic regression,

    A. M. Salem, K. Rekab, and J. A. Whittaker, “Prediction of software failures through logistic regression,” Information and Software Technology, vol. 46, no. 12, pp. 781–789, 2004

  25. [25]

    A comparative analysis of svm and elm classification on software reliability prediction model,

    S. K. Rath, M. Sahu, S. P. Das, S. K. Bisoy, and M. Sain, “A comparative analysis of svm and elm classification on software reliability prediction model,” Electronics, vol. 11, no. 17, p. 2707, 2022

  26. [26]

    Automated parameter optimization of classification techniques for defect prediction models,

    C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “Automated parameter optimization of classification techniques for defect prediction models,” in Proceedings of the 38th international conference on software engineering, 2016, pp. 321–332

  27. [27]

    Tuning for software analytics: Is it really necessary?

    W. Fu, T. Menzies, and X. Shen, “Tuning for software analytics: Is it really necessary?” Information and Software Technology, vol. 76, pp. 135–146, 2016

  28. [28]

    Machine learning-based test case prioritization using hyperparameter optimization,

    M. A. Khan, A. Azim, R. Liscano, K. Smith, Y.-K. Chang, Q. Tauseef, and G. Seferi, “Machine learning-based test case prioritization using hyperparameter optimization,” in Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024), 2024, pp. 125–135

  29. [29]

    Localization for v2x communication with noisy distance measurement,

    I. Javed, X. Tang, M. A. Saleem, A. Javed, M. A. Zia, and I. A. Shoukat, “Localization for v2x communication with noisy distance measurement,” International Journal of Intelligent Networks, vol. 4, pp. 355–360, 2023

  30. [30]

    From requirements to test cases: An nlp-based approach for high-performance ecu test case automation,

    N. Medeshetty, A. N. Ghazi, S. Alawadi, and F. Alkhabbas, “From requirements to test cases: An nlp-based approach for high-performance ecu test case automation,” in 2025 IEEE 5th International Conference on Human-Machine Systems (ICHMS). IEEE, 2025, pp. 122–127

  31. [31]

    Generative oversampling methods for handling imbalanced data in software fault prediction,

    S. S. Rathore, S. S. Chouhan, D. K. Jain, and A. G. Vachhani, “Generative oversampling methods for handling imbalanced data in software fault prediction,” IEEE Transactions on Reliability, vol. 71, no. 2, pp. 747–762, 2022

  32. [32]

    Optimizing mortality prediction in cardiac patients using genetic algorithm and random forest with class imbalance handling,

    A. Javed, S. A. Ramay, T. Abbas, A. Javeed, S. Saeed, and W. Akbar, “Optimizing mortality prediction in cardiac patients using genetic algorithm and random forest with class imbalance handling,” Dialogue Social Science Review (DSSR), vol. 2, no. 3 (October), pp. 162–175, 2024

  33. [33]

    Software fault prediction using cross-project analysis: a study on class imbalance and model generalization,

    S. Kaliraj, A. Kishoore, and V. Sivakumar, “Software fault prediction using cross-project analysis: a study on class imbalance and model generalization,” IEEE Access, vol. 12, pp. 64 212–64 227, 2024

  34. [34]

    Leveraging fault localisation to enhance defect prediction,

    J. Sohn, Y. Kamei, S. McIntosh, and S. Yoo, “Leveraging fault localisation to enhance defect prediction,” in 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2021, pp. 284–294

  35. [35]

    Improving cloud efficiency: A machine learning-based stacking model for cpu utilization prediction,

    A. Javeed, A. Borg, H. Grahn, L. Lundberg, D. Patel, and S. Shirinbab, “Improving cloud efficiency: A machine learning-based stacking model for cpu utilization prediction,” in 2025 8th International Conference on Data Science and Machine Learning Applications (CDMA). IEEE, 2025, pp. 120–125