A Feature-Driven Framework for Software Fault Prediction
Pith reviewed 2026-05-19 22:12 UTC · model grok-4.3
The pith
Combining correlation-based feature selection with genetic algorithm tuning reaches 88.4 percent accuracy for software fault prediction with random forest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors report that combining correlation-based feature selection with genetic algorithm tuning on random forest delivers the highest accuracy, 88.40 percent, for software fault prediction, an 18 percent gain over models without selection or tuning. Feature selection shrinks the data space and surfaces key attributes, including weighted methods per class and coupling between objects, while the tuning step aligns model settings to those reduced sets. The resulting models show low cross-validation variability (plus or minus 1.0 percent) and shorter training times, especially with simpler selection methods.
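As a rough illustration of how correlation-based selection shrinks the feature space, the sketch below scores feature subsets with the standard CFS merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), and grows a subset greedily. The toy data, the `rfc` and `noise` columns, and the greedy forward search are assumptions made for illustration; the abstract does not state the paper's actual CFS configuration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, features, label):
    """CFS merit: rewards feature-class correlation, penalises feature-feature correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], label)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(features, label):
    """Forward selection: add the feature that most improves merit; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, label), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best

# Hypothetical metrics for eight modules (illustrative only, not data from the paper).
features = {
    "wmc":   [2, 3, 15, 18, 4, 20, 5, 6],
    "cbo":   [1, 2, 3, 9, 2, 1, 10, 9],
    "rfc":   [5, 7, 31, 35, 9, 42, 11, 13],   # nearly redundant with wmc
    "noise": [3, 1, 2, 4, 4, 3, 1, 2],        # weakly related to the label
}
faulty = [0, 0, 1, 1, 0, 1, 1, 1]

selected, merit = greedy_cfs(features, faulty)
print(selected, round(merit, 3))
```

On this toy data the search keeps one of the two near-duplicate size metrics plus `cbo` and drops the rest, which is the kind of dimensionality reduction the claim describes.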
What carries the argument
The pairing of correlation-based feature selection to shrink and rank input features with genetic algorithm tuning to optimize classifier parameters for random forest, logistic regression, and support vector machine models.
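A minimal sketch of genetic algorithm tuning follows, assuming a two-parameter search space for a random forest. The parameter ranges, population settings, and above all `toy_fitness` are hypothetical: `toy_fitness` is a stand-in for cross-validated accuracy so that the example runs without a dataset, and it is not the authors' objective.

```python
import random

# Hypothetical search space; the names mirror common random forest settings.
SPACE = {"n_estimators": (10, 500), "max_depth": (2, 30)}

def toy_fitness(params):
    """Stand-in for cross-validated accuracy (an assumption, not a real model).
    Peaks at n_estimators=200, max_depth=12."""
    a = 1.0 - abs(params["n_estimators"] - 200) / 500
    b = 1.0 - abs(params["max_depth"] - 12) / 30
    return 0.5 * (a + b)

def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}

def crossover(p1, p2, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {k: rng.choice((p1[k], p2[k])) for k in SPACE}

def mutate(ind, rng, rate=0.2):
    """Resample each gene within its bounds with probability `rate`."""
    return {k: (rng.randint(*SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def tournament(pop, fitness, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)

def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)      # elitism: the best individual survives
        history.append(fitness(elite))
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, fitness, rng),
                              tournament(pop, fitness, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    return best, history + [fitness(best)]

best, history = genetic_search(toy_fitness)
print(best, round(history[-1], 3))
```

In a real pipeline the fitness function would train the classifier on the selected features and return its cross-validated score, which is where most of the tuning cost would come from.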
If this is right
- Key code metrics such as weighted methods per class and coupling between objects emerge as reliable predictors once feature selection is applied.
- Accuracy rises by up to 18 percent while cross-validation spread stays within plus or minus 1.0 percent.
- Training time shortens notably when simpler selection methods like L1 regularization are used.
- Early fault detection becomes more dependable, supporting better software quality and reduced maintenance effort.
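The 18 percent figure can be read two ways, and the abstract does not say which is meant. The short calculation below, a sketch using only the two reported numbers, shows the baseline accuracy each reading would imply: roughly 70.4 percent if the gain is in percentage points, and roughly 74.9 percent if it is relative.

```python
tuned = 88.40  # accuracy reported for CFS + GA + RF, in percent
gain = 18.0    # reported improvement; its unit is not stated in the abstract

# Reading 1: 18 percentage points added to the baseline.
baseline_points = tuned - gain

# Reading 2: an 18 percent relative improvement over the baseline.
baseline_relative = tuned / (1 + gain / 100)

print(f"percentage-point reading: baseline = {baseline_points:.2f}%")
print(f"relative reading:         baseline = {baseline_relative:.2f}%")
```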
Where Pith is reading between the lines
- The same selection-plus-tuning steps could be tried on related tasks such as estimating defect density if comparable code metrics are available.
- Checking the framework on larger industrial codebases would show whether the accuracy gains persist outside the studied settings.
- Teams might add the highlighted features to coding standards to avoid fault-prone modules from the start.
Load-bearing premise
The performance improvements from these feature selection and tuning steps will hold for software projects and datasets outside the ones examined here.
What would settle it
Apply the same correlation-based feature selection and genetic algorithm process to an independent collection of software modules from different projects and measure whether accuracy remains near 88 percent or drops back to baseline levels.
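One way to judge whether accuracy on an independent collection "remains near 88 percent" is to put a confidence interval around the replicated accuracy and check whether the reported value falls inside it. The sketch below uses the Wilson score interval; the replication counts are hypothetical and serve only to show both outcomes.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

REPORTED = 0.884  # accuracy claimed for CFS + GA + RF

# Hypothetical replication outcomes: (modules classified correctly, modules tested).
for correct, total in [(435, 500), (380, 500)]:
    low, high = wilson_interval(correct, total)
    verdict = "consistent with" if low <= REPORTED <= high else "differs from"
    print(f"{correct}/{total}: [{low:.3f}, {high:.3f}] {verdict} {REPORTED}")
```

An interval that excludes the reported accuracy but includes the untuned baseline would point to the drop the paragraph above describes.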
Original abstract
Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faults in modules to improve software quality and reduce maintenance costs. This research investigates the combined effects of feature selection and parameter tuning on the performance of machine learning (ML) models for SFP. This study evaluates the interaction between feature selection methods, including correlation-based feature selection (CFS), recursive feature elimination (RFE), mutual information (MI), and L1 regularization, where hyperparameter tuning techniques such as grid search, randomized search, and genetic algorithm (GA) are used for optimization of ML algorithms, including random forest (RF), logistic regression (LR), and support vector machines (SVM) for optimized fault prediction performance. The combined application of CFS and GA yielded the highest accuracy, achieving 88.40% with RF, representing an improvement of 18% over baseline models without feature selection or tuning. Feature selection reduced dimensionality and identified critical attributes such as weighted methods per Class (WMC) and coupling between objects (CBO), while iterative parameter tuning optimized model alignment to these feature sets. Notably, the proposed methods demonstrated robustness, with minimal cross-validation variability (+-1.0%), and efficiency, reducing training times in univariate methods such as L1 regularization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a feature-driven framework for software fault prediction that integrates feature selection methods (CFS, RFE, MI, L1 regularization) with hyperparameter tuning techniques (grid search, randomized search, GA) across ML models (RF, LR, SVM). It reports that CFS combined with GA yields the highest accuracy of 88.40% using RF, an 18% improvement over baseline models without feature selection or tuning, while also noting dimensionality reduction, key features like WMC and CBO, low cross-validation variability, and efficiency gains.
Significance. If the reported performance gains are reproducible on representative public datasets with properly specified baselines and statistical validation, the work could provide actionable evidence for combining correlation-based selection with genetic algorithm tuning in SFP pipelines, potentially improving early fault detection in software engineering practice.
major comments (2)
- [Abstract] Abstract: the headline claim of 88.40% accuracy and an 18% improvement over baselines is presented without any dataset identities, sizes, class imbalance ratios, or public availability statements, rendering the central empirical result unverifiable and preventing assessment of whether the gain would hold under standard SFP benchmarks.
- [Abstract] Abstract and results: the baseline comparison ('models without feature selection or tuning') is not specified with exact configurations, default parameter values, or the raw baseline accuracy figures, so it is impossible to determine whether the 18% delta reflects a fair control or an untuned starting point that would be improved by routine tuning alone.
minor comments (1)
- [Abstract] The abstract states 'minimal cross-validation variability (+-1.0%)' without indicating the number of folds, the precise variability metric, or which models and feature-selection combinations were tested.
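To illustrate why the variability metric matters, the sketch below computes three common spread measures from the same set of fold accuracies. The fold values are hypothetical, chosen so that their mean equals the reported 88.4 percent; they are not results from the paper.

```python
import statistics

# Hypothetical accuracies from a 10-fold cross-validation run (not from the paper).
fold_acc = [0.88, 0.89, 0.87, 0.90, 0.88, 0.89, 0.87, 0.88, 0.90, 0.88]

mean = statistics.mean(fold_acc)
sample_sd = statistics.stdev(fold_acc)             # sample standard deviation
half_range = (max(fold_acc) - min(fold_acc)) / 2   # half of the min-max spread
std_error = sample_sd / len(fold_acc) ** 0.5       # standard error of the mean

print(f"mean accuracy      : {mean:.3f}")
print(f"standard deviation : {sample_sd:.4f}")
print(f"half range         : {half_range:.4f}")
print(f"standard error     : {std_error:.4f}")
```

The three measures differ noticeably on identical data, so a bare "plus or minus 1.0 percent" cannot be interpreted without knowing which one was used and over how many folds.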
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity and verifiability in the abstract and results. We address each point below and have planned revisions to strengthen the presentation of our empirical findings.
- Referee: [Abstract] Abstract: the headline claim of 88.40% accuracy and an 18% improvement over baselines is presented without any dataset identities, sizes, class imbalance ratios, or public availability statements, rendering the central empirical result unverifiable and preventing assessment of whether the gain would hold under standard SFP benchmarks.
  Authors: We agree that the abstract would be strengthened by including dataset context to support verifiability. The full manuscript details experiments on publicly available standard SFP benchmark datasets, with sizes, imbalance ratios, and availability statements provided in the Experimental Setup section. We will revise the abstract to incorporate these elements, enabling readers to assess the results against established SFP benchmarks.
  Revision: yes
- Referee: [Abstract] Abstract and results: the baseline comparison ('models without feature selection or tuning') is not specified with exact configurations, default parameter values, or the raw baseline accuracy figures, so it is impossible to determine whether the 18% delta reflects a fair control or an untuned starting point that would be improved by routine tuning alone.
  Authors: We acknowledge the importance of clearly defining the baselines for a transparent comparison. The baselines refer to the ML models trained on the complete feature set using default hyperparameter values from the implementation libraries. We will revise the abstract and results sections to specify these default configurations and report the raw baseline accuracy values, clarifying the nature of the 18% improvement.
  Revision: yes
Circularity Check
No significant circularity in empirical ML evaluation
The paper is a standard empirical comparison study that runs feature selection methods (CFS, RFE, MI, L1) and hyperparameter optimizers (grid search, random search, GA) on classifiers (RF, LR, SVM) for software fault prediction, then reports measured accuracies such as 88.40% for CFS+GA+RF. No mathematical derivation chain, first-principles result, or self-referential definition exists. The reported 18% lift is a direct experimental delta between the tuned/feature-selected runs and the untuned/no-FS baseline runs on the same data splits; it is not a quantity that is fitted or defined in terms of itself. No uniqueness theorems, ansatzes smuggled via self-citation, or renaming of known results are invoked. The work is therefore self-contained as an experimental report whose central numbers are produced by the described pipeline rather than presupposed by it.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The chosen software project datasets are representative of real-world modules and the accuracy metric reflects practical fault prediction value.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "The combined application of CFS and GA yielded the highest accuracy, achieving 88.40% with RF, representing an improvement of 18% over baseline models without feature selection or tuning."
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "Feature selection reduced dimensionality and identified critical attributes such as weighted methods per Class (WMC) and coupling between objects (CBO)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.