{"paper":{"title":"A Feature-Driven Framework for Software Fault Prediction","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Combining correlation-based feature selection with genetic algorithm tuning reaches 88.4 percent accuracy for software fault prediction with random forest.","cross_cats":["cs.LG"],"primary_cat":"cs.SE","authors_text":"Ahmad Nauman Ghazi, Ashir Javeed, Fahed Alkhabbas, Khalid AlKharabsheh, Nagajyothi Devarapalli, Sadi Alawadi","submitted_at":"2026-05-17T19:16:20Z","abstract_excerpt":"Software fault prediction (SFP) is a critical task in software engineering, enabling early identification of faults in modules to improve software quality and reduce maintenance costs. This research investigates the combined effects of feature selection and parameter tuning on the performance of machine learning (ML) models for SFP. This study evaluates the interaction between feature selection methods, including correlation-based feature selection (CFS), recursive feature elimination (RFE), mutual information (MI), and L1 regularization, where hyperparameter tuning techniques such as grid sea"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The combined application of CFS and GA yielded the highest accuracy, achieving 88.40% with RF, representing an improvement of 18% over baseline models without feature selection or tuning.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the performance improvements generalize beyond the specific (unspecified) datasets and that the baseline models without feature selection or tuning provide a fair and representative comparison.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Combining correlation-based feature selection with genetic algorithm tuning on random forest achieves 88.40% accuracy for software fault prediction, an 18% gain over baselines without selection or tuning.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Combining correlation-based feature selection with genetic algorithm tuning reaches 88.4 percent accuracy for software fault prediction with random forest.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"09978a6113dd54990720705c85bcb6bc3c458517f01c7cd8f9e23b23d0789f88"},"source":{"id":"2605.17611","kind":"arxiv","version":1},"verdict":{"id":"bd2b69c3-a514-4d7a-b029-ff4cfb3a97ab","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T22:12:38.119559Z","strongest_claim":"The combined application of CFS and GA yielded the highest accuracy, achieving 88.40% with RF, representing an improvement of 18% over baseline models without feature selection or tuning.","one_line_summary":"Combining correlation-based feature selection with genetic algorithm tuning on random forest achieves 88.40% accuracy for software fault prediction, an 18% gain over baselines without selection or tuning.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the performance improvements generalize beyond the specific (unspecified) datasets and that the baseline models without feature selection or tuning provide a fair and representative comparison.","pith_extraction_headline":"Combining correlation-based feature selection with genetic algorithm tuning reaches 88.4 percent accuracy for software fault prediction with random forest."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.17611/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"cited_work_retraction","ran_at":"2026-05-19T22:53:08.808970Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T22:31:19.530262Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T22:21:36.119565Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T21:33:23.568907Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T21:21:57.496482Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"03ceb264d97b250be634b31aff37cf27846d9311a28726d9b79c356ed7bbd2f8"},"references":{"count":35,"sample":[{"doi":"","year":2006,"title":"Software defect association mining and defect correction effort prediction,","work_id":"5533f335-4c3f-499c-aaa0-aa2f1db406b7","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Experimental study on software fault prediction using machine learning model,","work_id":"80391f17-d6c9-4c87-a90b-b004603a4158","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Fault prediction for large scale projects using deep learning techniques,","work_id":"feb0e8ab-9877-4bb1-a512-fab109cf1b4f","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"An empirical study of some software fault prediction techniques for the number of faults prediction,","work_id":"7e0d6a17-7772-4804-bb6c-6347575e9f2e","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"A promethee based evaluation of software defect predictors,","work_id":"bbcc0216-9502-4f66-ba6e-8768fc4787fb","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":35,"snapshot_sha256":"aa5590d30d7106f92aacca314cf8ab6dc9159f5f9772b276c6600f22ab502f9a","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"42fedd223f19239588cfead58c4ca62d1716e0c045527614316c02bb7e8c5a2f"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}