SecureScan: An AI-Driven Multi-Layer Framework for Malware and Phishing Detection Using Logistic Regression and Threat Intelligence Integration
Pith reviewed 2026-05-16 05:45 UTC · model grok-4.3
The pith
SecureScan detects malware and phishing at 93.1 percent accuracy by layering logistic regression with heuristics and external threat checks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SecureScan is a triple-layer detection framework that integrates logistic regression-based classification, heuristic analysis, and external threat intelligence via the VirusTotal API for comprehensive triage of URLs, file hashes, and binaries. On benchmark datasets it reaches 93.1 percent accuracy with precision 0.87 and recall 0.92. Threshold-based decision calibration and gray-zone logic are used to minimize false positives, and the authors present these results as evidence of strong generalization with reduced overfitting.
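As a quick consistency check on the headline figures, the sketch below computes accuracy, precision, and recall from a binary confusion matrix. It then inverts that relationship to recover the positive-class share the three numbers would imply. It assumes, hypothetically, that all three metrics were computed on one test set with a single confusion matrix. The function names are illustrative and do not come from the paper.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision and recall from raw confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def implied_prevalence(accuracy: float, precision: float, recall: float) -> float:
    """Positive-class share implied if all three metrics share one confusion matrix.

    With counts normalised so the test set sums to 1 and prevalence pi:
    TP = R*pi, FN = (1 - R)*pi, FP = TP*(1 - P)/P, TN = (1 - pi) - FP,
    so accuracy = 1 - pi * ((1 - R) + R*(1 - P)/P).
    """
    return (1.0 - accuracy) / ((1.0 - recall) + recall * (1.0 - precision) / precision)
```

Under that single-matrix assumption, `implied_prevalence(0.931, 0.87, 0.92)` comes out at roughly 0.32. In other words, about a third of the evaluation samples would have to be malicious. Reporting the actual class balance would confirm or rule this out.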
What carries the argument
The triple-layer architecture that filters known threats through heuristics, classifies uncertain samples with logistic regression, and validates borderline cases with VirusTotal intelligence.
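The layered flow described above can be sketched as a single triage function. This is a minimal illustration and not the authors' implementation. The `heuristic` and `intel_lookup` callables are hypothetical stand-ins; the second represents a VirusTotal query, but the real VirusTotal API is not modelled here. The gray-zone bounds come from the abstract. Falling back to a 0.5 threshold when threat intelligence returns no verdict is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Gray-zone bounds as reported in the abstract; the 0.5 fallback threshold is assumed.
GRAY_LOW, GRAY_HIGH, THRESHOLD = 0.45, 0.55, 0.5


@dataclass
class Verdict:
    label: str                     # "malicious" or "benign"
    layer: str                     # layer that decided: "heuristic", "model" or "intel"
    score: Optional[float] = None  # logistic-regression probability, if computed


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triage(
    indicator: str,
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    heuristic: Callable[[str], Optional[str]],
    intel_lookup: Callable[[str], Optional[str]],
) -> Verdict:
    # Layer 1: cheap heuristics (e.g. a blocklist) settle known threats outright.
    label = heuristic(indicator)
    if label is not None:
        return Verdict(label, "heuristic")

    # Layer 2: logistic regression scores everything the heuristics let through.
    score = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
    if score < GRAY_LOW or score > GRAY_HIGH:
        return Verdict("malicious" if score > GRAY_HIGH else "benign", "model", score)

    # Layer 3: scores inside the gray zone go to external threat intelligence
    # (hypothetical stand-in for VirusTotal); fall back to the threshold if it has no answer.
    label = intel_lookup(indicator)
    if label is not None:
        return Verdict(label, "intel", score)
    return Verdict("malicious" if score >= THRESHOLD else "benign", "model", score)
```

Passing the layers in as callables confines the slower external lookup to the narrow 0.45-0.55 band. This matches the efficiency rationale given in the abstract.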
Load-bearing premise
The chosen benchmark datasets represent real-world, evolving malware and phishing threats, and the VirusTotal API supplies reliable, unbiased labels for borderline cases.
What would settle it
Running SecureScan on a fresh collection of recently emerged malware and phishing samples absent from the original benchmarks and from VirusTotal at training time, then observing accuracy fall substantially below 90 percent with degraded precision-recall balance.
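The test described above could be set up as a temporal hold-out. All training and tuning would use only samples first seen before a cutoff date, and evaluation would use only samples that appeared later. The sketch assumes each sample carries a first-seen date. That field is hypothetical; the paper does not describe one.

```python
from datetime import date
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_holdout(
    samples: Sequence[Tuple[date, T]], cutoff: date
) -> Tuple[List[T], List[T]]:
    """Split (first_seen, sample) pairs into train (before cutoff) and test (on/after cutoff)."""
    ordered = sorted(samples, key=lambda s: s[0])  # oldest first
    train = [item for seen, item in ordered if seen < cutoff]
    test = [item for seen, item in ordered if seen >= cutoff]
    return train, test
```

Because every test sample postdates every training sample, neither the coefficients nor the gray-zone thresholds can have been fitted to threats that emerge later.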
Original abstract
The growing sophistication of modern malware and phishing campaigns has diminished the effectiveness of traditional signature-based intrusion detection systems. This work presents SecureScan, an AI-driven, triple-layer detection framework that integrates logistic regression-based classification, heuristic analysis, and external threat intelligence via the VirusTotal API for comprehensive triage of URLs, file hashes, and binaries. The proposed architecture prioritizes efficiency by filtering known threats through heuristics, classifying uncertain samples using machine learning, and validating borderline cases with third-party intelligence. On benchmark datasets, SecureScan achieves 93.1 percent accuracy with balanced precision (0.87) and recall (0.92), demonstrating strong generalization and reduced overfitting through threshold-based decision calibration. A calibrated threshold and gray-zone logic (0.45-0.55) were introduced to minimize false positives and enhance real-world stability. Experimental results indicate that a lightweight statistical model, when augmented with calibrated verification and external intelligence, can achieve reliability and performance comparable to more complex deep learning systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SecureScan, a triple-layer detection framework that applies heuristic filtering for known threats, logistic regression for classifying uncertain samples, and VirusTotal API integration for validating borderline cases involving URLs, file hashes, and binaries. It claims 93.1% accuracy with 0.87 precision and 0.92 recall on benchmark datasets, attributing the results to calibrated gray-zone thresholds (0.45-0.55) that reduce false positives and overfitting.
Significance. If the performance claims can be substantiated with proper experimental controls, the work would show that a lightweight logistic regression model augmented by external threat intelligence can reach reliability comparable to deep learning systems, offering a practical efficiency advantage for real-time malware and phishing triage.
major comments (3)
- [Abstract] Abstract: the reported 93.1% accuracy, 0.87 precision, and 0.92 recall are presented without any information on benchmark dataset identities, sizes, temporal coverage, train-test splits, feature engineering, or statistical significance tests, leaving the central generalization claim unsupported.
- [Abstract] Abstract and architecture description: the logistic regression coefficients and gray-zone thresholds (0.45-0.55) appear to be fitted and calibrated on the same benchmark data used for final evaluation, creating circularity that prevents assessment of true generalization to unseen or evolving threats.
- [Abstract] Abstract: the claim that the framework achieves 'reliability and performance comparable to more complex deep learning systems' is not supported by any direct comparison, baseline results, or ablation study showing the contribution of each layer.
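The circularity raised in the second comment can be avoided as follows. Tune the threshold on a validation split only, then score the chosen threshold once on a disjoint test split. The sketch below assumes model scores already exist for both splits. As an illustrative assumption, it picks the candidate threshold that maximizes F1; the paper does not state its calibration criterion.

```python
from typing import Dict, Sequence, Tuple


def confusion(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for the rule 'malicious if score >= threshold'."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def report(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at a fixed threshold (0.0 where undefined)."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "recall": recall, "f1": f1}


def tune_threshold(
    val_scores: Sequence[float], val_labels: Sequence[int], candidates: Sequence[float]
) -> float:
    """Pick the candidate with the best validation F1; the test split is never consulted."""
    return max(candidates, key=lambda t: report(val_scores, val_labels, t)["f1"])
```

In use, `tune_threshold` sees only the validation split, and `report` is called exactly once on the test split with the returned threshold. The figures it produces are therefore predictions on held-out data, not outputs of the fitting step.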
minor comments (1)
- [Abstract] The abstract refers to 'strong generalization and reduced overfitting' without specifying how overfitting was quantified (e.g., via cross-validation scores or learning curves).
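A common way to quantify overfitting is to compare training-fold and held-out-fold scores across stratified k-fold cross-validation. The splitter below is a stdlib-only sketch of stratified index generation. The function names are illustrative, and any model and metric could be plugged into the resulting train/test index pairs.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds that each keep roughly the same class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)  # deal each class round-robin across folds
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


def generalization_gap(train_scores: Sequence[float], heldout_scores: Sequence[float]) -> float:
    """Mean training-fold score minus mean held-out score; a large gap suggests overfitting."""
    return mean(train_scores) - mean(heldout_scores)
```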
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions planned for the next version.
- Referee: [Abstract] Abstract: the reported 93.1% accuracy, 0.87 precision, and 0.92 recall are presented without any information on benchmark dataset identities, sizes, temporal coverage, train-test splits, feature engineering, or statistical significance tests, leaving the central generalization claim unsupported.
  Authors: We agree that the abstract and results section lack sufficient experimental details. The revised manuscript will expand the abstract and add a dedicated experimental setup subsection specifying the benchmark dataset identities, sizes, temporal coverage, train-test splits (including ratios and stratification), feature engineering process, and statistical significance tests (e.g., McNemar's test or bootstrap confidence intervals) to substantiate the generalization claims.
  Revision: yes.
- Referee: [Abstract] Abstract and architecture description: the logistic regression coefficients and gray-zone thresholds (0.45-0.55) appear to be fitted and calibrated on the same benchmark data used for final evaluation, creating circularity that prevents assessment of true generalization to unseen or evolving threats.
  Authors: We acknowledge the risk of circularity in the current description. The revised manuscript will explicitly state that coefficients were learned on a training partition, thresholds were tuned on a held-out validation set, and final metrics were computed on a disjoint test set. We will also add discussion of temporal splits or cross-validation to address generalization to evolving threats.
  Revision: yes.
- Referee: [Abstract] Abstract: the claim that the framework achieves 'reliability and performance comparable to more complex deep learning systems' is not supported by any direct comparison, baseline results, or ablation study showing the contribution of each layer.
  Authors: We accept that the comparability claim requires supporting evidence. The revised version will include baseline comparisons against standard ML models (e.g., random forest, SVM), an ablation analysis quantifying each layer's contribution, and references to published deep learning results on similar malware/phishing benchmarks to contextualize the performance.
  Revision: yes.
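McNemar's test is one of the significance tests named in the rebuttal. For the planned baseline comparisons, it checks whether two classifiers scored on the same test samples differ significantly in their errors. The sketch uses the continuity-corrected statistic and computes the chi-square (1 degree of freedom) survival function with `math.erfc`. The inputs are hypothetical paired predictions, for example SecureScan against a random-forest baseline.

```python
import math
from typing import Sequence, Tuple


def mcnemar(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[float, float]:
    """Continuity-corrected McNemar test on paired predictions from two classifiers.

    b = samples only classifier A gets right, c = samples only classifier B gets right.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    b = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a == y and o != y)
    c = sum(1 for y, a, o in zip(y_true, pred_a, pred_b) if a != y and o == y)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Only the discordant samples (those exactly one classifier gets wrong) enter the statistic. Two models with similar accuracy can therefore still differ significantly if they make different mistakes.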
Circularity Check
Performance metrics obtained by fitting logistic regression and calibrating thresholds on the same benchmark data used for reporting
specific steps
- fitted input called prediction [Abstract]
  "On benchmark datasets, SecureScan achieves 93.1 percent accuracy with balanced precision (0.87) and recall (0.92), demonstrating strong generalization and reduced overfitting through threshold-based decision calibration. A calibrated threshold and gray-zone logic (0.45-0.55) were introduced to minimize false positives and enhance real-world stability."
  The logistic regression parameters are fitted to the benchmark datasets and the decision thresholds are calibrated on the identical data; therefore the quoted accuracy, precision and recall numbers are direct results of that fitting step rather than predictions on independent held-out samples.
full rationale
The paper's central claim of 93.1% accuracy, 0.87 precision and 0.92 recall rests on a logistic regression classifier whose parameters are fitted directly to the benchmark datasets, with the gray-zone thresholds (0.45-0.55) also tuned on those same data. No train/test split, temporal hold-out, or external corpus is described, so the reported figures are outputs of the fitting process rather than independent predictions. This matches the fitted-input-called-prediction pattern and produces a moderate circularity score; the architecture description itself contains no further self-referential equations or self-citations that would raise the score higher.
Axiom & Free-Parameter Ledger
free parameters (2)
- logistic_regression_coefficients
- gray_zone_thresholds
axioms (1)
- domain assumption: benchmark datasets are representative of real-world threats
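The two free parameters in the ledger can be written out explicitly. The notation below is an illustrative formalization, assuming a standard logistic model with weight vector w and bias b plus the gray-zone bounds from the abstract. The paper's exact feature vector x is not specified here.

```latex
p(\mathbf{x}) = \sigma\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
             = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}

\operatorname{decision}(\mathbf{x}) =
\begin{cases}
\text{benign} & p(\mathbf{x}) < \tau_{\mathrm{lo}} \\
\text{defer to threat intelligence} & \tau_{\mathrm{lo}} \le p(\mathbf{x}) \le \tau_{\mathrm{hi}} \\
\text{malicious} & p(\mathbf{x}) > \tau_{\mathrm{hi}}
\end{cases}
\qquad (\tau_{\mathrm{lo}}, \tau_{\mathrm{hi}}) = (0.45,\ 0.55)
```

Here (w, b) corresponds to logistic_regression_coefficients and (τ_lo, τ_hi) to gray_zone_thresholds. The circularity concern is that both were set on the same data used to report the final metrics.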