Multi-population Diversity-guided Genetic Algorithm for Feature Selection in Network Intrusion Detection
Pith reviewed 2026-05-20 01:30 UTC · model grok-4.3
The pith
A chained multi-population genetic algorithm with an information-gain-ratio operator maintains diversity to select effective features for network intrusion detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a chained multi-population structure combined with a diversity-guided operator based on information gain ratio solves the problems of maintaining diversity and providing evolutionary guidance in genetic algorithms applied to high-dimensional redundant traffic features for intrusion detection.
What carries the argument
Chained multi-population evolutionary structure with information-gain-ratio diversity operator, which preserves variety across populations and directs selection toward informative features.
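The page does not say how the information gain ratio is computed or how it is fed into the operator. As a reference point, the sketch below computes the standard C4.5-style gain ratio, IG(Y; X) / H(X), for discrete features. The function names are hypothetical. How MPDGGA turns these scores into evolutionary guidance (for example, by biasing which bits are flipped) is an assumption that this page does not confirm.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain_ratio(feature, labels):
    """Gain ratio IG(Y; X) / H(X) for one discrete feature X and class labels Y."""
    n = len(labels)
    # Group labels by feature value to compute the conditional entropy H(Y | X)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic value H(X)
    # A constant feature has H(X) = 0 and carries no information
    return gain / split_info if split_info > 0 else 0.0


def rank_features(rows, labels):
    """Return (feature indices sorted by descending gain ratio, raw scores)."""
    n_features = len(rows[0])
    scores = [information_gain_ratio([r[j] for r in rows], labels)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Feature 0 predicts the label, feature 1 is constant, feature 2 is independent
order, scores = rank_features([[0, 5, 0], [0, 5, 1], [1, 5, 0], [1, 5, 1]], [0, 0, 1, 1])
print(order, scores)  # [0, 1, 2] [1.0, 0.0, 0.0]
```

Continuous traffic features such as byte counts would need to be discretized first. The sketch leaves out that binning step.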
Load-bearing premise
The chained multi-population structure combined with an information-gain-ratio diversity operator will reliably maintain population diversity and provide effective evolutionary guidance when the input feature space is high-dimensional and redundant.
What would settle it
Experiments on a similar high-dimensional dataset where the proposed approach fails to maintain higher diversity levels or does not achieve superior accuracy compared to standard genetic algorithms would falsify the claim.
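Applying this falsification criterion requires a concrete diversity measure, and the review does not say which one the paper uses. A common choice for binary feature masks is the mean pairwise Hamming distance. The sketch below assumes that measure, and its function name is hypothetical.

```python
from itertools import combinations


def mean_pairwise_hamming(population):
    """Average normalized Hamming distance over all pairs of binary masks.

    Returns 0.0 when every individual is identical; uniformly random
    bit strings give values close to 0.5.
    """
    if len(population) < 2:
        return 0.0
    n_bits = len(population[0])
    pairs = list(combinations(population, 2))
    differing = sum(sum(a != b for a, b in zip(p, q)) for p, q in pairs)
    return differing / (len(pairs) * n_bits)


# Three 2-bit masks differ in 1 + 2 + 1 = 4 of the 6 compared bits
print(mean_pairwise_hamming([[0, 0], [0, 1], [1, 1]]))
```

Logging this value every generation for MPDGGA and for a single-population GA on the same dataset would produce the diversity curves that the criterion refers to.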
Original abstract
Network Intrusion Detection System is a critical means of ensuring cybersecurity. However, existing Genetic Algorithm-based feature selection methods face several limitations when dealing with high-dimensional redundant traffic features. For example, population diversity is difficult to maintain, and evolutionary operators lack guidance. To solve these problems, this study proposes the Multi-Population Diversity-Guided Genetic Algorithm (MPDGGA). First, we build a chained multi-population evolutionary structure. Second, we introduce a diversity-guided operator based on information gain ratio. Experiments on NSL-KDD, UNSW-NB15, and 9 UCI datasets show that the proposed model significantly outperforms four other advanced multi-population feature selection models. Across the 11 datasets, it attains the highest accuracy on 10 datasets and at least 2.26% of the features were selected.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Multi-Population Diversity-Guided Genetic Algorithm (MPDGGA) for feature selection in network intrusion detection. It introduces a chained multi-population evolutionary structure and an information-gain-ratio-based diversity operator to address difficulties in maintaining population diversity and providing evolutionary guidance in high-dimensional redundant feature spaces. Experiments on the NSL-KDD, UNSW-NB15, and nine UCI datasets are claimed to show that MPDGGA significantly outperforms four other advanced multi-population feature selection models, attaining the highest accuracy on 10 of the 11 datasets while selecting at least 2.26% of the features.
Significance. If the reported performance gains prove robust under proper statistical controls, the work could offer a practical advance in evolutionary feature selection for cybersecurity by demonstrating how structured multi-population evolution combined with an information-theoretic diversity mechanism can improve accuracy while aggressively reducing feature count on standard intrusion-detection benchmarks.
major comments (2)
- [Experimental results] Experimental results section: the central claim that MPDGGA 'significantly outperforms' the four baselines on 11 datasets is unsupported by any report of the number of independent runs, mean/stddev accuracy across trials, or statistical significance tests (t-test, Wilcoxon, etc.). Because genetic algorithms are stochastic, single-point accuracy figures cannot reliably establish superiority in high-dimensional redundant spaces; this omission directly weakens the empirical foundation for the chained multi-population plus information-gain-ratio design.
- [Method] Method section (chained multi-population structure): the description of how the chained populations interact and how the information-gain-ratio operator supplies evolutionary guidance lacks sufficient formalization or pseudocode. Without explicit definitions of migration rules, diversity metric computation, and selection pressure, it is impossible to verify that the claimed diversity maintenance actually occurs or is responsible for the reported accuracy gains.
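The statistics the referee requests can be sketched in plain Python. The sketch reports the mean and sample standard deviation over independent runs and adds a two-sided Wilcoxon signed-rank test on paired results. It assumes the runs of the two methods are paired, for example by sharing seeds or data splits. It uses a normal approximation and omits the tie correction in the variance, which makes it slightly conservative when ties occur. The helper names are hypothetical. For real analyses, `scipy.stats.wilcoxon` provides a maintained implementation.

```python
import math
import statistics


def summarize(accuracies):
    """Mean and sample standard deviation of accuracy over independent runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    a, b: paired samples, e.g. accuracies of two methods on matched runs.
    Zero differences are dropped; tied |differences| receive average ranks.
    Returns (W, p) with W = min(W+, W-).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences (1-based), averaging ranks within tie groups
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Null distribution of W: mean n(n+1)/4, variance n(n+1)(2n+1)/24
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p
```

With 30 runs per method, the normal approximation is usually adequate. For very small samples, an exact test is preferable.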
minor comments (2)
- [Abstract] The abstract states 'at least 2.26% of the features were selected' without clarifying whether this is the minimum across datasets, an average, or a specific dataset; a table summarizing selected-feature percentages per dataset would improve clarity.
- [Abstract] Baseline methods are referred to only as 'four other advanced multi-population feature selection models' without naming them or citing their original papers in the abstract; this should be expanded for immediate context.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the empirical and methodological presentation of our work. We have revised the manuscript to incorporate additional statistical reporting and formal algorithmic details as detailed below.
Point-by-point responses
- Referee: [Experimental results] Experimental results section: the central claim that MPDGGA 'significantly outperforms' the four baselines on 11 datasets is unsupported by any report of the number of independent runs, mean/stddev accuracy across trials, or statistical significance tests (t-test, Wilcoxon, etc.). Because genetic algorithms are stochastic, single-point accuracy figures cannot reliably establish superiority in high-dimensional redundant spaces; this omission directly weakens the empirical foundation for the chained multi-population plus information-gain-ratio design.
  Authors: We agree that the stochastic nature of genetic algorithms requires reporting of multiple runs and statistical tests to support claims of superiority. In the revised manuscript we now specify that all experiments were repeated over 30 independent runs per method per dataset. We report mean accuracy and standard deviation for each method, and we have added Wilcoxon signed-rank tests (with p-values) comparing MPDGGA against each baseline. The updated tables and text show that the performance differences remain statistically significant on the majority of datasets, thereby reinforcing the contribution of the chained multi-population structure and information-gain-ratio operator.
  Revision: yes
- Referee: [Method] Method section (chained multi-population structure): the description of how the chained populations interact and how the information-gain-ratio operator supplies evolutionary guidance lacks sufficient formalization or pseudocode. Without explicit definitions of migration rules, diversity metric computation, and selection pressure, it is impossible to verify that the claimed diversity maintenance actually occurs or is responsible for the reported accuracy gains.
  Authors: We accept that greater formalization is needed for reproducibility. The revised method section now includes (i) a complete pseudocode listing for MPDGGA that explicitly shows the chaining of populations, (ii) the precise migration rule (top-k elite individuals transferred every m generations), (iii) the mathematical definition of the information-gain-ratio diversity metric and how it is used to adjust selection probabilities, and (iv) the selection-pressure schedule. These additions allow direct verification that the diversity mechanism operates as claimed.
  Revision: yes
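The rebuttal describes the migration rule as top-k elite individuals transferred every m generations. It does not state the direction of the chain, whether the chain closes into a ring, or which individuals the migrants replace. The sketch below assumes a one-way chain P0 → P1 → ... → P(n-1) in which migrants replace the k worst members of the receiving population. All names are hypothetical.

```python
def chained_migration(populations, fitness, k):
    """Copy the top-k individuals of population i into population i + 1.

    Assumed topology: a one-way chain; the last population does not feed
    back into the first. Migrants replace the k worst receivers. Elites are
    taken from a snapshot, so each migrant moves only one link per call.
    """
    elites = [sorted(pop, key=fitness, reverse=True)[:k] for pop in populations]
    new_pops = []
    for i, pop in enumerate(populations):
        if i == 0:
            new_pops.append(list(pop))  # head of the chain receives nothing
            continue
        survivors = sorted(pop, key=fitness, reverse=True)[: len(pop) - k]
        new_pops.append(survivors + [list(ind) for ind in elites[i - 1]])
    return new_pops


def evolve(populations, step, fitness, generations, k, m):
    """Apply `step` to every population each generation; migrate every m generations."""
    for g in range(1, generations + 1):
        populations = [step(pop) for pop in populations]
        if g % m == 0:
            populations = chained_migration(populations, fitness, k)
    return populations
```

Here `step` stands for any per-population GA update. Taking elites from a snapshot keeps a strong individual from traversing the whole chain in a single migration event, which is one plausible way such a structure could slow the spread of a single solution.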
Circularity Check
No circularity: empirical algorithmic proposal with independent experimental validation
Full rationale
The paper introduces MPDGGA as a new multi-population genetic algorithm variant incorporating a chained evolutionary structure and an information-gain-ratio diversity operator. All performance claims rest on direct empirical comparisons against four baseline multi-population methods across 11 public datasets (NSL-KDD, UNSW-NB15, and 9 UCI sets), reporting accuracy and feature-selection ratios as observed outcomes. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims therefore remain self-contained experimental results rather than reductions to prior inputs or definitions by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Standard genetic algorithm operators (selection, crossover, mutation) can be extended with multi-population chaining and information-gain guidance without breaking convergence properties.
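For reference, the standard operators named in this assumption can be sketched as one generation of a single-population binary GA over feature masks. Tournament selection, uniform crossover, bit-flip mutation, single-individual elitism, and the default mutation rate are all assumed for illustration. The page does not say which variants MPDGGA uses, and the function names are hypothetical.

```python
import random


def tournament(pop, fitness, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def uniform_crossover(a, b, rng):
    """Take each gene from parent a or parent b with probability 0.5."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def bit_flip(ind, rate, rng):
    """Flip each bit (select/deselect a feature) independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in ind]


def next_generation(pop, fitness, rng, mutation_rate=0.02):
    """Produce a same-size generation, carrying the best individual over unchanged."""
    children = [list(max(pop, key=fitness))]  # elitism
    while len(children) < len(pop):
        p1 = tournament(pop, fitness, rng)
        p2 = tournament(pop, fitness, rng)
        children.append(bit_flip(uniform_crossover(p1, p2, rng), mutation_rate, rng))
    return children


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
population = next_generation(population, sum, rng)  # toy fitness: count of 1s
```

Elitism guarantees that the best fitness never decreases between generations. It also tends to reduce diversity, which is the pressure that multi-population schemes aim to counteract.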
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We propose a chained multi-population structure... diversity-guided operator based on information gain ratio... fitness function $f(\mathcal{L}_{i,j}) = (1-\alpha)\cdot\mathrm{Acc}(\mathcal{L}_{i,j}) + \alpha\cdot(1 - N_s/N_f)$"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Experiments on NSL-KDD, UNSW-NB15, and 9 UCI datasets... highest accuracy on 10 datasets"
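The fitness function quoted in the first entry above can be transcribed directly. The sketch reads ℒ_{i,j} as the binary feature mask of individual j in population i, N_s as the number of selected features, and N_f as the total number of features. These readings, the function name, and the α value in the example are assumptions. This page also does not say how Acc is measured (which classifier, which validation split).

```python
def mask_fitness(mask, accuracy, alpha):
    """f = (1 - alpha) * Acc + alpha * (1 - N_s / N_f) for one feature mask.

    mask: list of 0/1 flags, 1 meaning the feature is selected.
    accuracy: classifier accuracy in [0, 1] measured on the selected features.
    alpha: weight on feature reduction (its value is not given on this page).
    """
    n_f = len(mask)  # total number of features
    n_s = sum(mask)  # number of selected features
    return (1 - alpha) * accuracy + alpha * (1 - n_s / n_f)


# Same accuracy, fewer features selected -> higher fitness
print(mask_fitness([1, 0, 0, 0], 0.9, alpha=0.1))
print(mask_fitness([1, 1, 0, 0], 0.9, alpha=0.1))
```

With α = 0.1, a mask that selects 1 of 4 features at 0.9 accuracy scores 0.9·0.9 + 0.1·0.75 = 0.885, while selecting 2 of 4 scores 0.86. A larger α trades accuracy for smaller feature subsets.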
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Keshk, M., Koroniotis, N., Pham, N., Moustafa, N., Turnbull, B., Zomaya, A.Y., 2023. An explainable deep learning-enabled intrusion detection framework in iot networks. Information Sciences 639, 119000. doi:https://doi.org/10.1016/j.ins.2023.119000
- [2] He, W., Cai, X., Lai, Y., Yuan, X., 2024. Esvi-gamm: A fast network intrusion detection approach based on the bayesian gamma mixture model. Information Sciences 678, 121001. doi:https://doi.org/10.1016/j.ins.2024.121001
- [3] Rejin Paul, N., Nallarasan, V., Krishnaiah, N., Guganathan, L., 2026. Blockchain-integrated intrusion detection system with optimized cosine cnn for enhanced privacy and security in cloud computing. Information Sciences 735, 123015. doi:https://doi.org/10.1016/j.ins.2025.123015
- [4] Zoppi, T., Gazzini, S., Ceccarelli, A., 2024. Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers. Future Generation Computer Systems 160, 951–. doi:https://doi.org/10.1016/j.future.2024.06.051
- [6] Luo, Y., Chen, R., Li, C., Yang, D., Tang, K., Su, J., 2025. An improved binary simulated annealing algorithm and TPE-FL-LightGBM for fast network intrusion detection. Electronics 14. doi:http://dx.doi.org/10.3390/electronics14020231
- [7] Wang, L., Xu, J., Jia, L., Wang, T., Xu, Y., Liu, X., 2025. Multi-strategy RIME optimization algorithm for feature selection of network intrusion detection. Computers & Security 153, 104393. doi:https://doi.org/10.1016/j.cose.2025.104393
- [8] Gong, X., Yang, Y., Zhang, Y., Li, N., Guan, Y., Jiang, R., 2025. Feature selection method for network intrusion based on hybrid meta-heuristic dynamic optimization algorithm. Computers & Security 156, 104512. doi:https://doi.org/10.1016/j.cose.2025.104512
- [9] Raeisi, Z., Maleki, H.R., Akbari, R., 2025. An entropy-based multi-objective feature selection method for network intrusion detection. Cluster Computing 28, 790. doi:https://doi.org/10.1007/s10586-025-05465-z
- [10] Rahamathulla, M.Y., Ramaiah, M., 2025. Building robust lightweight network intrusion detection: a multi-pronged approach with DSSTE and deep learning for securing edge-enabled industrial IoT applications. Cluster Computing 28, 515. doi:http://dx.doi.org/10.1007/s10586-025-05171-w
- [11] Umer, M., Tahir, M., Sardaraz, M., Sharif, M., Elmannai, H., Algarni, A.D., 2025. Network intrusion detection model using wrapper based feature selection and multi head attention transformers. Scientific Reports 15, 28718. doi:http://dx.doi.org/10.1038/s41598-025-11348-5
- [12] Holland, J.H., 1992. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press.
- [13] Li, Y., Zhang, S., Zeng, X., 2009. Research of multi-population agent genetic algorithm for feature selection. Expert Systems with Applications 36, 11570–11581. doi:https://doi.org/10.1016/j.eswa.2009.03.032
- [14] Xue, B., Zhang, M., Browne, W.N., Yao, X., 2016. A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation 20, 606–626. doi:http://dx.doi.org/10.1109/TEVC.2015.2504420
- [15] Deng, L., Xiao, M., 2024. Hyperparameter recommendation via automated meta-feature selection embedded with kernel group lasso learning. Knowledge-Based Systems 306, 112706. doi:https://doi.org/10.1016/j.knosys.2024.112706
- [16] Nssibi, M., Manita, G., Korbaa, O., 2023. Advances in nature-inspired metaheuristic optimization for feature selection problem: A comprehensive survey. Computer Science Review 49, 100559. doi:https://doi.org/10.1016/j.cosrev.2023.100559
- [17] Zorarpaci, E., 2024. A fast intrusion detection system based on swift wrapper feature selection and speedy ensemble classifier. Engineering Applications of Artificial Intelligence 133, 108162. doi:https://doi.org/10.1016/j.engappai.2024.108162
- [18] Hassan, F., Syed, Z.S., Memon, A.A., Alqahtany, S.S., Ahmed, N., Reshan, M.S.A., Asiri, Y., Shaikh, A., 2025. A hybrid approach for intrusion detection in vehicular networks using feature selection and dimensionality reduction with optimized deep learning. PLOS ONE 20, 1–18. doi:http://dx.doi.org/10.1371/journal.pone.0312752
- [19] Ahakonye, L.A.C., Nwakanma, C.I., Lee, J.M., Kim, D.S., 2023. SCADA intrusion detection scheme exploiting the fusion of modified decision tree and chi-square feature selection. Internet of Things 21, 100676. doi:https://doi.org/10.1016/j.iot.2022.100676
- [20] Taha, Z.Y., Abdullah, A.A., Rashid, T.A., 2025. Optimizing feature selection with genetic algorithms: a review of methods and applications. Knowledge and Information Systems 67, 9739–9778. doi:http://dx.doi.org/10.1007/s10115-025-02515-1
- [22] Naqqad, A., Boulal, A., Habachi, R., 2025. An ensemble learning framework for cyber attack and fault discrimination in smart grids. Energies 18. doi:http://dx.doi.org/10.3390/en18236305
- [23] Hagar, A.A., Gawali, B.W., 2022. Apache spark and deep learning models for high-performance network intrusion detection using CSE-CIC-IDS2018. Computational Intelligence and Neuroscience 2022, 3131153. doi:https://doi.org/10.1155/2022/3131153
- [25] Silva, H., Fred, A., 2007. Feature subspace ensembles: A parallel classifier combination scheme using feature selection, in: Haindl, M., Kittler, J., Roli, F. (Eds.), Multiple Classifier Systems, Springer Berlin Heidelberg, Berlin, Heidelberg. pp. 261–270
- [26] Hou, Y., Sun, H., Yuan, G., Li, Y., Che, Z., Ge, H., 2025. A correlation-guided cooperative coevolutionary method for feature selection via interaction learning-based space division. Swarm and Evolutionary Computation 93, 101846. doi:https://doi.org/10.1016/j.swevo.2025.101846
- [27] Zhang, Z., Xue, J., 2025. A novel cooperative co-evolutionary algorithm with context vector enhancement strategy for feature selection on high-dimensional classification. Computers & Operations Research 178, 107009. doi:https://doi.org/10.1016/j.cor.2025.107009
- [28] Zhong, W., Liu, J., Xue, M., Jiao, L., 2004. A multiagent genetic algorithm for global numerical optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34, 1128–1141. doi:http://dx.doi.org/10.1109/TSMCB.2003.821456
- [29] Li, W., Chai, Z., 2024. MPEA-FS: A decomposition-based multi-population evolutionary algorithm for high-dimensional feature selection. Expert Systems with Applications 247, 123296. doi:https://doi.org/10.1016/j.eswa.2024.123296
- [30] Li, C., Huang, C., Chen, R., Yu, Z., Li, S., 2025. MPDCGA: A real-coded multi-population dynamic competitive genetic algorithm for feature selection. Journal of King Saud University - Computer and Information Sciences 37, 199. doi:http://dx.doi.org/10.1007/s44443-025-00112-4
- [31] Ren, B., Bai, D., Xue, Z., Xie, H., Zhang, H., 2022. Method for fault feature selection for a baler gearbox based on an improved adaptive genetic algorithm. Chinese Journal of Mechanical Engineering 35, doi:http://dx.doi.org/10.1186/s10033-022-00728-x
- [33] Too, J., Abdullah, A.R., 2021. A new and fast rival genetic algorithm for feature selection. The Journal of Supercomputing 77, 2844–2874. doi:http://dx.doi.org/10.1007/s11227-020-03378-9
- [34] Kordos, M., Arnaiz-González, Á., García-Osorio, C., 2019. Evolutionary prototype selection for multi-output regression. Neurocomputing 358, 309–320. doi:https://doi.org/10.1016/j.neucom.2019.05.055
- [35] Zhou, J., Hua, Z., 2022. A correlation guided genetic algorithm and its application to feature selection. Applied Soft Computing 123, 108964. doi:https://doi.org/10.1016/j.asoc.2022.108964
- [36] Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A., 2009. A detailed analysis of the kdd cup 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE. pp. 1–6
- [37] Moustafa, N., Slay, J., 2015. UNSW-NB15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set), in: Military Communications and Information Systems Conference (MilCIS), IEEE. pp. 1–6
- [38] Dua, D., Graff, C., 2017. UCI machine learning repository.
Li et al.: Preprint submitted to Elsevier