
arxiv: 2605.19864 · v1 · pith:W35DP7LJnew · submitted 2026-05-19 · 💻 cs.NE

Multi-population Diversity-guided Genetic Algorithm for Feature Selection in Network Intrusion Detection

Pith reviewed 2026-05-20 01:30 UTC · model grok-4.3

classification 💻 cs.NE
keywords genetic algorithm · feature selection · network intrusion detection · multi-population · diversity maintenance · information gain ratio · cybersecurity

The pith

A chained multi-population genetic algorithm with an information-gain-ratio operator maintains diversity to select effective features for network intrusion detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses limitations in genetic algorithm-based feature selection for network intrusion detection, where high-dimensional, redundant features make it hard to maintain population diversity and guide evolution. It proposes the Multi-Population Diversity-Guided Genetic Algorithm (MPDGGA), which first constructs a chained multi-population evolutionary structure and then applies a diversity-guided operator based on information gain ratio. In experiments on eleven datasets (NSL-KDD, UNSW-NB15, and nine UCI datasets), the abstract reports the highest accuracy on ten of them against four other multi-population feature selection models, while selecting a small fraction of the features. A sympathetic reader would care because more efficient feature selection could lead to faster and more accurate cybersecurity systems.

Core claim

The authors establish that a chained multi-population structure combined with a diversity-guided operator based on information gain ratio solves the problems of maintaining diversity and providing evolutionary guidance in genetic algorithms applied to high-dimensional redundant traffic features for intrusion detection.

What carries the argument

Chained multi-population evolutionary structure with information-gain-ratio diversity operator, which preserves variety across populations and directs selection toward informative features.
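
As a rough illustration of the quantity that operator builds on, the sketch below computes the textbook information gain ratio (information gain divided by split information) for one discrete feature. It is a minimal sketch, not the paper's operator: the function names and the toy data are hypothetical, and the paper may discretize features or combine scores differently.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy H(Y | X): size-weighted entropy of each group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # H(X); penalises many-valued features
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data (hypothetical): one feature that determines the label, one that does not.
labels = ["attack", "attack", "normal", "normal"]
informative = [1, 1, 0, 0]
useless = [1, 0, 1, 0]
print(gain_ratio(informative, labels))
print(gain_ratio(useless, labels))
```

A feature that splits the classes cleanly scores high and an uninformative one scores zero, which is what makes the ratio usable as a cheap guide before any classifier is trained.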

Load-bearing premise

The chained multi-population structure combined with an information-gain-ratio diversity operator will reliably maintain population diversity and provide effective evolutionary guidance when the input feature space is high-dimensional and redundant.

What would settle it

The claim would be falsified by experiments on a comparable high-dimensional dataset in which the proposed approach fails to maintain higher population diversity, or fails to achieve higher accuracy, than a standard genetic algorithm.
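
Checking this requires a concrete diversity measure, which this page does not specify. One common proxy for binary chromosomes, assumed here purely for illustration, is the mean pairwise Hamming distance normalized by chromosome length. The function names and sample populations below are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def population_diversity(population):
    """Mean pairwise Hamming distance, normalised to [0, 1] by chromosome length."""
    if len(population) < 2:
        return 0.0
    length = len(population[0])
    pairs = list(combinations(population, 2))
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)

# Hypothetical populations: one fully converged, one still spread out.
converged = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
spread = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
print(population_diversity(converged))
print(population_diversity(spread))
```

Logging a value like this once per generation for both the proposed method and a standard genetic algorithm would give the diversity curves the falsification test calls for.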

Figures

Figures reproduced from arXiv: 2605.19864 by Chunzhen Li.

Figure 1: Overall flowchart of the model proposed in this study. Different colored circles represent chromosomes from different subpopulations. C = (g_1, ⋯, g_k, ⋯, g_d), g_k ∈ {0, 1}, 1 ≤ k ≤ d, (1) where g_k = 1 signifies the inclusion of the k-th feature, whereas g_k = 0 indicates its exclusion. 3.2.2. Population Random Initialization. In feature selection, each chromosome represents a candidate feature subset. To guar…
Figure 2: Chain structure of chromosomes within a subpopulation. Circles represent chromosomes, which are connected sequentially by chains. As shown by the arrows, the current chromosome crosses over with the next chromosome, and the offspring then replaces the current chromosome depending on the quality of the crossover result. 3.3.2. Chromosomal Evolution among Subpopulations. While intra-subpopulation evolution emphasizes local optimi…
Figure 3: Chain structure of chromosomes among subpopulations. The different colored circles appearing within the dashed boxes represent chromosomes that have been cross-substituted. 3.4.1. Subset Evaluation Criteria Based on Information Gain Ratio. A crossover or mutation operation yields a candidate feature subset. To avoid calling the costly classifiers too often to check the quality of these candidate solutions, …
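
The binary encoding of Eq. (1) and the within-subpopulation chain described in the Figure 2 caption can be sketched roughly as follows. Several details are assumptions not confirmed by the captions: uniform crossover, a chain that wraps from the last chromosome back to the first, replacement only when the offspring scores strictly higher, re-drawing all-zero chromosomes, and the toy fitness function. All names are hypothetical.

```python
import random

def random_chromosome(d, rng):
    """Length-d bit list as in Eq. (1); g_k = 1 keeps feature k.
    Assumption: an all-zero draw (no features) is rejected and re-drawn."""
    while True:
        chromosome = [rng.randint(0, 1) for _ in range(d)]
        if any(chromosome):
            return chromosome

def uniform_crossover(a, b, rng):
    """Assumed operator: each gene is taken from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def chain_step(subpop, fitness, rng):
    """One pass along the chain: each chromosome crosses over with its successor,
    and the offspring replaces it only if the offspring scores strictly higher."""
    n = len(subpop)
    for i in range(n):
        successor = subpop[(i + 1) % n]  # assumption: the chain wraps around
        child = uniform_crossover(subpop[i], successor, rng)
        if any(child) and fitness(child) > fitness(subpop[i]):
            subpop[i] = child
    return subpop

# Toy fitness (hypothetical): agreement with a made-up "ideal" feature mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def fitness(chromosome):
    return sum(g == t for g, t in zip(chromosome, target))

rng = random.Random(0)
subpop = [random_chromosome(len(target), rng) for _ in range(6)]
before = [fitness(c) for c in subpop]
chain_step(subpop, fitness, rng)
after = [fitness(c) for c in subpop]
print(before, after)
```

Because a chromosome is only ever replaced by a better offspring, no position in the chain gets worse during a pass; in a real run the toy fitness would be replaced by a classifier-based or information-gain-ratio-based score.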
Original abstract

Network Intrusion Detection System is a critical means of ensuring cybersecurity. However, existing Genetic Algorithm-based feature selection methods face several limitations when dealing with high-dimensional redundant traffic features. For example, population diversity is difficult to maintain, and evolutionary operators lack guidance. To solve these problems, this study proposes the Multi-Population Diversity-Guided Genetic Algorithm (MPDGGA). First, we build a chained multi-population evolutionary structure. Second, we introduce a diversity-guided operator based on information gain ratio. Experiments on NSL-KDD, UNSW-NB15, and 9 UCI datasets show that the proposed model significantly outperforms four other advanced multi-population feature selection models. Across the 11 datasets, it attains the highest accuracy on 10 datasets and at least 2.26% of the features were selected.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Multi-Population Diversity-Guided Genetic Algorithm (MPDGGA) for feature selection in network intrusion detection. It introduces a chained multi-population evolutionary structure and an information-gain-ratio-based diversity operator to address difficulties in maintaining population diversity and providing evolutionary guidance in high-dimensional redundant feature spaces. Experiments on the NSL-KDD, UNSW-NB15, and nine UCI datasets are claimed to show that MPDGGA significantly outperforms four other advanced multi-population feature selection models, attaining the highest accuracy on 10 of the 11 datasets while selecting at least 2.26% of the features.

Significance. If the reported performance gains prove robust under proper statistical controls, the work could offer a practical advance in evolutionary feature selection for cybersecurity by demonstrating how structured multi-population evolution combined with an information-theoretic diversity mechanism can improve accuracy while aggressively reducing feature count on standard intrusion-detection benchmarks.

major comments (2)
  1. [Experimental results] Experimental results section: the central claim that MPDGGA 'significantly outperforms' the four baselines on 11 datasets is unsupported by any report of the number of independent runs, mean/stddev accuracy across trials, or statistical significance tests (t-test, Wilcoxon, etc.). Because genetic algorithms are stochastic, single-point accuracy figures cannot reliably establish superiority in high-dimensional redundant spaces; this omission directly weakens the empirical foundation for the chained multi-population plus information-gain-ratio design.
  2. [Method] Method section (chained multi-population structure): the description of how the chained populations interact and how the information-gain-ratio operator supplies evolutionary guidance lacks sufficient formalization or pseudocode. Without explicit definitions of migration rules, diversity metric computation, and selection pressure, it is impossible to verify that the claimed diversity maintenance actually occurs or is responsible for the reported accuracy gains.
minor comments (2)
  1. [Abstract] The abstract states 'at least 2.26% of the features were selected' without clarifying whether this is the minimum across datasets, an average, or a specific dataset; a table summarizing selected-feature percentages per dataset would improve clarity.
  2. [Abstract] Baseline methods are referred to only as 'four other advanced multi-population feature selection models' without naming them or citing their original papers in the abstract; this should be expanded for immediate context.
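
The kind of reporting the first major comment asks for can be sketched as below: mean and standard deviation over paired runs, plus a two-sided Wilcoxon signed-rank test. This is a minimal standard-library sketch under stated simplifications (normal approximation, zero differences dropped, average ranks for ties, no tie or continuity correction); a vetted statistics library would be preferable in practice. The accuracy values are synthetic and are not results from the paper.

```python
import math
import random
from statistics import mean, stdev

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.
    Returns (W, p) with W = min(W+, W-)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p

# Synthetic accuracies for 30 paired runs of two hypothetical methods.
rng = random.Random(1)
baseline = [0.90 + rng.gauss(0, 0.01) for _ in range(30)]
proposed = [b + 0.01 + rng.gauss(0, 0.005) for b in baseline]
w, p = wilcoxon_signed_rank(proposed, baseline)
print(f"baseline: {mean(baseline):.4f} +/- {stdev(baseline):.4f}")
print(f"proposed: {mean(proposed):.4f} +/- {stdev(proposed):.4f}")
print(f"W = {w}, p = {p:.3g}")
```

Pairing the runs (same seed or same data split for both methods) is what makes a signed-rank test appropriate; with unpaired runs a rank-sum test would be the natural alternative.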

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the empirical and methodological presentation of our work. We have revised the manuscript to add statistical reporting and formal algorithmic details, as described below.

  1. Referee: [Experimental results] Experimental results section: the central claim that MPDGGA 'significantly outperforms' the four baselines on 11 datasets is unsupported by any report of the number of independent runs, mean/stddev accuracy across trials, or statistical significance tests (t-test, Wilcoxon, etc.). Because genetic algorithms are stochastic, single-point accuracy figures cannot reliably establish superiority in high-dimensional redundant spaces; this omission directly weakens the empirical foundation for the chained multi-population plus information-gain-ratio design.

    Authors: We agree that the stochastic nature of genetic algorithms requires reporting of multiple runs and statistical tests to support claims of superiority. In the revised manuscript we now specify that all experiments were repeated over 30 independent runs per method per dataset. We report mean accuracy and standard deviation for each method, and we have added Wilcoxon signed-rank tests (with p-values) comparing MPDGGA against each baseline. The updated tables and text show that the performance differences remain statistically significant on the majority of datasets, thereby reinforcing the contribution of the chained multi-population structure and information-gain-ratio operator. revision: yes

  2. Referee: [Method] Method section (chained multi-population structure): the description of how the chained populations interact and how the information-gain-ratio operator supplies evolutionary guidance lacks sufficient formalization or pseudocode. Without explicit definitions of migration rules, diversity metric computation, and selection pressure, it is impossible to verify that the claimed diversity maintenance actually occurs or is responsible for the reported accuracy gains.

    Authors: We accept that greater formalization is needed for reproducibility. The revised method section now includes (i) a complete pseudocode listing for MPDGGA that explicitly shows the chaining of populations, (ii) the precise migration rule (top-k elite individuals transferred every m generations), (iii) the mathematical definition of the information-gain-ratio diversity metric and how it is used to adjust selection probabilities, and (iv) the selection-pressure schedule. These additions allow direct verification that the diversity mechanism operates as claimed. revision: yes
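
The migration rule named in the second response (top-k elite individuals transferred every m generations) could look roughly like the sketch below. The response does not state the transfer direction or what the migrants replace, so two assumptions are made here: elites move one step forward along an open chain of populations, and they overwrite the k worst individuals of the receiving population. All names, including the `evolve` callback, are hypothetical.

```python
def migrate(populations, fitness, k):
    """Copy the top-k individuals of each population into the next one in the chain,
    overwriting that population's k worst individuals (assumed policy)."""
    # Snapshot the elites first so migration uses pre-migration populations.
    elites = [sorted(p, key=fitness, reverse=True)[:k] for p in populations]
    for i in range(len(populations) - 1):
        receiver = populations[i + 1]
        receiver.sort(key=fitness)  # worst individuals first
        receiver[:k] = [list(e) for e in elites[i]]  # copies, not shared references
    return populations

def run(populations, fitness, evolve, generations, m, k):
    """Evolve every population each generation; migrate every m generations."""
    for generation in range(1, generations + 1):
        for population in populations:
            evolve(population)  # placeholder for the within-population operators
        if generation % m == 0:
            migrate(populations, fitness, k)
    return populations

# Toy setup (hypothetical): fitness counts selected bits; evolve does nothing.
populations = [
    [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 1, 1]],
]
run(populations, fitness=sum, evolve=lambda p: None, generations=4, m=2, k=1)
print(populations[1])
```

Snapshotting the elites before any transfer keeps an individual from hopping several populations in one migration event, which is one way to keep the chain from homogenizing too quickly.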

Circularity Check

0 steps flagged

No circularity: empirical algorithmic proposal with independent experimental validation


The paper introduces MPDGGA as a new multi-population genetic algorithm variant incorporating a chained evolutionary structure and an information-gain-ratio diversity operator. All performance claims rest on direct empirical comparisons against four baseline multi-population methods across 11 public datasets (NSL-KDD, UNSW-NB15, and 9 UCI sets), reporting accuracy and feature-selection ratios as observed outcomes. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims therefore remain self-contained experimental results rather than reductions to prior inputs or definitions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions of evolutionary computation (population-based search, selection, crossover, mutation) plus the domain assumption that information gain ratio is a suitable proxy for guiding diversity in feature-selection GAs. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Standard genetic algorithm operators (selection, crossover, mutation) can be extended with multi-population chaining and information-gain guidance without breaking convergence properties.
    Invoked implicitly when the authors claim the new operators solve the stated limitations of existing GA-based feature selection.



