arxiv: 1907.05943 · v1 · pith:QM4C7XGAnew · submitted 2019-07-13 · 💻 cs.LG · stat.ML

A Study and Analysis of a Feature Subset Selection Technique using Penguin Search Optimization Algorithm (FS-PeSOA)

Pith reviewed 2026-05-24 22:08 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords feature subset selection · penguin search optimization · metaheuristic · classification accuracy · Random Forest · Support Vector Machine · Nearest Neighbour · UCI datasets

The pith

FS-PeSOA adapts penguin hunting jumps to find small feature subsets that improve accuracy in standard classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Feature selection Penguin Search Optimization Algorithm (FS-PeSOA), a metaheuristic for choosing the fewest features needed to predict class labels from large data. Penguins in the model dive to random depths and share food-location information to converge on the global optimum; the algorithm uses the same process to generate and score trial feature subsets. Fitness of each subset is measured by running it through Random Forest, Nearest Neighbour, and Support Vector Machine classifiers. The method is intended for well-known UCI benchmark datasets and is positioned for comparison against existing feature-selection techniques. A reader would care because successful feature reduction can cut computational load while preserving or raising predictive performance.

Core claim

The central claim is that translating the group hunting strategy of penguins into an optimization loop produces an effective search over feature-subset candidates, and that the subsets found by this loop yield higher classification accuracy than state-of-the-art methods when evaluated with Random Forest, Nearest Neighbour and SVM on UCI data.

What carries the argument

Penguin Search optimization algorithm: a population-based procedure that generates trial feature subsets by simulating random-depth dives and information sharing, then scores each subset by its classification performance under three fixed evaluators.
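To make the idea of a population search over feature subsets concrete, here is a minimal sketch in Python. The paper gives no encoding or update rule, so everything below is an assumption made for illustration: the binary-mask encoding, the bit-flip "dive", the bit-copy "sharing" step, the greedy acceptance rule, and all parameter names (`n_penguins`, `max_depth`, `share_prob`) are hypothetical and are not the authors' FS-PeSOA or the original PeSOA update equations.

```python
import random


def penguin_style_search(n_features, fitness, n_penguins=8, n_rounds=30,
                         max_depth=3, share_prob=0.5, seed=0):
    """Toy population search over binary feature masks (illustrative only).

    `fitness` is any callable that maps a 0/1 mask to a score to maximise.
    """
    rng = random.Random(seed)
    # Each penguin holds a mask: 1 = feature kept, 0 = feature dropped.
    masks = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_penguins)]
    scores = [fitness(m) for m in masks]
    best_i = max(range(n_penguins), key=scores.__getitem__)
    best_mask, best_score = masks[best_i][:], scores[best_i]

    for _ in range(n_rounds):
        for i in range(n_penguins):
            trial = masks[i][:]
            # Assumed "sharing" step: copy some bits from the best mask so far.
            for j in range(n_features):
                if rng.random() < share_prob:
                    trial[j] = best_mask[j]
            # Assumed "dive" step: flip between 1 and max_depth random bits.
            depth = min(rng.randint(1, max_depth), n_features)
            for j in rng.sample(range(n_features), depth):
                trial[j] = 1 - trial[j]
            trial_score = fitness(trial)
            # Greedy acceptance: keep the trial only if it is no worse.
            if trial_score >= scores[i]:
                masks[i], scores[i] = trial, trial_score
            if trial_score > best_score:
                best_mask, best_score = trial[:], trial_score
    return best_mask, best_score
```

The fitness callable is deliberately left pluggable, so a classifier-based score (as the paper describes) or a cheap stand-in can be passed without touching the search loop.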

If this is right

  • FS-PeSOA will generate trial subsets whose fitness is scored by Random Forest, Nearest Neighbour and SVM.
  • The algorithm will be tested on standard UCI benchmark datasets.
  • Classification accuracy obtained with FS-PeSOA will be compared directly with state-of-the-art feature-selection methods.
  • The approach is expected to identify smaller feature sets that still support accurate class prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the penguin model works for feature selection, the same information-sharing loop could be reused for other combinatorial search problems in machine learning.
  • Success on UCI data would motivate testing the method on high-dimensional real-world collections such as gene-expression or image-feature sets.
  • The three-classifier fitness step could be replaced by a single faster evaluator in resource-constrained settings without changing the core search mechanism.
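As a rough illustration of the last point, a single lightweight evaluator could look like the sketch below: leave-one-out accuracy of a 1-nearest-neighbour rule restricted to the selected features. This is an assumption for illustration only. The function name `loo_1nn_accuracy`, the squared Euclidean distance, and the choice of leave-one-out are not taken from the paper, which does not specify how fitness is computed.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] is 1.

    X is a list of numeric feature rows, y the matching class labels.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query point out of its own neighbours
            # Squared Euclidean distance over the selected columns only.
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)
```

Because the evaluator only takes a mask and returns a score, it can be swapped for a heavier multi-classifier score without changing the search procedure that calls it.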

Load-bearing premise

The natural hunting strategy of penguins can be translated into a search procedure that reliably finds feature subsets giving higher classification accuracy than existing methods.

What would settle it

Run FS-PeSOA on the planned UCI datasets and compare the accuracy of the selected subsets under Random Forest, Nearest Neighbour, and SVM against current state-of-the-art feature-selection algorithms. Equal or lower accuracy would count against the central claim.
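One way such a comparison could be tallied is sketched below: a per-dataset win/tie/loss count plus a one-sided sign test. The paper proposes no evaluation protocol, so the function name `paired_comparison`, the tie tolerance, and the choice of a sign test are assumptions for illustration. No accuracy figures are implied; the inputs would have to come from actual runs.

```python
from math import comb


def paired_comparison(acc_new, acc_baseline, tol=1e-9):
    """Compare two methods dataset by dataset.

    Returns (wins, ties, losses, p_value), where p_value is the one-sided
    sign-test probability of seeing at least `wins` wins among the non-tied
    datasets if both methods were equally likely to win each one.
    """
    if len(acc_new) != len(acc_baseline):
        raise ValueError("both lists must cover the same datasets")
    wins = sum(a > b + tol for a, b in zip(acc_new, acc_baseline))
    losses = sum(a < b - tol for a, b in zip(acc_new, acc_baseline))
    ties = len(acc_new) - wins - losses
    n = wins + losses  # ties carry no information for the sign test
    if n == 0:
        return wins, ties, losses, 1.0
    p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, ties, losses, p_value
```

A large p-value, or more losses than wins, would be the kind of outcome that counts against the claim; with only a handful of datasets the test has little power, so the raw win/tie/loss counts are worth reporting alongside it.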

Figures

Figures reproduced from arXiv: 1907.05943 by Agnip Dasgupta, Aniket Ghosh Dastidar, Antara Barman, Ardhendu Banerjee, Sanjay Chakraborty.

Figure 1 (image not reproduced).

Original abstract

In today world of enormous amounts of data, it is very important to extract useful knowledge from it. This can be accomplished by feature subset selection. Feature subset selection is a method of selecting a minimum number of features with the help of which our machine can learn and predict which class a particular data belongs to. We will introduce a new adaptive algorithm called Feature selection Penguin Search optimization algorithm which is a metaheuristic approach. It is adapted from the natural hunting strategy of penguins in which a group of penguins take jumps at random depths and come back and share the status of food availability with other penguins and in this way, the global optimum solution is found. In order to explore the feature subset candidates, the bioinspired approach Penguin Search optimization algorithm generates during the process a trial feature subset and estimates its fitness value by using three different classifiers for each case: Random Forest, Nearest Neighbour and Support Vector Machines. However, we are planning to implement our proposed approach Feature selection Penguin Search optimization algorithm on some well known benchmark datasets collected from the UCI repository and also try to evaluate and compare its classification accuracy with some state of art algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a new metaheuristic for feature subset selection called FS-PeSOA, adapted from the hunting behavior of penguins (random-depth jumps and food-status sharing to locate a global optimum). It states that the approach will generate trial feature subsets, evaluate their fitness using Random Forest, Nearest Neighbor, and SVM classifiers, and plans to implement and compare the method against state-of-the-art algorithms on UCI benchmark datasets.

Significance. A fully specified and empirically validated penguin-inspired feature-selection algorithm could add a new bio-inspired optimizer to the feature-selection literature. However, because the manuscript supplies neither a formal algorithm definition nor any experimental results, its significance cannot be assessed from the current text.

major comments (3)
  1. [Abstract] Abstract: the central claim is the introduction of a new adaptive algorithm FS-PeSOA, yet the text supplies only a high-level biological analogy and states that the authors 'are planning to implement' the method; no encoding of feature subsets (binary vector, subset size, etc.), position-update equations, control parameters, or pseudocode are provided.
  2. [Abstract] Abstract: the title promises 'a study and analysis,' but the manuscript contains no implementation, no UCI dataset results, no accuracy numbers, and no comparison tables, leaving the empirical claims without support.
  3. [Abstract] Abstract: the fitness-evaluation procedure is described only as 'estimates its fitness value by using three different classifiers'; no details are given on how the three classifier outputs are combined into a single fitness score or how the search balances exploration and exploitation.
minor comments (1)
  1. [Abstract] Abstract: 'today world' should read 'today's world'; 'Neighbour' spelling should be consistent with the journal's preferred variant.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that the submitted manuscript is a high-level proposal rather than a fully implemented and evaluated study. We will revise the manuscript to provide the requested algorithmic details and to adjust the title and claims to match the current scope. Full experimental results would require additional implementation work beyond the present draft.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim is the introduction of a new adaptive algorithm FS-PeSOA, yet the text supplies only a high-level biological analogy and states that the authors 'are planning to implement' the method; no encoding of feature subsets (binary vector, subset size, etc.), position-update equations, control parameters, or pseudocode are provided.

    Authors: We agree the current text is limited to a biological analogy. In revision we will add the binary encoding of feature subsets, the position-update equations derived from penguin depth jumps and food-status sharing, the control parameters (e.g., maximum jump depth, sharing probability), and pseudocode for the complete FS-PeSOA procedure.

    Revision: yes

  2. Referee: [Abstract] Abstract: the title promises 'a study and analysis,' but the manuscript contains no implementation, no UCI dataset results, no accuracy numbers, and no comparison tables, leaving the empirical claims without support.

    Authors: The title is indeed broader than the content delivered. We will change the title to reflect a proposed method (e.g., 'A Proposed Feature Subset Selection Technique using Penguin Search Optimization Algorithm (FS-PeSOA)') and will remove or qualify any statements implying completed experiments. Full empirical validation on UCI datasets will be reserved for a subsequent extended manuscript.

    Revision: partial

  3. Referee: [Abstract] Abstract: the fitness-evaluation procedure is described only as 'estimates its fitness value by using three different classifiers'; no details are given on how the three classifier outputs are combined into a single fitness score or how the search balances exploration and exploitation.

    Authors: We will expand the fitness section to define an explicit aggregation rule (e.g., weighted average of classification accuracies from Random Forest, Nearest Neighbor, and SVM) and will describe how the penguin-inspired operators control the exploration-exploitation trade-off through random-depth jumps and information sharing.

    Revision: yes
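The weighted-average aggregation mentioned in the third response could be as simple as the sketch below. It is an illustration under stated assumptions: the function name `aggregate_fitness`, the dictionary keys, and the default of equal weights are hypothetical, since neither the paper nor the rebuttal fixes the weights or the exact rule.

```python
def aggregate_fitness(accuracies, weights=None):
    """Weighted average of per-classifier accuracies.

    `accuracies` maps a classifier name to its accuracy on a trial subset.
    `weights` maps the same names to non-negative weights; equal weights
    are assumed when it is omitted.
    """
    names = sorted(accuracies)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalising by the total lets callers pass unnormalised weights.
    return sum(weights[name] * accuracies[name] for name in names) / total
```

Collapsing three accuracies into one number keeps the search single-objective, at the cost of hiding cases where a subset helps one classifier and hurts another; reporting the per-classifier values alongside the aggregate would keep that visible.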

Circularity Check

0 steps flagged

No derivation chain exists; paper announces planned algorithm without equations or formal definition.

Full rationale

The manuscript provides only a high-level biological analogy for FS-PeSOA and states an intention to implement and compare on UCI data. No position-update rules, feature-subset encoding, fitness functions, or any equations appear, so no load-bearing step can reduce to its own inputs by construction. The central claim is an unexecuted proposal rather than a delivered derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no mathematical formulation, free parameters, axioms, or invented entities; the algorithm is described only at the conceptual level.


