A Study and Analysis of a Feature Subset Selection Technique using Penguin Search Optimization Algorithm (FS-PeSOA)

Agnip Dasgupta; Aniket Ghosh Dastidar; Antara Barman; Ardhendu Banerjee; Sanjay Chakraborty

arxiv: 1907.05943 · v1 · pith:QM4C7XGAnew · submitted 2019-07-13 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 22:08 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: feature subset selection · penguin search optimization · metaheuristic · classification accuracy · Random Forest · Support Vector Machine · Nearest Neighbour · UCI datasets

The pith

FS-PeSOA adapts penguin hunting jumps to find small feature subsets that improve accuracy in standard classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Feature selection Penguin Search Optimization Algorithm (FS-PeSOA), a metaheuristic for choosing the fewest features needed to predict class labels from large data. Penguins in the model dive to random depths and share food-location information to converge on the global optimum; the algorithm uses the same process to generate and score trial feature subsets. Fitness of each subset is measured by running it through Random Forest, Nearest Neighbour, and Support Vector Machine classifiers. The method is intended for well-known UCI benchmark datasets and is positioned for comparison against existing feature-selection techniques. A reader would care because successful feature reduction can cut computational load while preserving or raising predictive performance.

Core claim

The central claim is that translating the group hunting strategy of penguins into an optimization loop produces an effective search over feature-subset candidates, and that the subsets found by this loop yield higher classification accuracy than state-of-the-art methods when evaluated with Random Forest, Nearest Neighbour and SVM on UCI data.

What carries the argument

Penguin Search optimization algorithm: a population-based procedure that generates trial feature subsets by simulating random-depth dives and information sharing, then scores each subset by its classification performance under three fixed evaluators.
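
The paper itself gives no encoding or update rule, so the following is only a hypothetical sketch of the general pattern described above: a population of binary feature masks, a random "dive" that flips a few bits, and sharing of the best-known mask. Every name (`dive`, `share`, `search`, `toy_fitness`), the bit-flip move, the greedy acceptance rule, and the parameter defaults are assumptions made for illustration and are not taken from the authors. The toy fitness stands in for the classifier-based score.

```python
import random

def dive(mask, max_depth, rng):
    """Assumed 'random-depth jump': flip 1..max_depth randomly chosen bits."""
    new = list(mask)
    depth = rng.randint(1, min(max_depth, len(new)))
    for i in rng.sample(range(len(new)), depth):
        new[i] = 1 - new[i]
    return new

def share(mask, best, share_prob, rng):
    """Assumed sharing step: copy each bit of the best-known mask with probability share_prob."""
    return [b if rng.random() < share_prob else m for m, b in zip(mask, best)]

def search(n_features, fitness, n_penguins=8, n_iters=30,
           max_depth=2, share_prob=0.3, seed=0):
    """Population loop over binary masks (1 = feature kept, 0 = feature dropped)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(n_penguins)]
    best = max(population, key=fitness)
    for _ in range(n_iters):
        for k, mask in enumerate(population):
            trial = dive(share(mask, best, share_prob, rng), max_depth, rng)
            if fitness(trial) >= fitness(mask):  # greedy acceptance (assumption)
                population[k] = trial
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

def toy_fitness(mask):
    """Placeholder for classifier accuracy: rewards features 0 and 2, penalises subset size."""
    useful = {0, 2}
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    return hits - 0.1 * sum(mask)

best = search(n_features=6, fitness=toy_fitness)
print(best, toy_fitness(best))
```

In a wrapper setting, `toy_fitness` would be replaced by a function that trains a classifier on the columns selected by the mask and returns its validation accuracy.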

If this is right

FS-PeSOA will generate trial subsets whose fitness is scored by Random Forest, Nearest Neighbour and SVM.
The algorithm will be tested on standard UCI benchmark datasets.
Classification accuracy obtained with FS-PeSOA will be compared directly with state-of-the-art feature-selection methods.
The approach is expected to identify smaller feature sets that still support accurate class prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the penguin model works for feature selection, the same information-sharing loop could be reused for other combinatorial search problems in machine learning.
Success on UCI data would motivate testing the method on high-dimensional real-world collections such as gene-expression or image-feature sets.
The three-classifier fitness step could be replaced by a single faster evaluator in resource-constrained settings without changing the core search mechanism.

Load-bearing premise

The natural hunting strategy of penguins can be translated into a search procedure that reliably finds feature subsets giving higher classification accuracy than existing methods.

What would settle it

Run FS-PeSOA on the planned UCI datasets and check whether the selected subsets yield accuracy that is lower than or equal to that of current state-of-the-art feature-selection algorithms under Random Forest, Nearest Neighbour, or SVM. If they do, the central claim fails.
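
One simple way to tabulate such a comparison is a win/tie/loss count per dataset. The sketch below is hypothetical: the function name `win_tie_loss`, the dataset keys, and the accuracy values are placeholders invented for illustration, not results from the paper, which reports none.

```python
def win_tie_loss(candidate, baseline, tol=1e-9):
    """Count datasets where the candidate's accuracy is higher, equal, or lower."""
    wins = ties = losses = 0
    for name in candidate:
        diff = candidate[name] - baseline[name]
        if diff > tol:
            wins += 1
        elif diff < -tol:
            losses += 1
        else:
            ties += 1
    return wins, ties, losses

# Placeholder accuracies only; these are NOT measured values.
candidate_acc = {"dataset_a": 0.90, "dataset_b": 0.80, "dataset_c": 0.70}
baseline_acc = {"dataset_a": 0.85, "dataset_b": 0.80, "dataset_c": 0.75}
print(win_tie_loss(candidate_acc, baseline_acc))
```

The claim would be undermined if ties and losses dominate for each of the three classifiers.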

Figures

Figures reproduced from arXiv: 1907.05943 by Agnip Dasgupta, Aniket Ghosh Dastidar, Antara Barman, Ardhendu Banerjee, Sanjay Chakraborty.

Original abstract

In today world of enormous amounts of data, it is very important to extract useful knowledge from it. This can be accomplished by feature subset selection. Feature subset selection is a method of selecting a minimum number of features with the help of which our machine can learn and predict which class a particular data belongs to. We will introduce a new adaptive algorithm called Feature selection Penguin Search optimization algorithm which is a metaheuristic approach. It is adapted from the natural hunting strategy of penguins in which a group of penguins take jumps at random depths and come back and share the status of food availability with other penguins and in this way, the global optimum solution is found. In order to explore the feature subset candidates, the bioinspired approach Penguin Search optimization algorithm generates during the process a trial feature subset and estimates its fitness value by using three different classifiers for each case: Random Forest, Nearest Neighbour and Support Vector Machines. However, we are planning to implement our proposed approach Feature selection Penguin Search optimization algorithm on some well known benchmark datasets collected from the UCI repository and also try to evaluate and compare its classification accuracy with some state of art algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The manuscript is a proposal to adapt penguin search for feature selection but gives no algorithm definition, encoding, update rules, or results.

The manuscript is essentially a proposal to develop a feature selection method based on the Penguin Search Optimization Algorithm. It outlines how penguins' hunting strategy of random depth jumps and food sharing could be used to explore feature subsets, with fitness checked using Random Forest, Nearest Neighbor, and SVM. The authors intend to test this on UCI datasets and compare it to existing methods.

What stands out as new is the specific choice of PeSOA for this task, though the adaptation itself follows a standard pattern seen with many other swarm or evolutionary algorithms in feature selection. The abstract explains the high-level idea without much elaboration.

The work does not include any formal definition of the algorithm. There is no mention of how a feature subset is represented, what the position update rules would be, or how the random jumps translate to the search space. No pseudocode or parameter values are given. Since the text ends with a statement about future implementation, there are also no experimental results or comparisons to evaluate. This absence makes the central claim hard to assess. The assumption that the penguin model will lead to better accuracy than current methods remains untested in the paper. Minor issues like the lack of detail on the classifiers' role in fitness are secondary to the fact that nothing has been built yet.

Readers interested in metaheuristic feature selection might note the idea, but the paper offers little for immediate use or citation. It does not engage deeply enough with the technical challenges of the domain to stand as a contribution on its own. I would not bring this to a reading group or cite it. It should not go to peer review as is, because it lacks the required substance of a research paper.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a new metaheuristic for feature subset selection called FS-PeSOA, adapted from the hunting behavior of penguins (random-depth jumps and food-status sharing to locate a global optimum). It states that the approach will generate trial feature subsets, evaluate their fitness using Random Forest, Nearest Neighbor, and SVM classifiers, and plans to implement and compare the method against state-of-the-art algorithms on UCI benchmark datasets.

Significance. A fully specified and empirically validated penguin-inspired feature-selection algorithm could add a new bio-inspired optimizer to the feature-selection literature. However, because the manuscript supplies neither a formal algorithm definition nor any experimental results, its significance cannot be assessed from the current text.

major comments (3)

[Abstract] The central claim is the introduction of a new adaptive algorithm FS-PeSOA, yet the text supplies only a high-level biological analogy and states that the authors 'are planning to implement' the method; no encoding of feature subsets (binary vector, subset size, etc.), position-update equations, control parameters, or pseudocode is provided.
[Abstract] The title promises 'a study and analysis,' but the manuscript contains no implementation, no UCI dataset results, no accuracy numbers, and no comparison tables, leaving the empirical claims without support.
[Abstract] The fitness-evaluation procedure is described only as 'estimates its fitness value by using three different classifiers'; no details are given on how the three classifier outputs are combined into a single fitness score or how the search balances exploration and exploitation.

minor comments (1)

[Abstract] 'today world' should read 'today's world'; the spelling 'Neighbour' should be consistent with the journal's preferred variant.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that the submitted manuscript is a high-level proposal rather than a fully implemented and evaluated study. We will revise the manuscript to provide the requested algorithmic details and to adjust the title and claims to match the current scope. Full experimental results would require additional implementation work beyond the present draft.

Referee: [Abstract] The central claim is the introduction of a new adaptive algorithm FS-PeSOA, yet the text supplies only a high-level biological analogy and states that the authors 'are planning to implement' the method; no encoding of feature subsets (binary vector, subset size, etc.), position-update equations, control parameters, or pseudocode is provided.

Authors: We agree the current text is limited to a biological analogy. In revision we will add the binary encoding of feature subsets, the position-update equations derived from penguin depth jumps and food-status sharing, the control parameters (e.g., maximum jump depth, sharing probability), and pseudocode for the complete FS-PeSOA procedure.
Revision: yes

Referee: [Abstract] The title promises 'a study and analysis,' but the manuscript contains no implementation, no UCI dataset results, no accuracy numbers, and no comparison tables, leaving the empirical claims without support.

Authors: The title is indeed broader than the content delivered. We will change the title to reflect a proposed method (e.g., 'A Proposed Feature Subset Selection Technique using Penguin Search Optimization Algorithm (FS-PeSOA)') and will remove or qualify any statements implying completed experiments. Full empirical validation on UCI datasets will be reserved for a subsequent extended manuscript.
Revision: partial

Referee: [Abstract] The fitness-evaluation procedure is described only as 'estimates its fitness value by using three different classifiers'; no details are given on how the three classifier outputs are combined into a single fitness score or how the search balances exploration and exploitation.

Authors: We will expand the fitness section to define an explicit aggregation rule (e.g., weighted average of classification accuracies from Random Forest, Nearest Neighbor, and SVM) and will describe how the penguin-inspired operators control the exploration-exploitation trade-off through random-depth jumps and information sharing.
Revision: yes
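
The weighted-average rule mentioned in the last response can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `aggregate_fitness`, the classifier keys (`rf`, `knn`, `svm`), the default of equal weights, and the example accuracies are all hypothetical, since the manuscript defines no aggregation rule and reports no numbers.

```python
def aggregate_fitness(accuracies, weights=None):
    """Weighted average of per-classifier accuracies (equal weights by default)."""
    names = sorted(accuracies)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * accuracies[name] for name in names) / total

# Placeholder accuracies for one trial subset; these are NOT measured values.
acc = {"rf": 0.9, "knn": 0.8, "svm": 0.7}
print(aggregate_fitness(acc))
print(aggregate_fitness(acc, {"rf": 2.0, "knn": 1.0, "svm": 1.0}))
```

Setting one weight to 1 and the others to 0 reduces the rule to a single evaluator, which is one way to trade fidelity for speed.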

Circularity Check

0 steps flagged

No derivation chain exists; paper announces planned algorithm without equations or formal definition.

The manuscript provides only a high-level biological analogy for FS-PeSOA and states an intention to implement and compare on UCI data. No position-update rules, feature-subset encoding, fitness functions, or any equations appear, so no load-bearing step can reduce to its own inputs by construction. The central claim is an unexecuted proposal rather than a delivered derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no mathematical formulation, free parameters, axioms, or invented entities; the algorithm is described only at the conceptual level.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] Stewart, S., & Thomas, M. (2007). Eigenvalues and eigenvectors: Formal, symbolic, and embodied thinking. In The 10th Conference of the Special Interest Group of the Mathematical Association of America on Research in Undergraduate Mathematics Education (pp. 275-296).

[2] Tibrewal, B., Chaudhury, G. S., Chakraborty, S., & Kairi, A. (2019). Rough Set-Based Feature Subset Selection Technique Using Jaccard's Similarity Index. In Proceedings of International Ethical Hacking Conference 2018 (pp. 477-487). Springer, Singapore.

[3] Goswami, S., Das, A.K., Guha, P. et al. (2017). An approach of feature selection using graph-theoretic heuristic and hill climbing. Pattern Analysis and Applications, Springer. https://doi.org/10.1007/s10044-017-0668-x

[4] Goswami, S., Das, A.K., Guha, P. et al. (2017). A new hybrid feature selection approach using feature association map for supervised and unsupervised classification. Expert Systems with Applications, Elsevier, 88, 81-94. https://doi.org/10.1016/j.eswa.2017.06.032

[5] Ng, A. (2000). CS229 Lecture notes. CS229 Lecture notes, 1(1), 1-3.

[6] Goswami, S., Chakraborty, S., Guha, P., Tarafdar, A., & Kedia, A. (2019). Filter-Based Feature Selection Methods Using Hill Climbing Approach. In Natural Computing for Unsupervised Learning (pp. 213-234). Springer, Cham.

[7] Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar), 1157-1182.

[8] Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.

[9] Gheraibia, Y., & Moussaoui, A. (2013, June). Penguins search optimization algorithm (PeSOA). In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (pp. 222-231). Springer, Berlin, Heidelberg.

[10] Chandrasekhar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16-28.

[11] Al-Ani, A. (2005). Feature subset selection using ant colony optimization. International Journal of Computational Intelligence.

[12] Sahu, B., & Mishra, D. (2012). A novel feature selection algorithm using particle swarm optimization for cancer microarray data. Procedia Engineering, 38, 27-31.

[13] Mafarja, M., & Mirjalili, S. (2018). Whale optimization approaches for wrapper feature selection. Applied Soft Computing, 62, 441-453.

[14] Rashedi, E., & Nezamabadi-pour, H. (2014). Feature subset selection using improved binary gravitational search algorithm. Journal of Intelligent & Fuzzy Systems, 26(3), 1211-1221.

[15] Parsopoulos, K. E., & Vrahatis, M. N. (2002). Particle swarm optimization method for constrained optimization problems. Intelligent Technologies - Theory and Application: New Trends in Intelligent Technologies, 76(1), 214-220.

[16] Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160, 3-24.

[17] Lichman, M., & Bache, K. (2013). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. [Online]. Available: http://archive.ics.uci.edu/ml
