Feature Selection via Mutual Information: New Theoretical Insights

Alberto Maria Metelli; Andrea Tirinzoni; Marcello Restelli; Mario Beraha; Matteo Papini

arxiv: 1907.07384 · v1 · pith:JOPGGPL5new · submitted 2019-07-17 · 💻 cs.LG · stat.ML

Feature Selection via Mutual Information: New Theoretical Insights

Mario Beraha , Alberto Maria Metelli , Matteo Papini , Andrea Tirinzoni , Marcello Restelli This is my paper

Pith reviewed 2026-05-24 20:31 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords feature selectionmutual informationconditional mutual informationgreedy algorithmserror boundsregressionclassification

0 comments

The pith

Conditional mutual information arises when bounding the ideal prediction errors of feature subsets, leading to guaranteed stopping conditions for greedy selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that conditional mutual information between the target variable and unselected features, given the selected ones, provides bounds on the lowest achievable regression or classification error. This theoretical link allows the design of a stopping condition for forward and backward greedy feature selection algorithms that keeps the ideal error below a user-chosen threshold. Current heuristic methods lack such guarantees on solution quality. A reader might care because it offers a way to perform feature selection with explicit control over prediction performance in high-dimensional settings.

Core claim

The authors prove that conditional mutual information naturally appears in the derivation of bounds on the ideal errors achieved by different feature subsets. Using this, they introduce a novel stopping condition for greedy methods ensuring the selected subset's prediction error remains bounded.

What carries the argument

Conditional mutual information used to bound ideal regression and classification errors for feature subsets.

If this is right

Greedy algorithms gain a theoretically justified termination criterion based on error bounds.
The approach works for both regression and classification tasks.
Feature subsets can be selected while controlling the maximum ideal error.
Mutual information based selection gains theoretical support beyond heuristics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This bound could be approximated using empirical estimates in practice to make the method usable.
It may inspire similar information-theoretic bounds in other selection algorithms.
Validation on benchmark datasets would test if the bounds are tight enough for real-world use.

Load-bearing premise

The error bounds derived from conditional mutual information can be evaluated or approximated sufficiently well to serve as a practical stopping condition without requiring knowledge of the true data distribution.

What would settle it

A simulation or real dataset where the stopping condition is triggered but the actual ideal prediction error of the selected subset exceeds the specified threshold.

Figures

Figures reproduced from arXiv: 1907.07384 by Alberto Maria Metelli, Andrea Tirinzoni, Marcello Restelli, Mario Beraha, Matteo Papini.

**Figure 1.** Figure 1: SVM test accuracy for different choices of the number of features that generate the problem and different stopping criteria. [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Classification accuracy and fraction of selected features as a function [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that conditional mutual information naturally arises when bounding the ideal regression/classification errors achieved by different subsets of features. Leveraging on these insights, we propose a novel stopping condition for backward and forward greedy methods which ensures that the ideal prediction error using the selected feature subset remains bounded by a user-specified threshold. We provide numerical simulations to support our theoretical claims and compare to common heuristic methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows conditional mutual information appearing in bounds on ideal prediction error for feature subsets and turns that into a stopping rule for greedy selection.

read the letter

The core claim is that conditional mutual information shows up naturally when you bound the Bayes error achievable with different feature subsets, and the authors use this to define a stopping condition for forward and backward greedy algorithms that keeps the error below a chosen threshold. That is the main new piece relative to the heuristic MI methods they contrast against. They also run some simulations to check the idea. What the work does reasonably well is take a common practical trick and try to give it an error guarantee instead of leaving it as an ad-hoc relevance-minus-redundancy score. The framing is straightforward and stays within standard information-theoretic identities, so there is no obvious circularity or invented quantity. The soft spots are straightforward. Only the abstract is available here, so the actual derivations, tightness of the bounds, and how the stopping condition is implemented in practice cannot be checked. The simulations are mentioned but give no detail on datasets, effect sizes, or comparison strength, which leaves the empirical support unverified. The practical usefulness also rests on being able to approximate the relevant conditional mutual information terms well enough without the true distribution, and that step is not spelled out. This is the kind of paper that would interest people working on information-theoretic feature selection who want something more than pure heuristics. A reader already familiar with MI bounds for Bayes risk would not be surprised by the appearance of conditional MI, but the application to a concrete stopping rule is still worth seeing in full. It is coherent on its own terms and shows honest engagement with the literature it cites, so it deserves a serious referee even if the proofs turn out to need tightening.

Referee Report

2 major / 3 minor

Summary. The paper derives theoretical bounds on the ideal (Bayes) regression and classification error achieved by a feature subset, showing that conditional mutual information arises naturally in these bounds from standard information-theoretic identities. It then introduces a stopping criterion for forward and backward greedy feature-selection algorithms that guarantees the selected subset keeps the ideal error below a user-specified threshold, and supports the claims with numerical simulations comparing the new criterion to common heuristic MI-based methods.

Significance. If the derived bounds are non-vacuous and the stopping condition can be approximated from data without knowledge of the true distribution, the work would supply the first explicit performance guarantees for a widely used class of filter methods, moving MI-based selection from purely heuristic to theoretically grounded. The simulations provide initial empirical support, but the practical utility hinges on bound tightness and computability.

major comments (2)

[§4, Eq. (12)] §4, Eq. (12) and the stopping condition (13): the bound on ideal error is expressed in terms of conditional MI quantities that are not directly observable; the manuscript does not provide a concrete estimator or concentration result showing that the stopping threshold can be reliably checked from finite samples without additional assumptions on the joint distribution.
[Theorem 2] Theorem 2 (forward greedy case): the proof that the stopping condition preserves the error bound relies on the submodularity or monotonicity properties of conditional MI; it is unclear whether these properties hold for the specific error functional used or only under the additional modeling assumptions stated in §3.1.

minor comments (3)

[§2] Notation for conditional MI I(X;Y|Z) is introduced without an explicit reminder of the chain-rule identity used in the derivation; adding one sentence in §2 would improve readability.
[Figure 2] Figure 2 caption does not state the number of Monte-Carlo repetitions or the precise definition of 'ideal error' plotted; this makes direct comparison to the theoretical bound difficult.
[§1.2] The related-work section omits the recent information-theoretic analyses of greedy selection by [specific papers on submodular MI bounds]; a brief comparison would strengthen the novelty claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and the positive recommendation for minor revision. We address each major comment below.

read point-by-point responses

Referee: §4, Eq. (12) and the stopping condition (13): the bound on ideal error is expressed in terms of conditional MI quantities that are not directly observable; the manuscript does not provide a concrete estimator or concentration result showing that the stopping threshold can be reliably checked from finite samples without additional assumptions on the joint distribution.

Authors: The manuscript focuses on deriving theoretical bounds using population quantities of conditional mutual information. We do not provide estimators or concentration results because the goal is to establish the connection between conditional MI and ideal prediction errors, and to propose the stopping condition in terms of these quantities. In practice, one can use standard nonparametric estimators for MI from data, as done in our simulations. Adding finite-sample guarantees would require stronger assumptions and is beyond the current scope, though we agree it is an important practical consideration. revision: no
Referee: Theorem 2 (forward greedy case): the proof that the stopping condition preserves the error bound relies on the submodularity or monotonicity properties of conditional MI; it is unclear whether these properties hold for the specific error functional used or only under the additional modeling assumptions stated in §3.1.

Authors: The proof of Theorem 2 applies the known monotonicity and submodularity properties of conditional mutual information, which hold under the general setting. These properties are used to show that the stopping condition ensures the bound on the error functional remains satisfied. The error bounds in §4 are derived directly from information-theoretic identities that hold under the assumptions of §3.1, so the properties carry over. We will revise the manuscript to make this connection more explicit by adding a brief discussion of the relevant properties of conditional MI. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The abstract frames the core contribution as deriving that conditional mutual information arises naturally when bounding ideal regression/classification errors from standard information-theoretic identities. No equations, self-definitions, or fitted-parameter-as-prediction steps are visible. The proposed stopping condition for greedy selection follows directly from these bounds rather than reducing to an input by construction. No self-citation load-bearing or ansatz smuggling is indicated. The derivation is self-contained against external information-theoretic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract only; no explicit free parameters, new axioms, or invented entities are described. The work rests on standard definitions of mutual information and conditional mutual information.

axioms (1)

standard math Standard definitions and chain-rule properties of mutual information and conditional mutual information from information theory.
Invoked to derive the error bounds mentioned in the abstract.

pith-pipeline@v0.9.0 · 5649 in / 1078 out tokens · 31458 ms · 2026-05-24T20:31:31.097191+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning
cs.LG 2025-02 unverdicted novelty 3.0

Machine learning pipeline with MICE imputation, tree-based feature selection, and ensemble models predicts birth weight, claiming improved performance on constrained clinical datasets.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper

[1]

A survey on feature selection methods,

G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering , vol. 40, no. 1, pp. 16–28, 2014

work page 2014
[2]

Wrappers for feature subset selection,

R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artiﬁcial intelligence, vol. 97, no. 1-2, pp. 273–324, 1997

work page 1997
[3]

Embedded methods,

T. N. Lal, O. Chapelle, J. Weston, and A. Elisseeff, “Embedded methods,” in Feature extraction. Springer, 2006, pp. 137–165

work page 2006
[4]

Feature selection for svms,

J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V . Vap- nik, “Feature selection for svms,” in Advances in neural information processing systems, 2001, pp. 668–674

work page 2001
[5]

Feature selection and ranking ﬁlters,

W. Duch, T. Winiarski, J. Biesiada, and A. Kachel, “Feature selection and ranking ﬁlters,” in International conference on artiﬁcial neural networks (ICANN) and International conference on neural information processing (ICONIP), vol. 251. Citeseer, 2003, p. 254

work page 2003
[6]

Filter methods,

W. Duch, “Filter methods,” in Feature Extraction. Springer, 2006, pp. 89–117

work page 2006
[7]

A review of feature selection methods based on mutual information,

J. R. Vergara and P. A. Est ´evez, “A review of feature selection methods based on mutual information,” Neural computing and applications , vol. 24, no. 1, pp. 175–186, 2014

work page 2014
[8]

T. M. Cover and J. A. Thomas, Elements of information theory . John Wiley & Sons, 2012

work page 2012
[9]

Feature selection and feature extraction for text categoriza- tion,

D. D. Lewis, “Feature selection and feature extraction for text categoriza- tion,” in Proceedings of the workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 212–217

work page 1992
[10]

Using mutual information for selecting features in supervised neural net learning,

R. Battiti, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on neural networks, vol. 5, no. 4, pp. 537–550, 1994

work page 1994
[11]

Data visualization and feature selection: New algorithms for nongaussian data,

H. H. Yang and J. Moody, “Data visualization and feature selection: New algorithms for nongaussian data,” in Advances in Neural Information Processing Systems, 2000, pp. 687–693

work page 2000
[12]

Fast binary feature selection with conditional mutual infor- mation,

F. Fleuret, “Fast binary feature selection with conditional mutual infor- mation,” Journal of Machine Learning Research , vol. 5, no. Nov, pp. 1531–1555, 2004

work page 2004
[13]

Conditional infomax learning: an integrated framework for feature extraction and fusion,

D. Lin and X. Tang, “Conditional infomax learning: an integrated framework for feature extraction and fusion,” in European Conference on Computer Vision . Springer, 2006, pp. 68–82

work page 2006
[14]

Conditional mutual information-based feature selection analyzing for synergy and redun- dancy,

g. Cheng, Z. Qin, C. Feng, Y . Wang, and F. Li, “Conditional mutual information-based feature selection analyzing for synergy and redun- dancy,” Etri Journal, vol. 33, no. 2, pp. 210–218, 2011

work page 2011
[15]

Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,

G. Brown, A. Pocock, M.-J. Zhao, and M. Luj ´an, “Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,” Journal of machine learning research , vol. 13, no. Jan, pp. 27–66, 2012

work page 2012
[16]

Algorithms for large scale markov blanket discovery

I. Tsamardinos, C. F. Aliferis, A. R. Statnikov, and E. Statnikov, “Algorithms for large scale markov blanket discovery.” in FLAIRS conference, vol. 2, 2003, pp. 376–380

work page 2003
[17]

Information and information stability of random vari- ables and processes,

M. S. Pinsker, “Information and information stability of random vari- ables and processes,” 1960

work page 1960
[18]

Information-type measures of difference of probability dis- tributions and indirect observation,

I. Csisz ´ar, “Information-type measures of difference of probability dis- tributions and indirect observation,” studia scientiarum Mathematicarum Hungarica, vol. 2, pp. 229–318, 1967

work page 1967
[19]

A lower bound for discrimination information in terms of variation (corresp.),

S. Kullback, “A lower bound for discrimination information in terms of variation (corresp.),” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 126–127, 1967

work page 1967
[20]

Irrelevant features and the subset selection problem,

G. H. John, R. Kohavi, and K. Pﬂeger, “Irrelevant features and the subset selection problem,” in Machine Learning Proceedings 1994 . Elsevier, 1994, pp. 121–129

work page 1994
[21]

Estimation of entropy and mutual information,

L. Paninski, “Estimation of entropy and mutual information,” Neural computation, vol. 15, no. 6, pp. 1191–1253, 2003

work page 2003
[22]

Nearest neighbor estimate of conditional mutual information in feature selection,

A. Tsimpiris, I. Vlachos, and D. Kugiumtzis, “Nearest neighbor estimate of conditional mutual information in feature selection,” Expert Systems with Applications, vol. 39, no. 16, pp. 12 697–12 708, 2012

work page 2012
[23]

Estimating mutual information for discrete-continuous mixtures,

W. Gao, S. Kannan, S. Oh, and P. Viswanath, “Estimating mutual information for discrete-continuous mixtures,” in Advances in Neural Information Processing Systems , 2017, pp. 5986–5997

work page 2017
[24]

Kraskov, H

A. Kraskov, H. St ¨ogbauer, and P. Grassberger, “Estimating mutual information,” Phys. Rev. E , vol. 69, p. 066138, Jun 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.69.066138

work page doi:10.1103/physreve.69.066138 2004
[25]

Functional properties of minimum mean-square error and mutual information,

Y . Wu and S. Verd ´u, “Functional properties of minimum mean-square error and mutual information,” IEEE Transactions on Information The- ory, vol. 58, no. 3, pp. 1289–1301, 2012

work page 2012
[26]

Feature selection with a linear dependence measure,

S. K. Das, “Feature selection with a linear dependence measure,” IEEE transactions on Computers , vol. 100, no. 9, pp. 1106–1109, 1971

work page 1971
[27]

Correlation-based feature selection for machine learning,

M. A. Hall, “Correlation-based feature selection for machine learning,” 1999

work page 1999
[28]

Feature selection for high-dimensional data: A fast correlation-based ﬁlter solution,

L. Yu and H. Liu, “Feature selection for high-dimensional data: A fast correlation-based ﬁlter solution,” in Proceedings of the 20th interna- tional conference on machine learning (ICML-03) , 2003, pp. 856–863

work page 2003
[29]

Feature selection for high-dimensional dataa pearson redundancy based ﬁlter,

J. Biesiada and W. Duch, “Feature selection for high-dimensional dataa pearson redundancy based ﬁlter,” in Computer recognition systems 2 . Springer, 2007, pp. 242–249

work page 2007
[30]

Lin- ear correlation-based feature selection for network intrusion detection model,

H. F. Eid, A. E. Hassanien, T.-h. Kim, and S. Banerjee, “Lin- ear correlation-based feature selection for network intrusion detection model,” in Advances in Security of Information and Communication Networks. Springer, 2013, pp. 240–248

work page 2013
[31]

Kernel feature selection via conditional covariance minimization,

J. Chen, M. Stern, M. J. Wainwright, and M. I. Jordan, “Kernel feature selection via conditional covariance minimization,” in Advances in Neural Information Processing Systems , 2017, pp. 6946–6955

work page 2017
[32]

UCI machine learning repository,

D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository,”

work page
[33]

Available: http://archive.ics.uci.edu/ml

[Online]. Available: http://archive.ics.uci.edu/ml

work page

[1] [1]

A survey on feature selection methods,

G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering , vol. 40, no. 1, pp. 16–28, 2014

work page 2014

[2] [2]

Wrappers for feature subset selection,

R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artiﬁcial intelligence, vol. 97, no. 1-2, pp. 273–324, 1997

work page 1997

[3] [3]

Embedded methods,

T. N. Lal, O. Chapelle, J. Weston, and A. Elisseeff, “Embedded methods,” in Feature extraction. Springer, 2006, pp. 137–165

work page 2006

[4] [4]

Feature selection for svms,

J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V . Vap- nik, “Feature selection for svms,” in Advances in neural information processing systems, 2001, pp. 668–674

work page 2001

[5] [5]

Feature selection and ranking ﬁlters,

W. Duch, T. Winiarski, J. Biesiada, and A. Kachel, “Feature selection and ranking ﬁlters,” in International conference on artiﬁcial neural networks (ICANN) and International conference on neural information processing (ICONIP), vol. 251. Citeseer, 2003, p. 254

work page 2003

[6] [6]

Filter methods,

W. Duch, “Filter methods,” in Feature Extraction. Springer, 2006, pp. 89–117

work page 2006

[7] [7]

A review of feature selection methods based on mutual information,

J. R. Vergara and P. A. Est ´evez, “A review of feature selection methods based on mutual information,” Neural computing and applications , vol. 24, no. 1, pp. 175–186, 2014

work page 2014

[8] [8]

T. M. Cover and J. A. Thomas, Elements of information theory . John Wiley & Sons, 2012

work page 2012

[9] [9]

Feature selection and feature extraction for text categoriza- tion,

D. D. Lewis, “Feature selection and feature extraction for text categoriza- tion,” in Proceedings of the workshop on Speech and Natural Language. Association for Computational Linguistics, 1992, pp. 212–217

work page 1992

[10] [10]

Using mutual information for selecting features in supervised neural net learning,

R. Battiti, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on neural networks, vol. 5, no. 4, pp. 537–550, 1994

work page 1994

[11] [11]

Data visualization and feature selection: New algorithms for nongaussian data,

H. H. Yang and J. Moody, “Data visualization and feature selection: New algorithms for nongaussian data,” in Advances in Neural Information Processing Systems, 2000, pp. 687–693

work page 2000

[12] [12]

Fast binary feature selection with conditional mutual infor- mation,

F. Fleuret, “Fast binary feature selection with conditional mutual infor- mation,” Journal of Machine Learning Research , vol. 5, no. Nov, pp. 1531–1555, 2004

work page 2004

[13] [13]

Conditional infomax learning: an integrated framework for feature extraction and fusion,

D. Lin and X. Tang, “Conditional infomax learning: an integrated framework for feature extraction and fusion,” in European Conference on Computer Vision . Springer, 2006, pp. 68–82

work page 2006

[14] [14]

Conditional mutual information-based feature selection analyzing for synergy and redun- dancy,

g. Cheng, Z. Qin, C. Feng, Y . Wang, and F. Li, “Conditional mutual information-based feature selection analyzing for synergy and redun- dancy,” Etri Journal, vol. 33, no. 2, pp. 210–218, 2011

work page 2011

[15] [15]

Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,

G. Brown, A. Pocock, M.-J. Zhao, and M. Luj ´an, “Conditional likelihood maximisation: a unifying framework for information theoretic feature selection,” Journal of machine learning research , vol. 13, no. Jan, pp. 27–66, 2012

work page 2012

[16] [16]

Algorithms for large scale markov blanket discovery

I. Tsamardinos, C. F. Aliferis, A. R. Statnikov, and E. Statnikov, “Algorithms for large scale markov blanket discovery.” in FLAIRS conference, vol. 2, 2003, pp. 376–380

work page 2003

[17] [17]

Information and information stability of random vari- ables and processes,

M. S. Pinsker, “Information and information stability of random vari- ables and processes,” 1960

work page 1960

[18] [18]

Information-type measures of difference of probability dis- tributions and indirect observation,

I. Csisz ´ar, “Information-type measures of difference of probability dis- tributions and indirect observation,” studia scientiarum Mathematicarum Hungarica, vol. 2, pp. 229–318, 1967

work page 1967

[19] [19]

A lower bound for discrimination information in terms of variation (corresp.),

S. Kullback, “A lower bound for discrimination information in terms of variation (corresp.),” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 126–127, 1967

work page 1967

[20] [20]

Irrelevant features and the subset selection problem,

G. H. John, R. Kohavi, and K. Pﬂeger, “Irrelevant features and the subset selection problem,” in Machine Learning Proceedings 1994 . Elsevier, 1994, pp. 121–129

work page 1994

[21] [21]

Estimation of entropy and mutual information,

L. Paninski, “Estimation of entropy and mutual information,” Neural computation, vol. 15, no. 6, pp. 1191–1253, 2003

work page 2003

[22] [22]

Nearest neighbor estimate of conditional mutual information in feature selection,

A. Tsimpiris, I. Vlachos, and D. Kugiumtzis, “Nearest neighbor estimate of conditional mutual information in feature selection,” Expert Systems with Applications, vol. 39, no. 16, pp. 12 697–12 708, 2012

work page 2012

[23] [23]

Estimating mutual information for discrete-continuous mixtures,

W. Gao, S. Kannan, S. Oh, and P. Viswanath, “Estimating mutual information for discrete-continuous mixtures,” in Advances in Neural Information Processing Systems , 2017, pp. 5986–5997

work page 2017

[24] [24]

Kraskov, H

A. Kraskov, H. St ¨ogbauer, and P. Grassberger, “Estimating mutual information,” Phys. Rev. E , vol. 69, p. 066138, Jun 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.69.066138

work page doi:10.1103/physreve.69.066138 2004

[25] [25]

Functional properties of minimum mean-square error and mutual information,

Y . Wu and S. Verd ´u, “Functional properties of minimum mean-square error and mutual information,” IEEE Transactions on Information The- ory, vol. 58, no. 3, pp. 1289–1301, 2012

work page 2012

[26] [26]

Feature selection with a linear dependence measure,

S. K. Das, “Feature selection with a linear dependence measure,” IEEE transactions on Computers , vol. 100, no. 9, pp. 1106–1109, 1971

work page 1971

[27] [27]

Correlation-based feature selection for machine learning,

M. A. Hall, “Correlation-based feature selection for machine learning,” 1999

work page 1999

[28] [28]

Feature selection for high-dimensional data: A fast correlation-based ﬁlter solution,

L. Yu and H. Liu, “Feature selection for high-dimensional data: A fast correlation-based ﬁlter solution,” in Proceedings of the 20th interna- tional conference on machine learning (ICML-03) , 2003, pp. 856–863

work page 2003

[29] [29]

Feature selection for high-dimensional dataa pearson redundancy based ﬁlter,

J. Biesiada and W. Duch, “Feature selection for high-dimensional dataa pearson redundancy based ﬁlter,” in Computer recognition systems 2 . Springer, 2007, pp. 242–249

work page 2007

[30] [30]

Lin- ear correlation-based feature selection for network intrusion detection model,

H. F. Eid, A. E. Hassanien, T.-h. Kim, and S. Banerjee, “Lin- ear correlation-based feature selection for network intrusion detection model,” in Advances in Security of Information and Communication Networks. Springer, 2013, pp. 240–248

work page 2013

[31] [31]

Kernel feature selection via conditional covariance minimization,

J. Chen, M. Stern, M. J. Wainwright, and M. I. Jordan, “Kernel feature selection via conditional covariance minimization,” in Advances in Neural Information Processing Systems , 2017, pp. 6946–6955

work page 2017

[32] [32]

UCI machine learning repository,

D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository,”

work page

[33] [33]

Available: http://archive.ics.uci.edu/ml

[Online]. Available: http://archive.ics.uci.edu/ml

work page