
arXiv: 1411.2331 · v1 · submitted 2014-11-10 · stat.ML · cs.LG


N³LARS: Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data

keywords: high-dimensional, lars, large, method, data, feature, selection, features

We propose a feature selection method that finds non-redundant features in large, high-dimensional data in a nonlinear way. Specifically, we propose a nonlinear extension of non-negative least-angle regression (LARS), called N$^3$LARS, in which the similarity between input and output is measured through the normalized version of the Hilbert-Schmidt Independence Criterion (HSIC). An advantage of N$^3$LARS is that it can easily be incorporated into map-reduce frameworks such as Hadoop and Spark. Thus, with the help of distributed computing, a set of features can be selected efficiently from large, high-dimensional data. Moreover, N$^3$LARS is a convex method and can find a globally optimal solution. The effectiveness of the proposed method is first demonstrated through feature selection experiments for classification and regression on small, high-dimensional datasets. Finally, we evaluate the proposed method on a large, high-dimensional biology dataset.
