Feature selection in high-dimensional dataset using MapReduce

Claudio Reggiani; Gianluca Bontempi; Yann-Aël Le Borgne

arXiv: 1709.02327 · v1 · pith:52USBR36 · submitted 2017-09-07 · cs.DC · cs.LG · stat.ML

Abstract

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance (mRMR) algorithm, a popular feature selection method in bioinformatics and network inference. The proposed approach handles both tall/narrow datasets (many observations, few features) and wide/short datasets (few observations, many features). We further provide an open-source implementation based on Hadoop/Spark and illustrate its scalability on datasets with millions of observations or features.
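To make the selection criterion concrete, here is a minimal Python sketch of greedy mRMR on discretized features, written in a map/reduce style. Each row partition emits joint value counts (map), the counts are merged (reduce), and mutual information is computed from the merged counts. It assumes the common "difference" form of mRMR (relevance I(f; y) minus the mean mutual information with already-selected features) and only illustrates row partitioning, as for tall/narrow data. The function names and the data layout (rows as tuples with the target in column `target`) are hypothetical; this is plain in-memory Python, not the authors' Hadoop/Spark code.

```python
from collections import Counter
from functools import reduce
from math import log
from operator import add


def map_joint_counts(partition, i, j):
    """Map step: count (x_i, x_j) value pairs within one row partition."""
    return Counter((row[i], row[j]) for row in partition)


def mutual_information(partitions, i, j):
    """Reduce step: merge per-partition counts, then compute I(X_i; X_j) in nats."""
    joint = reduce(add, (map_joint_counts(p, i, j) for p in partitions), Counter())
    n = sum(joint.values())
    px, py = Counter(), Counter()
    for (a, b), c in joint.items():
        px[a] += c
        py[b] += c
    # I = sum p(a,b) * log(p(a,b) / (p(a) p(b))), with p = count / n
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr(partitions, n_features, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy."""
    candidates = list(range(n_features))
    relevance = {f: mutual_information(partitions, f, target) for f in candidates}
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(partitions, f, s) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the lowest-index candidate
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data (x0, x1, x2, y): x1 duplicates x0; y copies x0 in 3/4 of rows, else x2.
    rows = [(x0, x0, x2, x0 if (a == 0 or b == 0) else x2)
            for x0 in (0, 1) for x2 in (0, 1) for a in (0, 1) for b in (0, 1)]
    partitions = [rows[:8], rows[8:]]  # two "splits" of the data
    print(mrmr(partitions, n_features=3, target=3, k=2))  # → [0, 2]
```

In the toy data, feature 1 is individually more relevant than feature 2, but as an exact copy of feature 0 it is fully redundant, so the redundancy penalty makes the second pick feature 2. Redundancy terms are recomputed on every iteration here for brevity; a scalable implementation would cache or batch them.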