
arXiv: 1703.02682 · v1 · pith:SUMPN44O · submitted 2017-03-08 · stat.ML · cs.IT · cs.LG · math.IT

Sparse Quadratic Logistic Regression in Sub-quadratic Time

keywords: quadratic, regression, terms, case, logistic, support, consider, correlation

Abstract

We consider support recovery in the quadratic logistic regression setting, where the target depends on both $p$ linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$. Quadratic terms enable prediction and modeling of higher-order effects between features and the target, but incorporating them naively may require solving a very large regression problem. We consider the sparse case, where at most $s$ terms (linear or quadratic) are non-zero, and provide a new, faster algorithm. It involves (a) identifying the weak support (i.e., all relevant variables) and (b) running standard logistic regression optimization only on these chosen variables. The first step relies on a novel insight about correlation tests in the presence of non-linearity, and takes $O(pn)$ time for $n$ samples, potentially giving huge computational gains over the naive approach. Motivated by insights from the boolean case, we propose a non-linear correlation test for the non-binary, finite-support case that involves hashing a variable and then correlating it with the output variable. We also provide experimental results demonstrating the effectiveness of our methods.
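The two-stage recipe above can be illustrated with a small, stdlib-only Python sketch. It is not the authors' algorithm: the screening statistic (largest absolute covariance between y and a few random ±1 lookup tables standing in for hash functions), the number of hashes, the assumption that the weak-support size k is known, the simulated model, and plain gradient descent for stage (b) are all hypothetical choices made for illustration. Features take values in {0, 1, 2} so that E[y | x_i] varies with x_i for every relevant variable; by contrast, if x_i and x_j are uniform on {-1, +1} and enter only through x_i x_j, E[y | x_i] is constant and a per-variable screen of this kind sees no signal.

```python
import math
import random
from itertools import combinations_with_replacement


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, p, values, terms, bias, rng):
    """Hypothetical generator: p features uniform over `values`, and
    y ~ Bernoulli(sigmoid(bias + sum of w * prod(x[i] for i in idx))),
    where `terms` maps an index tuple (i,) or (i, j) to its weight w."""
    X, y = [], []
    for _ in range(n):
        x = [rng.choice(values) for _ in range(p)]
        z = bias + sum(w * math.prod(x[i] for i in idx) for idx, w in terms.items())
        X.append(x)
        y.append(1 if rng.random() < sigmoid(z) else 0)
    return X, y


def hashed_scores(X, y, values, num_hashes, rng):
    """Stage (a) sketch: map each feature value through random +/-1 tables
    ("hashes") and keep the largest |empirical covariance| with y.
    Cost is O(num_hashes * p * n): linear in p, no scan over pairs."""
    n, p = len(X), len(X[0])
    y_bar = sum(y) / n
    hashes = [{v: rng.choice((-1, 1)) for v in values} for _ in range(num_hashes)]
    scores = []
    for i in range(p):
        best = 0.0
        for h in hashes:
            hx = [h[row[i]] for row in X]
            h_bar = sum(hx) / n
            cov = sum((a - h_bar) * (b - y_bar) for a, b in zip(hx, y)) / n
            best = max(best, abs(cov))
        scores.append(best)
    return scores


def quad_features(x, support):
    """Intercept, linear terms x_i, and quadratic terms x_i * x_j (i <= j),
    restricted to the selected variables only."""
    feats = [1.0] + [float(x[i]) for i in support]
    feats += [float(x[i] * x[j]) for i, j in combinations_with_replacement(support, 2)]
    return feats


def fit_logistic(Phi, y, lr=0.1, iters=50):
    """Stage (b) sketch: full-batch gradient descent on the mean logistic
    loss over the reduced feature set. Returns (weights, final loss)."""
    n, d = len(Phi), len(Phi[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for phi, t in zip(Phi, y):
            r = sigmoid(sum(a * b for a, b in zip(w, phi))) - t
            for k in range(d):
                grad[k] += r * phi[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    loss = 0.0
    for phi, t in zip(Phi, y):
        q = min(max(sigmoid(sum(a * b for a, b in zip(w, phi))), 1e-12), 1 - 1e-12)
        loss -= (t * math.log(q) + (1 - t) * math.log(1 - q)) / n
    return w, loss


rng = random.Random(0)
values = (0, 1, 2)
# Hypothetical sparse model: one linear term (x0) and one quadratic term (x1 * x2).
terms = {(0,): 1.0, (1, 2): 1.0}
X, y = simulate(n=2000, p=20, values=values, terms=terms, bias=-2.5, rng=rng)

scores = hashed_scores(X, y, values, num_hashes=16, rng=rng)
k = 3  # assumed known size of the weak support (hypothetical parameter)
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
support = sorted(ranked[:k])

Phi = [quad_features(x, support) for x in X]
w, loss = fit_logistic(Phi, y)
print("selected:", support, "train log-loss:", round(loss, 3))
```

Stage (a) touches each of the n × p entries once per hash, so it scales linearly in p; quadratic features appear only in stage (b), and only for the k selected variables (k + k(k+1)/2 terms rather than order p^2).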
