
arxiv: 2006.15812 · v2 · pith:EL52NBQPnew · submitted 2020-06-29 · cs.LG · cs.DS · stat.ML

Statistical-Query Lower Bounds via Functional Gradients

classification: cs.LG · cs.DS · stat.ML
keywords: learning, bounds, epsilon, lower, relu, statistical-query, agnostically, boosting

We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c} \epsilon$ queries for some constants $b, c > 0$, where $n$ is the dimension and $\epsilon$ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for "amplifying" recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts.
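
For readers unfamiliar with the statistical-query (SQ) model mentioned in the abstract, the following is a minimal sketch of a simulated SQ oracle, not code from the paper. It assumes the standard definition: a query is a function of (x, y) bounded in [-1, 1], and the oracle may return any value within the tolerance of its expectation. The helper names (`make_sq_oracle`, `gaussian_relu_samples`), the clipping of labels to [0, 1], the sample size, and the use of an empirical mean in place of the true expectation are all illustrative assumptions.

```python
import random


def relu(t):
    return max(0.0, t)


def gaussian_relu_samples(w, m, rng):
    """Draw m pairs (x, y) with x standard Gaussian and y = ReLU(w . x).

    Labels are clipped to [0, 1] (an assumption made here so that the
    example queries below stay bounded)."""
    samples = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in w]
        y = min(1.0, relu(sum(wi * xi for wi, xi in zip(w, x))))
        samples.append((x, y))
    return samples


def make_sq_oracle(samples, tolerance, rng):
    """Simulated SQ oracle (hypothetical helper).

    Answers a query phi(x, y) with values in [-1, 1] by returning its mean
    over `samples`, perturbed by at most `tolerance`. The empirical mean
    stands in for the true expectation."""
    def oracle(phi):
        mean = sum(phi(x, y) for x, y in samples) / len(samples)
        # Any answer within the tolerance is valid in the SQ model.
        return mean + rng.uniform(-tolerance, tolerance)
    return oracle


rng = random.Random(0)
samples = gaussian_relu_samples([0.6, 0.8], 2000, rng)
oracle = make_sq_oracle(samples, tolerance=0.01, rng=rng)

# Correlational query: depends on y only through the product y * g(x).
corr_answer = oracle(lambda x, y: y * max(-1.0, min(1.0, x[0])))

# General query: an arbitrary bounded function of the pair (x, y).
gen_answer = oracle(lambda x, y: 1.0 if y > 0.5 else -1.0)

print(corr_answer, gen_answer)
```

The two queries illustrate the distinction the abstract draws: a correlational SQ algorithm may only ask queries of the first form, whereas the lower bounds stated above apply to algorithms that may also ask queries of the second form.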



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

    stat.ML 2026-05 unverdicted novelty 7.0

    ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is min...