pith. sign in

arxiv: 1711.00501 · v2 · pith:R3OOL5VTnew · submitted 2017-11-01 · 💻 cs.LG · cs.DS· math.OC· stat.ML

Learning One-hidden-layer Neural Networks with Landscape Design

classification 💻 cs.LG cs.DSmath.OCstat.ML
keywords globalmathbbminimadesignformulagradientlandscapelearning
0
0 comments X
read the original abstract

We consider the problem of learning a one-hidden-layer neural network: we assume the input $x\in \mathbb{R}^d$ is from Gaussian distribution and the label $y = a^\top \sigma(Bx) + \xi$, where $a$ is a nonnegative vector in $\mathbb{R}^m$ with $m\le d$, $B\in \mathbb{R}^{m\times d}$ is a full-rank weight matrix, and $\xi$ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function $G(\cdot)$ whose landscape is guaranteed to have the following properties: 1. All local minima of $G$ are also global minima. 2. All global minima of $G$ correspond to the ground truth parameters. 3. The value and gradient of $G$ can be estimated using samples. With these properties, stochastic gradient descent on $G$ provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity result and validate the results by simulations.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Limitations of Lazy Training of Two-layers Neural Networks

    stat.ML 2019-06 unverdicted novelty 8.0

    For quadratic targets in d dimensions, two-layer quadratic networks achieve lower risk when fully trained than in random features or neural tangent regimes if hidden units < d.

  2. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    cs.LG 2024-01 unverdicted novelty 6.0

    SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...