pith. machine review for the scientific record.

arxiv: 1412.6544 · v6 · submitted 2014-12-19 · 💻 cs.NE · cs.LG · stat.ML

Recognition: unknown

Qualitatively characterizing neural network optimization problems

Authors on Pith: no claims yet
classification 💻 cs.NE · cs.LG · stat.ML
keywords: networks · neural · optimization · training · local · obstacles · problems · variety
Original abstract

Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct training with stochastic gradient descent. We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state of the art neural networks never encounter any significant obstacles.
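The analysis technique the abstract describes — evaluating the loss along the straight line from the initial parameters to the trained solution — can be sketched in a few lines. The helper names and the convex least-squares stand-in below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def loss(theta, X, y):
    # Mean squared error of a linear model: a stand-in for any differentiable loss.
    return np.mean((X @ theta - y) ** 2)

def linear_path_losses(theta_init, theta_final, loss_fn, num_points=25):
    """Evaluate loss_fn along the straight line from theta_init to theta_final.

    This is the interpolation experiment from the abstract:
        theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final.
    A smooth, obstacle-free curve suggests the optimizer never had to
    cross a significant barrier between initialization and solution.
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array([
        loss_fn((1 - a) * theta_init + a * theta_final) for a in alphas
    ])
    return alphas, losses

# Toy demonstration on a least-squares problem (illustrative only; the
# paper applies this probe to deep networks trained with SGD).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_theta = rng.normal(size=5)
y = X @ true_theta

theta_init = rng.normal(size=5)                       # random "initialization"
theta_final, *_ = np.linalg.lstsq(X, y, rcond=None)   # the "solution"

alphas, losses = linear_path_losses(
    theta_init, theta_final, lambda t: loss(t, X, y)
)
# For this convex toy problem the curve is necessarily barrier-free;
# the paper's finding is that the same holds empirically for a variety
# of state-of-the-art networks.
```

Plotting `losses` against `alphas` gives the one-dimensional loss cross-section the paper inspects for local optima and other obstacles.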

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. In-context Learning and Induction Heads

    cs.LG 2022-09 unverdicted novelty 7.0

    Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning i...

  2. From Attribution to Action: A Human-Centered Application of Activation Steering

    cs.AI 2026-04 unverdicted novelty 6.0

    Activation steering paired with attribution enables intervention-based debugging in vision models, as all 8 interviewed experts shifted to hypothesis testing, most trusted observed responses, and highlighted risks lik...

  3. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  4. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  5. Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks

    cs.LG 2026-04 unverdicted novelty 5.0

    A closed-form upper bound on the maximum Hessian eigenvalue of cross-entropy loss is derived for smooth nonlinear neural networks.

  6. The Platonic Representation Hypothesis

    cs.LG 2024-05 unverdicted novelty 5.0

    Representations learned by large AI models are converging toward a shared statistical model of reality.