Recognition: unknown
Representation Benefits of Deep Feedforward Networks
read the original abstract
This note provides a family of classification problems, indexed by a positive integer $k$, where all shallow networks with fewer than exponentially (in $k$) many nodes exhibit error at least $1/6$, whereas a deep network with 2 nodes in each of $2k$ layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated $k$ times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
ReLU Networks for Exact Generation of Similar Graphs
Constant-depth ReLU networks of size O(n²d) exist that deterministically generate graphs within edit distance d from any given n-vertex input graph.
-
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.