Representation Benefits of Deep Feedforward Networks

Matus Telgarsky

arXiv:1509.08101v2 · pith:A33SDCWM · submitted 2015-09-27 · cs.LG, cs.NE

keywords: networks, nodes, deep, error, feedforward, network, achieves, benefits

Abstract

This note provides a family of classification problems, indexed by a positive integer $k$, where all shallow networks with fewer than exponentially (in $k$) many nodes exhibit error at least $1/6$, whereas a deep network with 2 nodes in each of $2k$ layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated $k$ times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.
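The deep and recurrent constructions in the abstract can be illustrated with a small Python sketch. It assumes the standard ReLU tent map ("mirror map") $x \mapsto \sigma(2\sigma(x) - 4\sigma(x - 1/2))$, which equals $2x$ on $[0, 1/2]$ and $2(1 - x)$ on $(1/2, 1]$, together with a hypothetical test set of $2^k + 1$ points $i/2^k$ labeled $i \bmod 2$. Neither is quoted from the note, and all function names are illustrative. Each block uses two ReLU nodes followed by one, so $k$ compositions give $2k$ layers with at most 2 nodes each, or equivalently the same 3 nodes iterated $k$ times. Exact rational arithmetic (`fractions.Fraction`) avoids rounding at the dyadic points.

```python
from fractions import Fraction


def relu(x):
    """ReLU nonlinearity sigma(x) = max(x, 0)."""
    return max(x, 0)


def mirror(x):
    """One tent-map block built from three ReLU nodes (assumed construction).

    Inner layer: two nodes, sigma(x) and sigma(x - 1/2).
    Outer layer: one node, sigma(2*sigma(x) - 4*sigma(x - 1/2)).
    On [0, 1] this equals 2x for x <= 1/2 and 2(1 - x) for x > 1/2.
    """
    a = relu(x)
    b = relu(x - Fraction(1, 2))
    return relu(2 * a - 4 * b)


def deep_net(x, k):
    """Apply the mirror block k times.

    Unrolled, this is a feedforward net with 2k ReLU layers of at most
    2 nodes each; read as a loop, it reuses the same 3 nodes k times.
    """
    for _ in range(k):
        x = mirror(x)
    return x


def alternating_points(k):
    """Hypothetical test set: x_i = i / 2**k with label i mod 2."""
    n = 2 ** k
    return [(Fraction(i, n), i % 2) for i in range(n + 1)]


def errors(k):
    """Count misclassifications when thresholding deep_net at 1/2."""
    return sum(
        int(deep_net(x, k) >= Fraction(1, 2)) != y
        for x, y in alternating_points(k)
    )


def count_pieces(k):
    """Count affine pieces of deep_net on [0, 1] via slope changes.

    The composed tent map is affine between consecutive dyadic points
    j / 2**k, so comparing slopes on those intervals suffices.
    """
    n = 2 ** k
    xs = [Fraction(j, n) for j in range(n + 1)]
    ys = [deep_net(x, k) for x in xs]
    slopes = [(ys[j + 1] - ys[j]) * n for j in range(n)]
    return 1 + sum(slopes[j] != slopes[j + 1] for j in range(n - 1))


if __name__ == "__main__":
    # Columns: k, misclassified points, affine pieces on [0, 1].
    for k in range(1, 6):
        print(k, errors(k), count_pieces(k))
```

Running the script should report zero errors and $2^k$ affine pieces for each $k$. For intuition about the shallow side: a univariate network with one hidden layer of $m$ ReLU nodes is piecewise affine with at most $m + 1$ pieces, so tracking this oscillation with a single hidden layer needs on the order of $2^k$ nodes. The note's stated bound is stronger, giving error at least $1/6$ for every shallow network with fewer than exponentially (in $k$) many nodes.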

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Algorithmic Foundations of Deep Learning: Complexity-Theoretic Rates and a Characterization of Universal Approximation
cs.LG 2026-06 unverdicted novelty 8.0

Neural networks emulate real-valued circuits with explicit complexity bounds controlled by gate count and structure; any definable model with a parallelization condition is a universal approximator precisely when it c...
ReLU Networks for Exact Generation of Similar Graphs
cs.LG 2026-04 unverdicted novelty 7.0

Constant-depth ReLU networks of size O(n²d) exist that deterministically generate graphs within edit distance d from any given n-vertex input graph.
Local large deviations for linear-region growth in random piecewise-linear networks
math.PR 2026-07 accept novelty 6.0

Submultiplicative pressure yields exponential large-deviation upper bounds for linear-region counts in random tent-map compositions, with certified bridge constructions giving matching lower bounds near the determinis...
A Theory on Flow Matching with Neural Networks
cs.LG 2026-06 unverdicted novelty 6.0

Establishes convergence guarantees for overparameterized 2-layer ReLU networks in flow matching, generalization bounds for the velocity-field objective, and Wasserstein guarantees for generated samples, using multi-ta...
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
cs.LG 2024-01 unverdicted novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...
Approximation Theory for Neural Networks: Old and New
cs.LG 2026-05 unverdicted novelty 2.0

A survey summarizing classical density results and quantitative approximation theory for feedforward networks and KANs, with emphasis on depth advantages.