pith. machine review for the scientific record.

arxiv: 1805.12076 · v1 · submitted 2018-05-30 · 💻 cs.LG · stat.ML

Recognition: unknown

Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

Authors on Pith: no claims yet
classification: 💻 cs.LG · stat.ML
keywords: networks · complexity · generalization · neural · bound · over-parametrization · capacity · lower
original abstract

Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization. In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound for two layer ReLU networks. Our capacity bound correlates with the behavior of test error with increasing network sizes, and could potentially explain the improvement in generalization with over-parametrization. We further present a matching lower bound for the Rademacher complexity that improves over previous capacity lower bounds for neural networks.
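
To make the abstract's setting concrete, the LaTeX sketch below fixes a two-layer ReLU network and states the generic shape that margin-based generalization bounds take. The capacity term \mathcal{C}(f), the constants, and the variable names are placeholders standing in for the paper's unit-wise measure; this is not the paper's actual theorem.

    % Two-layer ReLU network of hidden width h (the over-parametrization
    % knob): U maps inputs in R^d to R^h, V maps hidden units to outputs.
    \[
      f(\mathbf{x}) = \mathbf{V}\,[\mathbf{U}\mathbf{x}]_{+},
      \qquad [z]_{+} = \max(z, 0).
    \]
    % Generic margin-based bound: with probability at least 1 - \delta over
    % m i.i.d. training samples, for every f in the class,
    \[
      L_{0}(f) \;\le\; \widehat{L}_{\gamma}(f)
      + \widetilde{O}\!\left(\frac{\mathcal{C}(f)}{\gamma\sqrt{m}}\right)
      + O\!\left(\sqrt{\frac{\log(1/\delta)}{m}}\right),
    \]
    % where L_0(f) is the test error and \widehat{L}_\gamma(f) is the
    % empirical margin loss at margin \gamma. Per the abstract, the paper's
    % contribution is a choice of \mathcal{C} built from unit-wise capacities
    % that is tight enough to track test error as network size grows.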

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    cs.LG · 2024-01 · unverdicted · novelty 6.0

    SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be... (a sketch of this self-play objective follows the citation list)

  2. Artificial Jagged Intelligence as Uneven Optimization Energy Allocation: Capability Concentration, Redistribution, and Optimization Governance

    cs.AI · 2026-05 · unverdicted · novelty 4.0

    AJI frames jagged AI capabilities as lower bounds on performance dispersion arising from concentrated optimization energy allocation under anisotropic objectives, with theorems on tradeoffs and redistribution interventions.
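
For the SPIN entry (item 1 above), the mechanism in the blurb admits a compact statement: at self-play iteration t, the current model p_\theta is trained to prefer the human-annotated response y over a response y' sampled from the previous iterate p_{\theta_{t-1}}. The pairwise logistic form and the scale \lambda below are assumptions modeled on DPO-style objectives (which the blurb compares against); this is a sketch, not the paper's verified loss.

    % One SPIN-style self-play step: y is the human-annotated response,
    % y' is the model's own output from the previous iteration.
    \[
      \min_{\theta}\;
      \mathbb{E}_{(x,y)\sim\mathcal{D},\; y'\sim p_{\theta_{t-1}}(\cdot\mid x)}
      \left[\,\ell\!\left(\lambda\log\frac{p_{\theta}(y\mid x)}{p_{\theta_{t-1}}(y\mid x)}
      - \lambda\log\frac{p_{\theta}(y'\mid x)}{p_{\theta_{t-1}}(y'\mid x)}\right)\right],
      \qquad \ell(u) = \log\!\left(1 + e^{-u}\right).
    \]
    % Minimizing this pushes probability mass toward human responses and away
    % from the model's previous-iteration outputs; \theta_t then replaces
    % \theta_{t-1} and the loop repeats.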