A Closer Look at Memorization in Deep Networks

Aaron Courville; Asja Fischer; David Krueger; Devansh Arpit; Emmanuel Bengio; Maxinder S. Kanwal; Nicolas Ballas; Simon Lacoste-Julien; Stanisław Jastrzębski; Tegan Maharaj

arxiv: 1706.05394 · v2 · pith:SHQ7OVK4new · submitted 2017-06-16 · 📊 stat.ML · cs.LG

Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

classification 📊 stat.ML cs.LG

keywords deep, data, networks, generalization, memorization, noise, capacity, learning

Abstract

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An overview of condensation phenomenon in deep learning
cs.LG 2025-04 unverdicted novelty 2.0

Neural networks exhibit condensation of neurons into clusters with similar outputs whose number increases monotonically during training, facilitated by small initializations or dropout, providing insights into general...