hub Mixed citations

An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211

Mixed citation behavior. Most common role is background (60%).

28 Pith papers citing it
Background 60% of classified citations
abstract

Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.

citation-role summary

background 4 · other 1

representative citing papers

NetTailor: Tuning the Architecture, Not Just the Weights

cs.CV · 2019-06-29 · unverdicted · novelty 7.0

NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for simpler tasks.

Debiasing LLMs by Fine-tuning

q-fin.GN · 2026-04-03 · unverdicted · novelty 6.0

Supervised fine-tuning with LoRA on rational benchmark forecasts corrects extrapolation bias out-of-sample in LLM predictions for controlled experiments and cross-sectional stock returns.

On the Stability of Growth in Structural Plasticity

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Newborn units in growing neural networks are forward-active but backward-starved, receiving weaker gradients than existing units and creating integration challenges that make growth less reliable than pruning in complex tasks.

Adaptive Compression-based Lifelong Learning

cs.CV · 2019-07-23 · unverdicted · novelty 5.0

Bayesian optimization enables adaptive network pruning rates in lifelong learning, performing heavier pruning on small/simple tasks and milder on large/complex ones.

Online Generalised Predictive Coding

stat.ML · 2026-05-04 · unverdicted · novelty 5.0

Online generalised predictive coding (ODEM) tracks latent states in nonlinear and chaotic generative models by separating temporal scales for fast Bayesian belief updating and slow parameter learning.
