Learning Activation Functions to Improve Deep Neural Networks

Forest Agostinelli, Matthew Hoffman, Peter Sadowski, Pierre Baldi

classification cs.NE cs.CV cs.LG stat.ML

keywords activation function neural deep improve linear networks neuron

Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
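
The abstract does not spell out the functional form of the learned activation. As a minimal sketch, assuming each neuron's unit is a ReLU plus a small number of learnable hinge terms, h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]), the per-neuron parameters could be updated by gradient descent roughly as follows. The names `apl`, `apl_grads`, `sgd_step`, `a`, `b`, and the learning rate are illustrative and do not come from the source.

```python
def apl(x, a, b):
    """Assumed piecewise linear unit: ReLU plus len(a) learnable hinges.

    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s])
    With a = [] (or all zeros) this reduces to a plain ReLU.
    """
    return max(0.0, x) + sum(a_s * max(0.0, -x + b_s) for a_s, b_s in zip(a, b))


def apl_grads(x, a, b):
    """Partial derivatives of h(x) with respect to each a[s] and b[s]."""
    grad_a = [max(0.0, -x + b_s) for b_s in b]
    # The hinge max(0, -x + b[s]) has slope 1 in b[s] when active, else 0.
    grad_b = [a_s if (-x + b_s) > 0 else 0.0 for a_s, b_s in zip(a, b)]
    return grad_a, grad_b


def sgd_step(x, target, a, b, lr=0.01):
    """One gradient-descent step on the squared error 0.5 * (h(x) - target)**2."""
    err = apl(x, a, b) - target
    grad_a, grad_b = apl_grads(x, a, b)
    new_a = [a_s - lr * err * g for a_s, g in zip(a, grad_a)]
    new_b = [b_s - lr * err * g for b_s, g in zip(b, grad_b)]
    return new_a, new_b


# Usage: nudge a single neuron's activation toward |x| at one sample point.
a, b = [0.2], [0.1]
a, b = sgd_step(-2.0, abs(-2.0), a, b)
```

In a full network each neuron would hold its own `a` and `b`, and the error signal would come from backpropagation rather than a direct target; the sketch only illustrates that the shape of the activation is itself a trainable quantity.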

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Searching for Activation Functions
cs.NE 2017-10 conditional novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.

Competing nonlinearities, criticality, and order-to-chaos transition in deep networks
cond-mat.dis-nn 2026-05 unverdicted novelty 6.0

A statistical mixture of Tanh and Swish activations with critical mixing fraction p_c induces a continuous phase transition to scale-invariant signal propagation in deep networks while preserving smoothness.