Learning activation functions to improve deep neural networks

Forest Agostinelli, Matthew Hoffman, Peter Sadowski, Pierre Baldi · 2014 · cs.NE · arXiv 1412.6830

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
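The abstract does not give the functional form of the learned activation. As a minimal sketch, assume each neuron's unit is a ReLU plus a sum of hinge terms, h(x) = max(0, x) + Σ_s a_s · max(0, -x + b_s), with the slopes a_s and hinge locations b_s learned per neuron. The class name `APLUnit`, its methods, and the scalar SGD step are hypothetical and only illustrate how such parameters could be updated by gradient descent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class APLUnit:
    """Adaptive piecewise linear activation for ONE neuron (illustrative only).

    Assumed form: h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]).
    """
    a: List[float]  # learned slopes of the hinge terms
    b: List[float]  # learned hinge locations

    def forward(self, x: float) -> float:
        out = max(0.0, x)  # the fixed ReLU part
        for a_s, b_s in zip(self.a, self.b):
            out += a_s * max(0.0, -x + b_s)  # learned hinge part
        return out

    def grads(self, x: float) -> Tuple[List[float], List[float]]:
        # dh/da_s = max(0, -x + b_s)
        da = [max(0.0, -x + b_s) for b_s in self.b]
        # dh/db_s = a_s where the hinge is active, else 0
        db = [a_s if (-x + b_s) > 0 else 0.0 for a_s, b_s in zip(self.a, self.b)]
        return da, db

    def sgd_step(self, x: float, upstream: float, lr: float) -> None:
        # upstream is dLoss/dh, supplied by the rest of the network
        da, db = self.grads(x)
        self.a = [a_s - lr * upstream * g for a_s, g in zip(self.a, da)]
        self.b = [b_s - lr * upstream * g for b_s, g in zip(self.b, db)]
```

With all slopes set to zero the unit reduces to a plain ReLU, so under this assumed form a network of such units can start out identical to a static ReLU network and then move away from it during training.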

representative citing papers

Searching for Activation Functions

cs.NE · 2017-10-16 · conditional · novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.
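The Swish formula quoted above, f(x) = x * sigmoid(βx), can be written directly. This is a minimal scalar sketch in plain Python; the function names and the default β = 1 are assumptions chosen for illustration, not details taken from the citing paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)
```

Two limiting cases follow from the formula: with β = 0 the sigmoid is 1/2 everywhere and the function is the linear map x/2, while for large β it approaches ReLU.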

Competing nonlinearities, criticality, and order-to-chaos transition in deep networks

cond-mat.dis-nn · 2026-05-06 · unverdicted · novelty 6.0

A statistical mixture of Tanh and Swish activations with critical mixing fraction p_c induces a continuous phase transition to scale-invariant signal propagation in deep networks while preserving smoothness.

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

cs.LG · 2019-07-16 · unverdicted · novelty 5.0

Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.

citing papers explorer

Showing 3 of 3 citing papers.

Searching for Activation Functions
cs.NE · 2017-10-16 · conditional · none · ref 1

Competing nonlinearities, criticality, and order-to-chaos transition in deep networks
cond-mat.dis-nn · 2026-05-06 · unverdicted · none · ref 43

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
cs.LG · 2019-07-16 · unverdicted · none · ref 1 · internal anchor
