pith. machine review for the scientific record.

arxiv: 1801.09403 · v3 · submitted 2018-01-29 · 💻 cs.LG

Recognition: unknown

Learning Combinations of Activation Functions

classification 💻 cs.LG
keywords activation · functions · alexnet · approaches · architectures · combinations · ilsvrc-2012 · novel

In the last decade, an active area of research has been devoted to designing novel activation functions that help deep neural networks converge and obtain better performance. The training procedure of these architectures usually optimizes only the weights of their layers, while the non-linearities are generally pre-specified and their parameters (if any) are treated as hyper-parameters to be tuned manually. In this paper, we introduce two approaches that automatically learn different combinations of base activation functions (such as the identity function, ReLU, and tanh) during the training phase. We present a thorough comparison of our novel approaches with well-known architectures (such as LeNet-5, AlexNet, and ResNet-56) on three standard datasets (Fashion-MNIST, CIFAR-10, and ILSVRC-2012), showing substantial improvements in overall performance, such as an increase of 3.01 percentage points in top-1 accuracy for AlexNet on ILSVRC-2012.
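The abstract does not say how the base functions are combined, so the following is only a minimal sketch of one plausible reading: a weighted sum of identity, ReLU, and tanh whose weights are trainable. The softmax constraint (which makes the combination convex), the names `combined_activation` and `grad_logits`, and the use of plain Python instead of a deep learning framework are all assumptions made for illustration, not details taken from the paper.

```python
import math

def identity(x):
    return x

def relu(x):
    return max(0.0, x)

# Base activation functions named in the abstract.
BASE_FUNCTIONS = (identity, relu, math.tanh)

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_activation(x, logits):
    # Assumption: convex combination, weights = softmax(logits).
    weights = softmax(logits)
    return sum(w * f(x) for w, f in zip(weights, BASE_FUNCTIONS))

def grad_logits(x, logits):
    # Gradient of the output with respect to each logit:
    # d(out)/d(logit_k) = w_k * (f_k(x) - out).
    weights = softmax(logits)
    values = [f(x) for f in BASE_FUNCTIONS]
    out = sum(w * v for w, v in zip(weights, values))
    return [w * (v - out) for w, v in zip(weights, values)]
```

Because the logits receive gradients like any other parameter, they could in principle be updated by the same optimizer that trains the layer weights, which is the sense in which the combination is learned during training rather than tuned by hand.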

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Competing nonlinearities, criticality, and order-to-chaos transition in deep networks

    cond-mat.dis-nn · 2026-05 · unverdicted · novelty 6.0

    A statistical mixture of Tanh and Swish activations with critical mixing fraction p_c induces a continuous phase transition to scale-invariant signal propagation in deep networks while preserving smoothness.
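
    One way to picture a statistical mixture of activations is a layer in which each unit independently draws its non-linearity at random. The sketch below is hypothetical: the summary above does not say whether the mixing fraction p refers to Tanh or to Swish, nor how the mixture is sampled, so assigning Tanh with probability `p`, and the helper names `sample_activations` and `apply_layer`, are assumptions made for illustration.

```python
import math
import random

def swish(x):
    # Swish: x * sigmoid(x), written with tanh so that large
    # negative inputs do not overflow.
    return 0.5 * x * (1.0 + math.tanh(0.5 * x))

def sample_activations(width, p, rng):
    # Assumption: each unit gets tanh with probability p, swish otherwise.
    return [math.tanh if rng.random() < p else swish for _ in range(width)]

def apply_layer(pre_activations, activations):
    # Apply each unit's own non-linearity to its pre-activation.
    return [f(z) for f, z in zip(activations, pre_activations)]
```

    Under these assumptions, `p = 1` gives a pure Tanh layer and `p = 0` a pure Swish layer; intermediate values interpolate between the two regimes.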