Continuously Differentiable Exponential Linear Units

Jonathan T. Barron

arxiv: 1704.07483 · v1 · pith:ZKUCEBT4new · submitted 2017-04-24 · 💻 cs.LG

keywords: alpha, linear, parametrization, respect, alternative, continuously, differentiable, easier

Exponential Linear Units (ELUs) are a useful rectifier for constructing deep learning architectures, as they may speed up and otherwise improve learning by virtue of not having vanishing gradients and by having mean activations near zero. However, the ELU activation as parametrized in [1] is not continuously differentiable with respect to its input when the shape parameter alpha is not equal to 1. We present an alternative parametrization which is C1 continuous for all values of alpha, making the rectifier easier to reason about and making alpha easier to tune. This alternative parametrization has several other useful properties that the original parametrization of ELU does not: 1) its derivative with respect to x is bounded, 2) it contains both the linear transfer function and ReLU as special cases, and 3) it is scale-similar with respect to alpha.
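The abstract does not spell out either formula, so the following is only a minimal sketch under stated assumptions: the original ELU is taken as x for x >= 0 and alpha * (exp(x) - 1) otherwise, and the alternative parametrization (commonly called CELU) as x for x >= 0 and alpha * (exp(x / alpha) - 1) otherwise, with alpha > 0. The helper `left_slope_at_zero` is a hypothetical name introduced here for illustration. It estimates the slope just to the left of the origin, which can be compared with the slope of 1 on the right.

```python
import math


def elu(x, alpha=1.0):
    # Assumed original parametrization: the negative branch scales exp(x) by alpha.
    return x if x >= 0 else alpha * math.expm1(x)


def celu(x, alpha=1.0):
    # Assumed alternative parametrization: the input is also divided by alpha (alpha > 0).
    return x if x >= 0 else alpha * math.expm1(x / alpha)


def left_slope_at_zero(f, alpha, h=1e-6):
    # Hypothetical helper: one-sided finite-difference slope just left of x = 0.
    return (f(0.0, alpha) - f(-h, alpha)) / h


if __name__ == "__main__":
    # The right-hand slope at 0 is 1 for both functions; compare the left-hand slopes.
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, left_slope_at_zero(elu, alpha), left_slope_at_zero(celu, alpha))
```

Under these assumptions the left-hand slope of `elu` tracks alpha, so it matches the right-hand slope only at alpha = 1, while the left-hand slope of `celu` stays near 1 for every alpha tried. The same sketch can be used to probe the other listed properties: a very small alpha pushes `celu` toward ReLU, a very large alpha pushes it toward the identity, and `celu(x, alpha)` equals `alpha * celu(x / alpha, 1.0)`.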

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling
stat.ML 2025-08 unverdicted novelty 7.0

The authors derive drift and diffusion constraints plus a parameterization that forces neural SDE solutions to remain inside compact polyhedral domains, yielding better forecasts on real EMA suicide-risk datasets than...

Universal Smoothness via Bernstein Polynomials: A Constructive Approximation Approach for Activation Functions
cs.AI 2026-05 unverdicted novelty 5.0

BerLU constructs a C1-differentiable activation with Lipschitz constant 1 via Bernstein polynomial approximation, showing better performance and efficiency than baselines on image classification with ViTs and CNNs.

Activation Function Design Sustains Plasticity in Continual Learning
cs.LG 2025-09 unverdicted novelty 5.0

Smooth-Leaky and Randomized Smooth-Leaky activations mitigate loss of plasticity in continual learning by targeting negative-branch shape and saturation behavior.