Empirical Evaluation of Rectified Activations in Convolutional Network

Bing Xu , Naiyan Wang , Tianqi Chen , Mu Li

Authors on Pith no claims yet

classification 💻 cs.LG cs.CVstat.ML

keywords rectifiedlinearactivationleakynegativereluunitconvolutional

read the original abstract

In this paper we investigate the performance of different types of rectified activation functions in convolutional neural network: standard rectified linear unit (ReLU), leaky rectified linear unit (Leaky ReLU), parametric rectified linear unit (PReLU) and a new randomized leaky rectified linear units (RReLU). We evaluate these activation function on standard image classification task. Our experiments suggest that incorporating a non-zero slope for negative part in rectified activation units could consistently improve the results. Thus our findings are negative on the common belief that sparsity is the key of good performance in ReLU. Moreover, on small scale dataset, using deterministic negative slope or learning it are both prone to overfitting. They are not as effective as using their randomized counterpart. By using RReLU, we achieved 75.68\% accuracy on CIFAR-100 test set without multiple test or ensemble.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
cs.LG 2015-11 accept novelty 8.0

DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
stat.ML 2026-05 unverdicted novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is min...
Searching for Activation Functions
cs.NE 2017-10 conditional novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.
Materialistic RIR: Material Conditioned Realistic RIR Generation
cs.CV 2026-04 unverdicted novelty 6.0

A two-module neural model disentangles spatial layout from material properties to generate controllable and more realistic room impulse responses, reporting gains of up to 16% on acoustic metrics and 70% on material m...
Functional Similarity Metric for Neural Networks: Overcoming Parametric Ambiguity via Activation Region Analysis
cs.LG 2026-04 unverdicted novelty 6.0

A functional similarity metric for ReLU networks uses normalized activation region signatures and MinHash to overcome parametric symmetries like neuron permutation and scaling.
Sparsity Hurts: Simple Linear Adapter Can Boost Generalized Category Discovery
cs.CV 2026-05 unverdicted novelty 5.0

LAGCD inserts residual linear adapters into each ViT block plus a distribution alignment loss to improve generalized category discovery by increasing model flexibility while reducing bias between seen and novel classes.