Empirical Evaluation of Rectified Activations in Convolutional Network

Bing Xu; Mu Li; Naiyan Wang; Tianqi Chen

arXiv: 1505.00853 · v2 · pith:CNS54ZTLnew · submitted 2015-05-05 · cs.LG · cs.CV · stat.ML

classification cs.LG · cs.CV · stat.ML

keywords rectified · linear · activation · leaky · negative · relu · unit · convolutional

In this paper we investigate the performance of different types of rectified activation functions in convolutional neural networks: the standard rectified linear unit (ReLU), the leaky rectified linear unit (Leaky ReLU), the parametric rectified linear unit (PReLU), and a new randomized leaky rectified linear unit (RReLU). We evaluate these activation functions on standard image classification tasks. Our experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units could consistently improve the results. Our findings thus run counter to the common belief that sparsity is the key to good performance in ReLU. Moreover, on small-scale datasets, both using a deterministic negative slope and learning it are prone to overfitting, and neither is as effective as the randomized counterpart. Using RReLU, we achieved 75.68% accuracy on the CIFAR-100 test set without multiple tests or ensembles.
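
To make the distinction between the four activations concrete, here is a minimal scalar sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no slope values, so the default Leaky ReLU slope (0.01), the RReLU sampling range (1/8 to 1/3), and the use of the range midpoint at test time are all assumed here, and the function names are hypothetical.

```python
import random


def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a fixed, small negative slope (0.01 is an assumed default)."""
    return x if x >= 0 else slope * x


def prelu(x, slope):
    """PReLU: same form as Leaky ReLU, but `slope` would be learned in training.

    Here the slope is simply passed in; no learning is performed.
    """
    return x if x >= 0 else slope * x


def rrelu(x, lower=1 / 8, upper=1 / 3, training=True, rng=random):
    """Randomized leaky ReLU.

    Assumptions (not stated in the abstract): during training the negative
    slope is drawn uniformly from [lower, upper]; at test time the fixed
    midpoint of that range is used instead.
    """
    if x >= 0:
        return x
    if training:
        slope = rng.uniform(lower, upper)
    else:
        slope = (lower + upper) / 2
    return slope * x


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.0):
        print(value, relu(value), leaky_relu(value), prelu(value, 0.25),
              rrelu(value, training=False))
```

Only `relu` produces exact zeros for negative inputs; the other three keep a non-zero negative response, which is the property the abstract credits with the improved results. The randomness in `rrelu` applies only in training mode, so evaluation stays deterministic.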

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
cs.LG 2015-11 accept novelty 8.0

DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
stat.ML 2026-05 unverdicted novelty 7.0

ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is min...
Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
cs.LG 2025-10 unverdicted novelty 7.0

Introduces Generalized N Factor Model and LGES algorithm that identifies true causal structure including latents up to Markov equivalence class via score-based greedy search.
Searching for Activation Functions
cs.NE 2017-10 conditional novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.
Materialistic RIR: Material Conditioned Realistic RIR Generation
cs.CV 2026-04 unverdicted novelty 6.0

A two-module neural model disentangles spatial layout from material properties to generate controllable and more realistic room impulse responses, reporting gains of up to 16% on acoustic metrics and 70% on material m...
Functional Similarity Metric for Neural Networks: Overcoming Parametric Ambiguity via Activation Region Analysis
cs.LG 2026-04 unverdicted novelty 6.0

A functional similarity metric for ReLU networks uses normalized activation region signatures and MinHash to overcome parametric symmetries like neuron permutation and scaling.
On Divergence Measures for Training GFlowNets
cs.LG 2024-10 unverdicted novelty 6.0

Introduces statistically efficient estimators for Renyi-α, Tsallis-α, reverse and forward KL divergences with REINFORCE and score-matching control variates for faster GFlowNet training.
Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments
cs.CV 2019-06 unverdicted novelty 6.0

The MIA model with GC, RGA, and BFM modules achieves state-of-the-art performance on the CUHK-PEDES dataset for description-based person re-identification.
Sparsity Hurts: Simple Linear Adapter Can Boost Generalized Category Discovery
cs.CV 2026-05 unverdicted novelty 5.0

LAGCD inserts residual linear adapters into each ViT block plus a distribution alignment loss to improve generalized category discovery by increasing model flexibility while reducing bias between seen and novel classes.
Activation Function Design Sustains Plasticity in Continual Learning
cs.LG 2025-09 unverdicted novelty 5.0

Smooth-Leaky and Randomized Smooth-Leaky activations mitigate loss of plasticity in continual learning by targeting negative-branch shape and saturation behavior.
Gamma-Ray Burst Light Curve Reconstruction: A Comparative Machine and Deep Learning Analysis
astro-ph.HE 2024-12 unverdicted novelty 5.0

MLP and Attention U-Net outperform other models in reconstructing GRB light curves on 521 events, cutting plateau parameter uncertainties by 37-41% versus the Willingale baseline while achieving low MSE.
High-throughput Onboard Hyperspectral Image Compression with Ground-based CNN Reconstruction
eess.IV 2019-07 unverdicted novelty 5.0

Prequantization-based lossless predictive compression onboard hyperspectral images with CNN ground reconstruction recovers the entire SNR drop at 2 bpp.
Adaptive Reorganization of Neural Pathways for Continual Learning with Spiking Neural Networks
cs.NE 2023-09 unverdicted novelty 4.0

SOR-SNN employs Self-Organizing Regulation networks to reorganize a single SNN into sparse pathways, achieving better performance, energy efficiency, memory use, backward transfer, and self-repair on continual learnin...
Discriminative Embedding Autoencoder with a Regressor Feedback for Zero-Shot Learning
cs.CV 2019-07 unverdicted novelty 4.0

A new autoencoder model with margin-based discriminative embeddings and regressor feedback outperforms prior zero-shot learning methods on SUN, CUB, AWA1 and AWA2, with larger gains in generalized ZSL.
Two-stream Spatiotemporal Feature for Video QA Task
cs.CV 2019-07 unverdicted novelty 4.0

A two-stream spatiotemporal feature extractor with squeeze-and-excitation and attention-based context matching improves text-only video QA on TVQA but shows limitations with visual features.
On Reducing Negative Jacobian Determinant of the Deformation Predicted by Deep Registration Networks
cs.CV 2019-06 unverdicted novelty 4.0

Two training mechanisms for unsupervised deep registration networks reduce the number of locations with negative Jacobian determinants in predicted deformations.
Modern CNNs for IoT Based Farms
cs.CY 2019-07 unverdicted novelty 2.0

A survey of state-of-the-art CNN architectures for agricultural IoT applications that proposes a tailored classification taxonomy and reviews existing research to guide architecture selection.
Deep learning in ultrasound imaging
eess.SP 2019-07 unverdicted novelty 2.0

A review outlining deep learning strategies for adaptive beamforming, spectral Doppler, compressive color Doppler encodings, and structured signal recovery in ultrasound.