Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot, Yoshua Bengio · 2010

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Nonlinear Bipolar Compensation with Bipolar Logarithmic Transformation reduces outlier effects in post-training quantization by performing compensation in a compressed transformed space.

A Composite Activation Function for Learning Stable Binary Representations

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.

citing papers explorer

Showing 3 of 3 citing papers.

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · none · ref 13

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization

cs.CV · 2026-05-14 · unverdicted · none · ref 58

A Composite Activation Function for Learning Stable Binary Representations

cs.LG · 2026-05-12 · unverdicted · none · ref 19
