(35) Note that it is necessary to have the ω2 · x contribution to be O(ϵ2), otherwise the α2(ω2 · x) term would diverge with 1/ϵ

+ [α1(ω1 · x) + α2(ω2 · x)]σ′(ω0 · x/ √ 3) + 1 12 α2(ω1 · x)2σ′′(ω0 · x/ √ 3)

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

cs.LG · 2025-06-17 · unverdicted · novelty 7.0

Neural loss landscapes contain flat channels to infinity along which gradient flow leads pairs of neurons to implement gated linear units.

Showing 1 of 1 citing paper.

Flat Channels to Infinity in Neural Loss Landscapes cs.LG · 2025-06-17 · unverdicted · none · ref 64
Neural loss landscapes contain flat channels to infinity along which gradient flow leads pairs of neurons to implement gated linear units.