Hochreiter, The vanishing gradient problem during learning recurrent neural networks, Int

· 1998 · DOI 10.1142/s0218488598000094

3 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.

The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs

cs.CV · 2026-02-09 · unverdicted · novelty 4.0

Effective depth, an operational count of sequential transformations, predicts CNN trainability better than nominal layer count because shortcuts and branches decouple the two.

A Transfer Learning Evaluation of Deep Neural Networks for Image Classification

cs.CV · 2026-05-12 · unverdicted · novelty 2.0

Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.

citing papers explorer

Showing 3 of 3 citing papers.

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

cs.LG · 2026-05-08 · unverdicted · none · ref 95

MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.

The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs

cs.CV · 2026-02-09 · unverdicted · none · ref 1

Effective depth, an operational count of sequential transformations, predicts CNN trainability better than nominal layer count because shortcuts and branches decouple the two.

A Transfer Learning Evaluation of Deep Neural Networks for Image Classification

cs.CV · 2026-05-12 · unverdicted · none · ref 26

Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.
