Recognition: unknown
Can recurrent neural networks warp time?
read the original abstract
Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi- invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Ex- perimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Mamba Sequence Modeling meets Model Predictive Control
Mamba-MPC stabilizes and tracks references on SISO and MIMO systems in simulation and hardware while outperforming LSTM-MPC with faster computation.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
To Use AI as Dice of Possibilities with Timing Computation
Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.