pith. machine review for the scientific record.

arxiv: 1804.11188 · v1 · submitted 2018-03-23 · 💻 cs.LG · cs.NE · stat.ML

Recognition: unknown

Can recurrent neural networks warp time?

classification 💻 cs.LG · cs.NE · stat.ML
keywords recurrent, long, dependencies, GRUs, improve, learning, LSTMs, models
original abstract

Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium- to long-term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort.
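The abstract names the chrono initialization without stating it. As we read the paper's LSTM recipe, each forget-gate bias is sampled as b_f ~ log(U([1, T_max − 1])) and the matching input-gate bias is set to b_i = −b_f, where T_max is the expected range of temporal dependencies. The intuition: a forget gate held at σ(b_f) keeps information for roughly 1 / (1 − σ(b_f)) = 1 + e^{b_f} steps, so the sampled biases spread the units' characteristic memory times between 2 and T_max. The sketch below is a minimal, framework-agnostic illustration under that reading; the function names `chrono_lstm_biases` and `forgetting_time` are hypothetical, and the GRU variant is not covered.

```python
import math
import random
from typing import List, Optional, Tuple


def chrono_lstm_biases(
    hidden_size: int, t_max: float, rng: Optional[random.Random] = None
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: chrono-style biases for one LSTM layer.

    Assumes the recipe b_f ~ log(U[1, t_max - 1]) and b_i = -b_f,
    sampled independently for every hidden unit.
    """
    if t_max < 2:
        raise ValueError("t_max must be at least 2")
    rng = rng or random.Random()
    # One forget-gate bias per unit; log of a uniform draw in [1, t_max - 1].
    forget_bias = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    # Input-gate bias mirrors the forget-gate bias with opposite sign.
    input_bias = [-b for b in forget_bias]
    return forget_bias, input_bias


def forgetting_time(forget_bias: float) -> float:
    """Characteristic memory time of a unit whose forget gate stays at sigmoid(b).

    1 / (1 - sigmoid(b)) simplifies to 1 + exp(b); the simplified form avoids
    the cancellation in 1 - sigmoid(b) when b is large.
    """
    return 1.0 + math.exp(forget_bias)


if __name__ == "__main__":
    f_bias, i_bias = chrono_lstm_biases(4, t_max=100.0, rng=random.Random(0))
    # Each printed memory time lies between 2 and 100 steps.
    print([round(forgetting_time(b), 1) for b in f_bias])
```

To use such values in a deep-learning framework, they would be copied into the forget- and input-gate slices of the LSTM bias vector; that layout is framework-specific (PyTorch's `nn.LSTM`, for instance, orders gates as input, forget, cell, output and adds two bias vectors, so one of them would be zeroed).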

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mamba Sequence Modeling meets Model Predictive Control

    math.OC 2026-04 unverdicted novelty 7.0

    Mamba-MPC stabilizes and tracks references on SISO and MIMO systems in simulation and hardware while outperforming LSTM-MPC with faster computation.

  2. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    cs.LG 2021-04 accept novelty 6.0

    Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

  3. To Use AI as Dice of Possibilities with Timing Computation

    cs.AI 2026-05 unverdicted novelty 5.0

    Proposes a verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.