pith. sign in

Unitary Evolution Recurrent Neural Networks

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it
abstract

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.

fields

cs.CL 1 cs.LG 1

years

2026 1 2019 1

representative citing papers

UWM-JEPA: Predictive World Models That Imagine in Belief Space

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

Language Models as Knowledge Bases?

cs.CL · 2019-09-03 · accept · novelty 7.0

BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.

citing papers explorer

Showing 2 of 2 citing papers.

  • UWM-JEPA: Predictive World Models That Imagine in Belief Space cs.LG · 2026-05-25 · unverdicted · none · ref 33 · internal anchor

    UWM-JEPA uses a density-matrix latent and unitary predictor in JEPA to preserve joint-state spectrum during blind rollouts, achieving 0.77 accuracy on a five-step hidden-velocity task versus 0.53 for an LSTM baseline.

  • Language Models as Knowledge Bases? cs.CL · 2019-09-03 · accept · none · ref 93 · internal anchor

    BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.