
arxiv: 2605.10650 · v1 · submitted 2026-05-11 · 💻 cs.LG · cond-mat.dis-nn

A Random-Matrix Criterion for Initializing Gated Recurrent Neural Networks

Pith reviewed 2026-05-12 04:31 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.dis-nn
keywords random matrix theory · gated recurrent networks · reservoir computing · weight initialization · edge of chaos · chaotic forecasting · critical gain

The pith

A random-matrix criterion estimates the critical initialization gain where gated RNN reservoirs reach peak performance on chaotic tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a simple criterion, based on random-matrix analysis in the infinite-width limit, to locate the critical gain g_c (equivalently, the critical weight variance g_c²) that marks the boundary between ordered and chaotic dynamics in gated recurrent networks. This matters because reservoir computing relies on fixed recurrent weights to generate rich yet stable dynamics, and prior work indicates that meaningful initializations sit near this phase transition. The authors report that their estimated g_c closely tracks the gain at which a gated-RNN reservoir achieves its best performance on a chaotic forecasting task. They argue that the same criterion can guide the design of initialization schemes for recurrent architectures more generally.

Core claim

In the infinite-width limit, meaningful random initializations for a broad class of gated recurrent networks sit at an effective critical point controlled by the weight variance g²; the transition separates an ordered phase from a chaotic phase in which information progressively degrades. The authors supply an explicit random-matrix criterion that estimates the critical gain g_c for this transition and report that, on a chaotic time-series forecasting task, reservoir performance peaks near the predicted g_c.
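For orientation, the textbook vanilla-RNN version of such a criterion can be written down explicitly. This is the standard mean-field result for a discrete-time network $h_{t+1} = W\phi(h_t) + b$ with $W_{ij} \sim \mathcal{N}(0, g^2/N)$ and $b_i \sim \mathcal{N}(0, \sigma_b^2)$, and it is not the gated criterion derived in the paper. The symbols $\phi$, $q^*$, $\sigma_b$ and $\chi$ are notation assumed here, and $\mathcal{D}z$ denotes the standard Gaussian measure.

```latex
q^* = g^2 \int \mathcal{D}z \, \phi\!\left(\sqrt{q^*}\, z\right)^2 + \sigma_b^2,
\qquad
\chi(g) = g^2 \int \mathcal{D}z \, \phi'\!\left(\sqrt{q^*}\, z\right)^2,
\qquad
\chi(g_c) = 1 .
```

As a worked case, take $\phi = \tanh$ and $\sigma_b = 0$: for $g \le 1$ the fixed point is $q^* = 0$, so $\phi'(0) = 1$ gives $\chi = g^2$ and therefore $g_c = 1$. Gating presumably changes the effective Jacobian, which is the part the paper's criterion is meant to handle.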

What carries the argument

The random-matrix criterion for the critical gain g_c, obtained by locating the point at which the largest eigenvalue of the effective random matrix product equals unity and thereby marks the ordered-to-chaotic transition.
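The idea of locating the gain at which a spectral quantity reaches unity can be illustrated on the simplest possible case. The snippet below is a minimal sketch, assuming a plain Gaussian recurrent matrix with entries of variance $g^2/N$ instead of the paper's effective matrix for gated units; the function names `spectral_radius` and `critical_gain` and all defaults are hypothetical. For this matrix the circular law puts the spectral radius close to g, so the estimate should land near 1.

```python
import numpy as np


def spectral_radius(g, n=400, seed=0):
    """Spectral radius of an n x n matrix with i.i.d. N(0, g^2 / n) entries."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, g / np.sqrt(n), size=(n, n))
    return float(np.max(np.abs(np.linalg.eigvals(w))))


def critical_gain(lo=0.2, hi=3.0, tol=1e-3, **kwargs):
    """Bisect for the gain at which the spectral radius equals one.

    Assumes the radius is below one at `lo` and above one at `hi`.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if spectral_radius(mid, **kwargs) < 1.0:
            lo = mid  # still in the contracting regime
        else:
            hi = mid  # already past the unit circle
    return 0.5 * (lo + hi)


g_c_hat = critical_gain()
print(f"estimated critical gain: {g_c_hat:.3f}")
```

Because the seed is fixed, the matrix is the same up to the factor g and the radius is exactly linear in g, so a closed form would do here. The root-finder is kept because an effective matrix for a gated network would presumably depend on g in a less trivial way.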

If this is right

  • The same criterion applies without modification to a wide family of recurrent architectures beyond the specific gated RNN tested.
  • Reservoir performance on chaotic forecasting tasks reaches its maximum when the initialization variance is set to the estimated critical value.
  • The criterion supplies an explicit, parameter-free design rule that future initialization schemes for recurrent networks can adopt directly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The criterion could be used to initialize recurrent layers inside larger hybrid models that combine reservoirs with trained weights.
  • Because the derivation relies only on the spectral properties of the random matrix product, it may extend to recurrent architectures with different gating mechanisms or activation functions.
  • Testing the same initialization on tasks with long memory requirements but non-chaotic statistics would reveal whether the edge-of-chaos optimum is task-dependent.

Load-bearing premise

That the infinite-width edge-of-chaos transition identified by the random-matrix analysis remains the optimal operating point for finite-width gated RNNs on concrete prediction tasks.
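A finite-width check of this premise can be sketched numerically. The snippet below is a minimal sketch, assuming a vanilla discrete-time tanh network $h_{t+1} = \tanh(W h_t)$ instead of the paper's gated architecture; it estimates the maximum Lyapunov exponent with the standard tangent-vector renormalization procedure (Benettin et al.). The function name `max_lyapunov` and all default values are illustrative choices, not taken from the paper.

```python
import numpy as np


def max_lyapunov(g, n=200, steps=2000, burn_in=200, seed=0):
    """Estimate the maximum Lyapunov exponent of h -> tanh(W h).

    W has i.i.d. N(0, g^2 / n) entries (vanilla RNN, assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, g / np.sqrt(n), size=(n, n))
    h = rng.normal(size=n)
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)

    log_growth = 0.0
    for t in range(burn_in + steps):
        h = np.tanh(w @ h)
        # Push the tangent vector through the Jacobian diag(1 - h_new^2) W.
        v = (1.0 - h**2) * (w @ v)
        norm = np.linalg.norm(v)
        v /= norm  # renormalize to avoid overflow or underflow
        if t >= burn_in:
            log_growth += np.log(norm)
    return log_growth / steps


for gain in (0.5, 1.0, 3.0):
    print(f"g = {gain:.1f}  lambda_max ~ {max_lyapunov(gain):+.3f}")
```

For this vanilla model the exponent is expected to be negative well below g = 1 and positive well above it. How sharply the sign change tracks the infinite-width prediction at a given width is the quantity the premise depends on.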

What would settle it

Measure the forecasting error of a gated-RNN reservoir while sweeping the initialization gain g around the predicted g_c; if the error minimum occurs at a gain differing by more than a few percent from the criterion's prediction, the claimed correspondence is falsified.
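The sweep protocol described above can be sketched as follows. This is a minimal sketch under loud assumptions: it uses a vanilla tanh echo-state reservoir with a ridge-regression readout, not a gated RNN, and a logistic-map series as a stand-in for the paper's chaotic benchmark. The names `logistic_series` and `reservoir_test_mse`, the reservoir size, the ridge strength and the gain grid are all hypothetical.

```python
import numpy as np


def logistic_series(length, r=3.9, x0=0.5):
    """Chaotic logistic-map series, used here only as a stand-in benchmark."""
    x = np.empty(length)
    x[0] = x0
    for t in range(1, length):
        x[t] = r * x[t - 1] * (1.0 - x[t - 1])
    return x


def reservoir_test_mse(g, series, n=200, washout=100, n_train=1000,
                       ridge=1e-6, seed=0):
    """One-step-ahead test MSE of a fixed tanh reservoir with gain g."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, g / np.sqrt(n), size=(n, n))  # fixed recurrent weights
    w_in = rng.normal(0.0, 1.0, size=n)               # fixed input weights

    h = np.zeros(n)
    states = np.empty((len(series) - 1, n))
    for t in range(len(series) - 1):
        h = np.tanh(w @ h + w_in * series[t])
        states[t] = h

    # Append a constant feature so the linear readout has a bias term.
    feats = np.hstack([states, np.ones((states.shape[0], 1))])
    targets = series[1:]

    x_tr = feats[washout:washout + n_train]
    y_tr = targets[washout:washout + n_train]
    x_te = feats[washout + n_train:]
    y_te = targets[washout + n_train:]

    # Only the readout is trained, via ridge regression.
    gram = x_tr.T @ x_tr + ridge * np.eye(x_tr.shape[1])
    w_out = np.linalg.solve(gram, x_tr.T @ y_tr)
    return float(np.mean((x_te @ w_out - y_te) ** 2))


series = logistic_series(1600)
gains = np.linspace(0.2, 2.0, 10)
errors = np.array([reservoir_test_mse(g, series) for g in gains])
best_gain = gains[np.argmin(errors)]
print(f"gain with lowest test MSE on this grid: {best_gain:.2f}")
```

The falsification test in the text would compare `best_gain` from a sweep like this, run on the actual gated reservoir and benchmark and averaged over several seeds, with the criterion's predicted g_c. One-step prediction of the logistic map needs little memory, so the location of the minimum in this toy sweep says nothing about the paper's claim.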

Figures

Figures reproduced from arXiv: 2605.10650 by Francesco Casola, Riccardo Marcaccioli, Tommaso Fioratti.

Figure 1. Stationary signal power in LSTM with Gaussian …
Figure 2. Maximum Lyapunov exponent …
Figure 3. Phase diagram under Gaussian bias …
Figure 4. Training (dashed) and test (solid) mean squared …
Original abstract

Proper weight initialization prior to training has historically been one of the key factors that helped kick off the deep learning revolution. Initialization is even more crucial in "reservoir computing", where the weights of a readout layer are learned linearly while the reservoir weights are fixed and largely determine the richness, stability and memory of the resulting dynamics. In the infinite-width limit it has been shown that meaningful initializations are those sitting at an effective critical point of the randomly initialized model. The phase transition is controlled by the weight variance $g^2$ and separates an ordered phase from a chaotic one where information progressively degrades. Here we derive a simple criterion to estimate the critical $g_c$ for a broad class of recurrent architectures and we show that it closely tracks the gain at which a gated-RNN reservoir achieves peak performance on a chaotic forecasting task. Finally, we argue that our criterion can serve as a design principle for future initialization schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript derives a random-matrix criterion for the critical gain g_c marking the edge-of-chaos transition in a broad class of gated recurrent architectures in the infinite-width limit. It then shows empirically that initializing a gated-RNN reservoir at this g_c produces peak performance on a chaotic time-series forecasting task and proposes the criterion as a general initialization design principle.

Significance. If the derivation and alignment hold, the work supplies a theoretically motivated initialization rule for gated reservoirs that could reduce hyperparameter search costs while improving dynamical stability and forecasting accuracy. Extending mean-field edge-of-chaos analysis to gated units is a useful incremental contribution to reservoir computing literature.

minor comments (4)
  1. [Abstract] Abstract: the phrase 'closely tracks' should be replaced by a quantitative statement (e.g., relative error or correlation coefficient) that is backed by the results section.
  2. [§3] §3 (or wherever the derivation appears): explicitly list the steps that map the linearized Jacobian of the gated recurrence to the final g_c expression; the extension from vanilla RNNs is not obvious and needs to be shown.
  3. [Figure 4] Figure 4 (performance vs. gain curves): add error bars or report the number of independent runs; without them the visual alignment with the derived g_c is hard to assess.
  4. [Throughout] Notation: consistently distinguish the variance parameter g² from the critical value g_c; the current usage occasionally blurs the two.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and for recommending minor revision. The summary accurately captures the core contribution: a random-matrix derivation of the critical gain g_c for gated RNN reservoirs in the infinite-width limit, together with empirical evidence that this initialization yields peak performance on chaotic forecasting tasks. As the report contains no specific major comments, we see no need for revisions and believe the current version is suitable for publication.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives a random-matrix criterion for the critical gain g_c by extending standard mean-field edge-of-chaos analysis to the linearized Jacobian of gated RNN architectures in the infinite-width limit. This derivation is presented as a direct theoretical extension using established variance-controlled phase transitions and does not reduce to any fitted quantity or self-referential definition. The subsequent empirical check that the derived g_c aligns with peak performance on a chaotic forecasting task serves only as validation; the task data is not used to construct or tune the criterion itself. No load-bearing self-citations, ansatz smuggling, or renaming of known results appear in the derivation chain. The central claim therefore remains independent of its empirical test.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the central claim rests on the infinite-width random-matrix description of RNN phase transitions and the assumption that peak task performance occurs at the critical point.

axioms (1)
  • Domain assumption: the infinite-width limit governs the phase transition controlled by the weight variance g² in the recurrent architectures considered.
    Explicitly invoked in the abstract as the regime where meaningful initializations sit at the critical point.

pith-pipeline@v0.9.0 · 5460 in / 1238 out tokens · 48728 ms · 2026-05-12T04:31:52.644180+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. [1] X. Glorot and Y. Bengio, in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (2010)

  2. [2] K. He, X. Zhang, S. Ren, and J. Sun, in Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015) pp. 1026–1034

  3. [3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems, Vol. 25 (2012)

  4. [4] A. M. Saxe, J. L. McClelland, and S. Ganguli, in International Conference on Learning Representations (2014)

  5. [5] J. Pennington, S. S. Schoenholz, and S. Ganguli, in Advances in Neural Information Processing Systems, Vol. 30 (2017) pp. 4788–4798

  6. [6] N. Bertschinger and T. Natschläger, Neural Computation 16, 1413 (2004)

  7. [7] J. Boedecker, O. Obst, J. T. Lizier, N. M. Mayer, and M. Asada, Theory in Biosciences 131, 205 (2012)

  8. [8] H. Sompolinsky, A. Crisanti, and H.-J. Sommers, Physical Review Letters 61, 259 (1988)

  9. [9] S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, in International Conference on Learning Representations (2017)

  10. [10] J. Cardy, Scaling and Renormalization in Statistical Physics (Cambridge University Press, 1996)

  11. [11] L. Molgedey, J. Schuchhardt, and H. G. Schuster, Physical Review Letters 69, 3717 (1992)

  12. [12] Y. Ahmadian, F. Fumarola, and K. D. Miller, Physical Review E 91, 012820 (2015)

  13. [13] T. Can, K. Krishnamurthy, and D. J. Schwab, in Proceedings of the First Mathematical and Scientific Machine Learning Conference, Proceedings of Machine Learning Research, Vol. 107 (2020) pp. 476–511, arXiv:2002.00025 [cs.LG]

  14. [14] K. Krishnamurthy, T. Can, and D. J. Schwab, Physical Review X 12, 011011 (2022)

  15. [15] R. G. Brown, Exponential Smoothing for Predicting Demand (Arthur D. Little Inc., 1956)

  16. [16] P. J. Kaufman, Smarter Trading: Improving Performance in Changing Markets (McGraw-Hill, New York, 1995)

  17. [17] S. F. Edwards and P. W. Anderson, Journal of Physics F: Metal Physics 5, 965 (1975)

  18. [18] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, in Advances in Neural Information Processing Systems, Vol. 29 (2016)

  19. [19] G. Benettin, L. Galgani, A. Giorgilli, and J.-M. Strelcyn, Meccanica 15, 21 (1980)

  20. [20] T. Tao and V. Vu, Annals of Probability 38, 2023 (2010), with an appendix by M. Krishnapur

  21. [21] C. Tallec and Y. Ollivier, in International Conference on Learning Representations (2018)

  22. [22] M. C. Mackey and L. Glass, Science 197, 287 (1977)

  23. [23] H. Jaeger, The “echo state” approach to analysing and training recurrent neural networks, Tech. Rep. GMD Report 148 (German National Research Center for Information Technology, 2001)

  24. [24] H. Jaeger and H. Haas, Science 304, 78 (2004)

  25. [25] A. Cowsik, T. Nebabu, X.-L. Qi, and S. Ganguli, Physical Review E 112, 055301 (2025)

Appendix A: Removing the regularizer

The boundary condition (4) is taken from [12] in its full form, with the limits in the order $\lim_{r \to 0^+} \lim_{N \to \infty}$. This order matters in general: for arbitrary deterministic sequences of M, L, R, the unregularized empirical sum may dive...