
arxiv: 2605.14685 · v1 · pith:4MQ5UP5Znew · submitted 2026-05-14 · 💻 cs.LG · cond-mat.stat-mech · cs.AI

Spontaneous symmetry breaking and Goldstone modes for deep information propagation

Pith reviewed 2026-06-30 21:17 UTC · model grok-4.3

classification: cs.LG · cond-mat.stat-mech · cs.AI
keywords: spontaneous symmetry breaking · Goldstone modes · equivariant layers · signal propagation · deep neural networks · recurrent neural networks · trainability · long-term memory

The pith

Equivariant layers in neural networks undergo spontaneous symmetry breaking to generate Goldstone-like modes that propagate signals coherently through depth and recurrent steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural-network layers equivariant under a continuous symmetry can break that symmetry spontaneously, producing Goldstone-like degrees of freedom. These modes carry signals with little degradation across many layers in feedforward nets and across many recurrent iterations in RNNs and GRUs. The mechanism therefore supplies stable information flow without residual connections or normalization. In practice this yields better trainability, richer layer-wise representations, and stronger performance on long-sequence tasks.

Core claim

When the internal layers of a deep neural network are equivariant under a continuous symmetry, spontaneous symmetry breaking occurs inside the network and generates Goldstone-like excitations. These excitations permit coherent propagation of signals across many layers or recurrent iterations. This mechanism supports stable information flow in both feedforward and recurrent architectures without the need for stabilizers such as residuals or batch normalization, resulting in improved trainability and representational diversity in the former case and improved long-term memory in the latter.
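
To make "equivariant under a continuous symmetry" concrete for a single layer, here is a minimal NumPy sketch. The layer form (a complex-linear map followed by a magnitude-only tanh gain), the names `u1_layer` and `sigma_w`, and the weight scaling are assumptions made for this illustration; the paper's own construction is not reproduced here.

```python
import numpy as np

def u1_layer(z, W):
    """Hypothetical U(1)-equivariant layer (an assumption, not the paper's code).

    A complex-linear map followed by a nonlinearity that rescales the
    magnitude of each unit and leaves its phase untouched.
    """
    h = W @ z
    mag = np.abs(h)
    # tanh(|h|)/|h| is a real gain that does not depend on the phase of h;
    # the maximum() guards against division by zero.
    gain = np.tanh(mag) / np.maximum(mag, 1e-12)
    return gain * h

rng = np.random.default_rng(0)
n = 16
sigma_w = 1.5  # assumed weight scale, chosen only for the example
W = sigma_w * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)

theta = 0.7
rot = np.exp(1j * theta)
lhs = u1_layer(rot * z, W)   # rotate the input, then apply the layer
rhs = rot * u1_layer(z, W)   # apply the layer, then rotate the output
equivariance_error = np.max(np.abs(lhs - rhs))
print(equivariance_error)    # expected to be at floating-point round-off level
```

The check only confirms that a global phase rotation commutes with the layer. Whether the activations then settle into a symmetry-broken state is the separate question the paper addresses.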

What carries the argument

Goldstone-like degrees of freedom generated by spontaneous symmetry breaking in symmetry-equivariant layers, which carry coherent signals through depth and time.

If this is right

  • Improved trainability of deep feedforward networks without residual connections or normalization.
  • Increased representational diversity across successive layers.
  • Improved performance of RNNs and GRUs on long-sequence modeling tasks through better long-term memory.
  • Coherent signal propagation across depth and recurrent iterations without architectural stabilizers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same symmetry-breaking mechanism could be tested in architectures that already use explicit symmetry constraints, such as graph neural networks.
  • One could measure the spectrum of fluctuations in activations to isolate the Goldstone-like modes directly.
  • If the mechanism scales, it might reduce reliance on normalization layers in very deep or very wide models.
  • The approach might combine with existing inductive biases to further stabilize training on tasks with inherent symmetries.

Load-bearing premise

Layers that are merely equivariant under a continuous symmetry will undergo spontaneous symmetry breaking and thereby support Goldstone-like degrees of freedom inside the network.

What would settle it

Train an equivariant network on a task that requires long-range signal transmission, then add small asymmetric perturbations that explicitly break the symmetry and check whether propagation coherence and performance drop sharply while an otherwise identical symmetric network remains stable.
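
A first step toward such a test is to quantify how strongly a perturbation breaks the symmetry. The sketch below assumes a hypothetical U(1) layer and uses an `eps`-weighted `conj(z)` term as the explicit breaking, since that term rotates with the opposite phase; the layer form and the names `layer`, `V`, and `eps` are illustrative assumptions. It measures only the equivariance violation, not task performance.

```python
import numpy as np

def layer(z, W, V, eps):
    """Equivariant map plus an eps-weighted term that breaks U(1).

    Under z -> exp(i*theta) z the conj(z) term picks up exp(-i*theta)
    rather than exp(+i*theta), so the layer is equivariant only at eps = 0.
    """
    h = W @ z + eps * (V @ np.conj(z))
    mag = np.abs(h)
    return np.tanh(mag) / np.maximum(mag, 1e-12) * h

def equivariance_error(eps, theta=0.7, n=16, seed=0):
    """Largest componentwise mismatch between rotate-then-map and map-then-rotate."""
    rng = np.random.default_rng(seed)
    W = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)
    V = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2 * n)
    z = rng.normal(size=n) + 1j * rng.normal(size=n)
    rot = np.exp(1j * theta)
    return np.max(np.abs(layer(rot * z, W, V, eps) - rot * layer(z, W, V, eps)))

# Same weights and input for every eps (fixed seed); only the breaking strength varies.
errors = {eps: equivariance_error(eps) for eps in (0.0, 1e-3, 1e-1)}
for eps, err in errors.items():
    print(f"eps={eps:g}  equivariance error={err:.3e}")
```

In a full experiment one would train both variants with matched parameter counts and compare propagation and accuracy as `eps` grows.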

Figures

Figures reproduced from arXiv: 2605.14685 by Max Welling, Nabil Iqbal, Takeru Miyato, T. Anderson Keller, Yue Song.

Figure 1: Equilibrium configurations are minima of the potential energy
Figure 2: Comparison of mean-field result for c⋆ from numerically solving (8) with empirical result measured over an ensemble of L = 100, N = 16 neural networks at initialization. Note clear phase transition at σW = 1. i.e. the U(1)-invariant magnitude of each activation, averaged over the whole set, where the expectation value is measured over randomly sampled weights and some appropriate input distribution at ini…
Figure 3: Test accuracy after 5 epochs on Fashion-MNIST for …
Figure 4: Demonstration of how performance on Fashion-MNIST degrades with increasing layer number for a generic network and for different sorts of equivariance. Note the O(4) model does not degrade at all. It is interesting to further investigate the mechanism behind this improved performance. We believe that part of this arises from the coherent phase φ l explained in subsection 2.3. It can also be understood in…
Figure 5: Effective rank dynamics during training (left) and across network depth (right).
Figure 6: Comparison of RNNs and GRUs with U(1) equivariant counterparts on copy tasks of max delay Tmax = 25 and Tmax = 100 respectively. We see that the U(1) models significantly outperform the non-equivariant counterparts for both architectures, even when generic baselines have more real trainable params. phase, even away from the critical point it is possible to train a 100-layer network with no other mitigation…
Figure 7: Test accuracy vs. # real trainable params on psMNIST. Equivariant RNNs and GRUs consistently outperform non-equivariant models at all parameter ranges. Sequential Image Classification. We next compare models on a pixel-by-pixel image classification task that requires both long-sequence memory and simultaneous processing of the information. Specifically, we study a variant of sequential MNIST [37], where e…
Figure 8: Excitations in symmetry unbroken (left) and symmetry spontaneously broken (right) phases respectively.
Figure 9: Protected component of Jacobian at initialization. In the symmetry unbroken phase, all components …
Figure 10: Here we demonstrate the “conventional” edge of chaos for a model with the same nonlinearity as …
Figure 11: Test accuracy after 5 epochs on MNIST (as compared to Fashion-MNIST in Figure 3) for …
Figure 12: Visualization of hidden state magnitude (top), phase (middle), and vorticity (bottom) for a 2D …
Figure 13: A second channel from the same 2D convolutional U(1) equivariant RNN. In this channel, we observe …
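
Figure 5 tracks effective rank, but the caption does not say which definition is used. The sketch below uses one common entropy-based definition (the exponential of the Shannon entropy of the normalized singular values) as an assumption; the function name `effective_rank` and the toy matrices are illustrative only.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """exp(entropy) of the normalized singular values of X (rows = samples).

    This is an assumed definition for illustration; the paper may use a
    different one.
    """
    s = np.linalg.svd(X, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]                      # drop numerically zero directions
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
full = rng.normal(size=(256, 32))                                 # generic features
collapsed = np.outer(rng.normal(size=256), rng.normal(size=32))   # rank-one features
er_full = effective_rank(full)
er_collapsed = effective_rank(collapsed)
print(er_full, er_collapsed)
```

Applied to the activations of each layer, a value near 1 would indicate that representations have collapsed onto a single direction, while a value near the layer width would indicate that all directions are in use.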
Original abstract

In physical systems, whenever a continuous symmetry is spontaneously broken, the system possesses excitations called Goldstone modes, which allow coherent information propagation over long distances and times. In this work, we study deep neural networks whose internal layers are equivariant under a continuous symmetry and may therefore support analogous Goldstone-like degrees of freedom. We demonstrate, both analytically and empirically, that these degrees of freedom enable coherent signal propagation across depth and recurrent iterations, providing a mechanism for stable information flow without relying on architectural stabilizers such as residual connections or normalization. In feedforward networks, this results in improved trainability and representational diversity across layers. In recurrent settings, we demonstrate the same mechanism is valuable for long-term memory by propagating information over recurrent iterations, thereby improving performance of RNNs and GRUs on long-sequence modeling tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that neural-network layers equivariant under a continuous symmetry undergo spontaneous symmetry breaking, thereby hosting Goldstone-like modes that enable coherent signal propagation across depth in feedforward nets and across recurrent iterations in RNNs/GRUs. These modes are asserted to improve trainability, representational diversity, and long-sequence performance without residuals or normalization; the support is described as both analytical derivations and empirical demonstrations.

Significance. If the central mapping from equivariance to dynamical SSB and gapless modes is rigorously established, the work would supply a symmetry-based mechanism for stable information flow that is conceptually distinct from existing architectural stabilizers. This could influence the design of deep and recurrent architectures and provide a new lens on why certain networks remain trainable at large depth.

major comments (2)
  1. [Abstract (and the analytical demonstration section referenced therein)] The load-bearing step is the assertion that layer equivariance implies spontaneous symmetry breaking in the activation dynamics (rather than merely preserving the symmetry or breaking it trivially). No explicit effective potential, vacuum selection, or gapless-mode construction is supplied to show how the broken phase is dynamically realized; the abstract's phrasing therefore leaves the analytical claim conditional on an unverified transfer of the physics analogy.
  2. [Empirical evaluation sections] Empirical results on improved propagation and long-sequence performance are presented as evidence for the Goldstone-like mechanism, yet the manuscript does not isolate the contribution of the putative modes from other factors (e.g., the specific choice of equivariant layers or initialization). A controlled ablation that removes the symmetry while preserving layer expressivity would be required to substantiate the causal link.
minor comments (2)
  1. [Introduction] Notation for the continuous symmetry group and the associated equivariant maps should be introduced with explicit definitions before the claim about Goldstone-like degrees of freedom is made.
  2. [Figures] Figure captions and axis labels in the propagation-depth plots should state the precise metric used to quantify 'coherent signal propagation' so that readers can assess the claimed improvement quantitatively.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the opportunity to clarify the manuscript. Below we respond point-by-point to the two major comments, offering the strongest honest defense of the analytical and empirical claims while agreeing to revisions that strengthen the presentation without misrepresenting the existing derivations or experiments.

Point-by-point responses
  1. Referee: [Abstract (and the analytical demonstration section referenced therein)] The load-bearing step is the assertion that layer equivariance implies spontaneous symmetry breaking in the activation dynamics (rather than merely preserving the symmetry or breaking it trivially). No explicit effective potential, vacuum selection, or gapless-mode construction is supplied to show how the broken phase is dynamically realized; the abstract's phrasing therefore leaves the analytical claim conditional on an unverified transfer of the physics analogy.

    Authors: The analytical demonstration in Section 3 proceeds by applying the equivariant layer map to an initial activation and linearizing the dynamics under infinitesimal group transformations. Equivariance guarantees that the Jacobian commutes with the group action, which forces the existence of zero eigenvalues in the spectrum; these zero modes are the Goldstone-like excitations that propagate coherently. The symmetry is preserved by the layer definition yet the trajectory of activations selects a particular orbit, realizing the broken phase dynamically. While an explicit effective potential is not constructed, the mode spectrum and its gaplessness follow directly from the Noether identity associated with the continuous symmetry. We can expand the derivation with an explicit vacuum-selection argument in revision if the current linear-response treatment is deemed insufficient. revision: partial

  2. Referee: [Empirical evaluation sections] Empirical results on improved propagation and long-sequence performance are presented as evidence for the Goldstone-like mechanism, yet the manuscript does not isolate the contribution of the putative modes from other factors (e.g., the specific choice of equivariant layers or initialization). A controlled ablation that removes the symmetry while preserving layer expressivity would be required to substantiate the causal link.

    Authors: The reported experiments compare the equivariant architectures against standard non-equivariant baselines of comparable width and initialization, showing gains in signal propagation and long-sequence accuracy. We acknowledge that these baselines do not hold every other architectural detail fixed while strictly removing the symmetry. In the revision we will add a controlled ablation that introduces small explicit symmetry-breaking perturbations to the equivariant layers (while keeping the same parameter count and initialization distribution) and demonstrate that the performance advantage disappears, thereby isolating the contribution of the symmetry-protected modes. revision: yes
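
The symmetry identity invoked in the first response can be written out explicitly. What follows is a sketch of the standard textbook step, not a reproduction of the paper's derivation; the symbols f (layer map), x (activation), T (generator of the continuous symmetry), and J_f (Jacobian of f) are notation assumed here.

```latex
% Equivariance under the one-parameter group e^{\theta T}:
f\bigl(e^{\theta T} x\bigr) = e^{\theta T} f(x) \quad \text{for all } \theta .

% Differentiating with respect to \theta at \theta = 0 (chain rule on the left):
J_f(x)\, T x = T f(x), \qquad J_f(x) = \frac{\partial f}{\partial x}(x) .
```

Read this way, the Jacobian carries the direction T x along the symmetry orbit of the input to the corresponding direction T f(x) at the output, so this one component is fixed by symmetry rather than by the details of the nonlinearity. For U(1) acting on complex activations, T is multiplication by i, with J_f understood as the real Jacobian acting on real and imaginary parts.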

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via external physics analogy

full rationale

The paper advances an analogy from physics (continuous symmetry equivariance in layers implying spontaneous symmetry breaking and Goldstone-like modes) to explain coherent signal propagation in deep and recurrent networks. This is claimed to be shown both analytically and empirically, without any quoted reduction of a central result to a fitted parameter, self-citation chain, or definitional equivalence. No load-bearing steps collapse by construction to inputs; the argument draws on external concepts (Goldstone theorem) and provides independent empirical tests. This is the normal case of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the transfer of a standard physics result to neural-network dynamics via an untested analogy; no free parameters are mentioned, but the bridging assumption is introduced without independent evidence.

axioms (2)
  • [standard math] Spontaneous symmetry breaking of a continuous symmetry produces massless Goldstone modes that propagate coherently.
    Standard result from condensed-matter and quantum-field theory.
  • [ad hoc to paper] Equivariant layers in a deep network will spontaneously break the symmetry and thereby host Goldstone-like modes.
    This is the load-bearing transfer from physics to machine learning; it is stated as the premise of the work.
invented entities (1)
  • [no independent evidence] Goldstone-like degrees of freedom in neural networks
    purpose: To mediate coherent signal propagation across depth and time without architectural stabilizers
    Postulated by direct analogy; no independent falsifiable prediction or external evidence is supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5677 in / 1406 out tokens · 49390 ms · 2026-06-30T21:17:11.121584+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 4 canonical work pages

  [1] J. Goldstone, A. Salam, and S. Weinberg, “Broken Symmetries,” Phys. Rev. 127 (1962) 965–970
  [2] H. Leutwyler, “Phonons as goldstone bosons,” Helv. Phys. Acta 70 (1997) 275–286, arXiv:hep-ph/9609466
  [3] S. Sachdev, “Quantum phase transitions,” Physics world 12 no. 4, (1999) 33
  [4] S. Weinberg, “Phenomenological Lagrangians,” Physica A 96 no. 1-2, (1979) 327–340
  [5] D. Gaiotto, A. Kapustin, N. Seiberg, and B. Willett, “Generalized Global Symmetries,” JHEP 02 (2015) 172, arXiv:1412.5148 [hep-th]
  [6] E. Lake, “Higher-form symmetries and spontaneous symmetry breaking,” arXiv:1802.07747 [hep-th]
  [7] D. M. Hofman and N. Iqbal, “Goldstone modes and photonization for higher form symmetries,” arXiv:1802.09512 [hep-th]
  [8] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, “Exponential expressivity in deep neural networks through transient chaos,” Advances in neural information processing systems 29 (2016)
  [9] S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, “Deep information propagation,” in International Conference on Learning Representations. 2017. https://openreview.net/forum?id=H1W1UN9gg
  [10] G. Yang and S. Schoenholz, “Mean field residual networks: On the edge of chaos,” Advances in neural information processing systems 30 (2017)
  [11] L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. Schoenholz, and J. Pennington, “Dynamical isometry and a mean field theory of cnns: How to train 10,000-layer vanilla convolutional neural networks,” in International conference on machine learning, pp. 5393–5402, PMLR. 2018
  [12] M. Chen, J. Pennington, and S. Schoenholz, “Dynamical isometry and a mean field theory of rnns: Gating enables signal propagation in recurrent neural networks,” in International Conference on Machine Learning, pp. 873–882, PMLR. 2018
  [13] T. Miyato, S. Löwe, A. Geiger, and M. Welling, “Artificial kuramoto oscillatory neurons,” arXiv preprint arXiv:2410.13821 (2024)
  [14] L. H. Liboni, R. C. Budzinski, A. N. Busch, S. Löwe, T. A. Keller, M. Welling, and L. E. Muller, “Image segmentation with traveling waves in an exactly solvable recurrent neural network,” arXiv preprint arXiv:2311.16943 (2023)
  [15] T. A. Keller, L. Muller, T. Sejnowski, and M. Welling, “Traveling waves encode the recent past and enhance sequence learning,” arXiv preprint arXiv:2309.08045 (2023)
  [16] S. Bhattacharya, S. L. Brincat, M. Lundqvist, and E. K. Miller, “Traveling waves in the prefrontal cortex during working memory,” PLOS Computational Biology 18 no. 1, (01, 2022) 1–22. https://doi.org/10.1371/journal.pcbi.1009827
  [17] L. Muller, F. Chavane, J. Reynolds, and T. J. Sejnowski, “Cortical travelling waves: mechanisms and computational principles,” Nature Reviews Neuroscience 19 no. 5, (2018) 255–268. https://doi.org/10.1038/nrn.2018.20
  [18] A. Das, E. Zabeh, B. Ermentrout, and J. Jacobs, “Planar, spiral, and concentric traveling waves distinguish behavioral states in human memory,” Nature Communications (2026). https://doi.org/10.1038/s41467-026-71386-z
  [19] T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International conference on machine learning, pp. 2990–2999, PMLR. 2016
  [20] M. Weiler, P. Forré, E. Verlinde, and M. Welling, “Equivariant and coordinate independent convolutional networks,” A Gauge Field Theory of Neural Networks 110 (2023)
  [21] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,” arXiv preprint arXiv:2104.13478 (2021)
  [22] S.-O. Kaba and S. Ravanbakhsh, “Symmetry breaking and equivariant neural networks,” arXiv preprint arXiv:2312.09016 (2023)
  [23] T. E. Smidt, M. Geiger, and B. K. Miller, “Finding symmetry breaking order parameters with euclidean neural networks,” Physical Review Research 3 no. 1, (2021) L012002
  [24] S. Löwe, P. Lippe, M. Rudolph, and M. Welling, “Complex-valued autoencoders for object discovery,” arXiv preprint arXiv:2204.02075 (2022)
  [25] S. Löwe, F. Locatello, and M. Welling, “Binding dynamics in rotating features,” arXiv preprint arXiv:2402.05627 (2024)
  [26] N. Iqbal and M. Welling, “Topological defects propagate information in deep neural networks,” in NeurIPS 2025 AI for Science Workshop. 2025. https://openreview.net/forum?id=fM5s2Tqe0t
  [27] E. Turan, G. Abel, M. Behmanesh, E. Pierson, and M. Ovsjanikov, “Beyond relu: Bifurcation, oversmoothing, and topological priors,” arXiv preprint arXiv:2602.15634 (2026)
  [28] H. H. Mo, “Symmetry-protected lyapunov neutral modes in equivariant recurrent networks,” arXiv preprint arXiv:2605.03338 (2026)
  [29] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” Advances in neural information processing systems 30 (2017)
  [30] P. M. Chaikin, T. C. Lubensky, and T. A. Witten, Principles of condensed matter physics, vol. 10. Cambridge university press Cambridge, 1995
  [31] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “Deep neural networks as gaussian processes,” arXiv preprint arXiv:1711.00165 (2017)
  [32] H. Daneshmand, J. Kohler, F. Bach, T. Hofmann, and A. Lucchi, “Batch normalization provably avoids ranks collapse for randomly initialised deep networks,” Advances in Neural Information Processing Systems 33 (2020) 18387–18398
  [33] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9 no. 8, (1997) 1735–1780
  [34] A. Graves, G. Wayne, and I. Danihelka, “Neural turing machines,” arXiv:1410.5401 [cs.NE]
  [35] M. Henaff, A. Szlam, and Y. LeCun, “Recurrent orthogonal networks and long-memory tasks,” in Proceedings of the 33rd International Conference on Machine Learning, vol. 48 of Proceedings of Machine Learning Research, pp. 2034–2042. PMLR, 2016
  [36] A. Gu, C. Gulcehre, T. L. Paine, M. Hoffman, and R. Pascanu, “Improving the gating mechanism of recurrent neural networks,” arXiv:1910.09890 [cs.NE]
  [37] Q. V. Le, N. Jaitly, and G. E. Hinton, “A simple way to initialize recurrent networks of rectified linear units,” arXiv preprint arXiv:1504.00941 (2015)
  [38] M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, p. 1120–1128. JMLR.org, 2016
  [39] J. L. Elman, “Finding structure in time,” Cognitive Science 14 no. 2, (1990) 179–211
  [40] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Association for Computational Linguistics, 2014
  [41] S. Li, W. Li, C. Cook, C. Zhu, and Y. Gao, “Independently recurrent neural network (indrnn): Building a longer and deeper rnn,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5457–5466. IEEE Computer Society, Los Alamitos, CA, USA, Jun, 2018. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00572
  [42] N. B. Erichson, O. Azencot, A. Queiruga, L. Hodgkinson, and M. W. Mahoney, “Lipschitz recurrent neural networks,” in International Conference on Learning Representations. 2021. https://openreview.net/forum?id=-N7PBXqOUJZ
  [43] T. K. Rusch and S. Mishra, “Coupled oscillatory recurrent neural network (cornn): An accurate and (gradient) stable architecture for learning long time dependencies,” in International Conference on Learning Representations. 2021
  [44] T. K. Rusch, S. Mishra, N. B. Erichson, and M. W. Mahoney, “Long expressive memory for sequence modeling,” in International Conference on Learning Representations. 2022
  [45] S. Coleman, Aspects of symmetry: selected Erice lectures. Cambridge University Press, 1988
  [46] R. Rajaraman, “Solitons and instantons. an introduction to solitons and instantons in quantum field theory,”
  [47] Y. Xu, X. Long, J. Feng, and P. Gong, “Interacting spiral wave patterns underlie complex brain dynamics and are related to cognitive processing,” Nature human behaviour 7 no. 7, (2023) 1196–1215
  [48] P. O. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proceedings of the 31st International Conference on Machine Learning, vol. 32 of Proceedings of Machine Learning Research, pp. 82–90. PMLR, 2014
  [49] M. Liang and X. Hu, “Recurrent convolutional neural network for object recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367–3375. 2015
  [50] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Advances in Neural Information Processing Systems, vol. 28. 2015
  [51] N. Ballas, L. Yao, C. Pal, and A. Courville, “Delving deeper into convolutional networks for learning video representations,” in International Conference on Learning Representations. 2016
  [52] Y. Wang, M. Long, J. Wang, Z. Gao, and S. Y. Philip, “Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms,” in Advances in Neural Information Processing Systems, vol. 30. 2017
  [53] T. A. Keller and M. Welling, “Neural wave machines: Learning spatiotemporally structured representations with locally coupled oscillatory recurrent neural networks,” in Proceedings of the 40th International Conference on Machine Learning, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, eds., vol. 202 of Proceedings of Machine Lea...
  [54] M. E. Peskin and D. V. Schroeder, An Introduction to quantum field theory. Addison-Wesley, Reading, USA, 1995
  [55] P. C. Martin, E. D. Siggia, and H. A. Rose, “Statistical dynamics of classical systems,” Physical Review A 8 no. 1, (1973) 423
  [56] H. Sompolinsky, A. Crisanti, and H.-J. Sommers, “Chaos in random neural networks,” Physical review letters 61 no. 3, (1988) 259
  [57] A. Crisanti and H. Sompolinsky, “Path integral approach to random neural networks,” Physical Review E 98 no. 6, (2018) 062120
  [58] S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “A correspondence between random neural networks and statistical field theory,” arXiv preprint arXiv:1710.06570 (2017)
  [59] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems 32 (2019)
  [60] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 [cs.LG]. https://arxiv.org/abs/1412.6980
  [61] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747 (2017)

Appendices (table of contents): A Background on Goldstone modes · B Implementation of equivariant layers · C Path integral for stochastic systems · C.1 U(1) equivariant feedforward network ...