Spontaneous symmetry breaking and Goldstone modes for deep information propagation

Max Welling; Nabil Iqbal; Takeru Miyato; T. Anderson Keller; Yue Song

arXiv: 2605.14685 · v1 · pith:4MQ5UP5Znew · submitted 2026-05-14 · 💻 cs.LG · cond-mat.stat-mech · cs.AI

Pith reviewed 2026-06-30 21:17 UTC · model grok-4.3

classification 💻 cs.LG · cond-mat.stat-mech · cs.AI

keywords spontaneous symmetry breaking · Goldstone modes · equivariant layers · signal propagation · deep neural networks · recurrent neural networks · trainability · long-term memory

The pith

Equivariant layers in neural networks undergo spontaneous symmetry breaking to generate Goldstone-like modes that propagate signals coherently through depth and recurrent steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural-network layers equivariant under a continuous symmetry can break that symmetry spontaneously, producing Goldstone-like degrees of freedom. These modes carry signals with little degradation across many layers in feedforward nets and across many recurrent iterations in RNNs and GRUs. The mechanism therefore supplies stable information flow without residual connections or normalization. In practice this yields better trainability, richer layer-wise representations, and stronger performance on long-sequence tasks.

Core claim

When the internal layers of a deep neural network are equivariant under a continuous symmetry, spontaneous symmetry breaking occurs inside the network and generates Goldstone-like excitations. These excitations permit coherent propagation of signals across many layers or recurrent iterations. This mechanism supports stable information flow in both feedforward and recurrent architectures without the need for stabilizers such as residuals or batch normalization, resulting in improved trainability and representational diversity in the former case and improved long-term memory in the latter.

What carries the argument

Goldstone-like degrees of freedom generated by spontaneous symmetry breaking in symmetry-equivariant layers, which carry coherent signals through depth and time.
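As a minimal sketch of what "equivariant under a continuous symmetry" means in practice, the snippet below builds one U(1)-equivariant layer on complex activations and checks the defining property f(e^{iθ} z) = e^{iθ} f(z) numerically. The layer itself (a complex linear map followed by a tanh applied to the magnitude only), the name `u1_equivariant_layer`, and the sizes are assumptions chosen for illustration; the paper's own layers may differ.

```python
import numpy as np

def u1_equivariant_layer(z, W, phi=np.tanh):
    """Hypothetical U(1)-equivariant layer: complex linear map, then a
    nonlinearity that rescales the magnitude and leaves the phase alone."""
    h = W @ z
    r = np.abs(h)
    # phi(r) / r rescales each entry; the guard avoids 0/0 when r == 0
    return phi(r) / np.maximum(r, np.finfo(float).tiny) * h

rng = np.random.default_rng(0)
N = 16  # width, chosen arbitrarily for the check
W = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)

theta = 0.7  # arbitrary global phase rotation
rotate_then_apply = u1_equivariant_layer(np.exp(1j * theta) * z, W)
apply_then_rotate = np.exp(1j * theta) * u1_equivariant_layer(z, W)
equivariance_error = np.max(np.abs(rotate_then_apply - apply_then_rotate))
print(f"max equivariance error: {equivariance_error:.2e}")
```

Any nonlinearity that acts on the magnitude alone passes the same check; one that acts on the real and imaginary parts separately generally does not.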

If this is right

Improved trainability of deep feedforward networks without residual connections or normalization.
Increased representational diversity across successive layers.
Improved performance of RNNs and GRUs on long-sequence modeling tasks through better long-term memory.
Coherent signal propagation across depth and recurrent iterations without architectural stabilizers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same symmetry-breaking mechanism could be tested in architectures that already use explicit symmetry constraints, such as graph neural networks.
One could measure the spectrum of fluctuations in activations to isolate the Goldstone-like modes directly.
If the mechanism scales, it might reduce reliance on normalization layers in very deep or very wide models.
The approach might combine with existing inductive biases to further stabilize training on tasks with inherent symmetries.

Load-bearing premise

Layers that are merely equivariant under a continuous symmetry will undergo spontaneous symmetry breaking and thereby support Goldstone-like degrees of freedom inside the network.

What would settle it

Train an equivariant network on a task that requires long-range signal transmission, then add small asymmetric perturbations that explicitly break the symmetry and check whether propagation coherence and performance drop sharply while an otherwise identical symmetric network remains stable.
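The perturbation half of this test can be sketched at initialization, before any training. The code below is an illustration under stated assumptions: the layer (tanh on the magnitude of a complex pre-activation), the breaking term `eps * conj(h)`, and all sizes are hypothetical choices, not the paper's setup. It measures how far a globally phase-rotated input drifts from the rotated output after many layers, with and without the explicit breaking term.

```python
import numpy as np

def layer(z, W, eps=0.0):
    """Assumed U(1)-equivariant layer plus an explicit symmetry-breaking term.

    The eps * conj(h) term picks up e^{-i theta} under a phase rotation
    instead of e^{+i theta}, so any eps != 0 breaks U(1) explicitly.
    """
    h = W @ z
    r = np.abs(h)
    return np.tanh(r) / np.maximum(r, np.finfo(float).tiny) * h + eps * np.conj(h)

def phase_transport_error(eps, depth=20, width=16, sigma_w=1.2, theta=0.7, seed=0):
    """Relative mismatch between net(e^{i theta} z) and e^{i theta} net(z)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(width) + 1j * rng.standard_normal(width)
    z_rot = np.exp(1j * theta) * z
    for _ in range(depth):
        # fresh random complex weights per layer, shared by both copies
        W = sigma_w * (rng.standard_normal((width, width))
                       + 1j * rng.standard_normal((width, width))) / np.sqrt(2 * width)
        z, z_rot = layer(z, W, eps), layer(z_rot, W, eps)
    return np.linalg.norm(z_rot - np.exp(1j * theta) * z) / np.linalg.norm(z)

err_symmetric = phase_transport_error(eps=0.0)
err_broken = phase_transport_error(eps=0.05)
print(f"symmetric: {err_symmetric:.2e}   perturbed: {err_broken:.2e}")
```

This only shows that the perturbation destroys exact phase transport in a random network. The experiment described above would additionally train both variants and compare task performance.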

Figures

Figures reproduced from arXiv: 2605.14685 by Max Welling, Nabil Iqbal, Takeru Miyato, T. Anderson Keller, Yue Song.

**Figure 2.** Comparison of mean-field result for c⋆ from numerically solving (8) with empirical result measured over an ensemble of L = 100, N = 16 neural networks at initialization. Note clear phase transition at σ_W = 1. … i.e. the U(1)-invariant magnitude of each activation, averaged over the whole set, where the expectation value is measured over randomly sampled weights and some appropriate input distribution at ini…
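The kind of ensemble measurement this caption describes can be mimicked with a toy model. The sketch below is not the paper's equation (8): the nonlinearity (tanh on the magnitude), the weight scaling E|W_ij|² = σ_W²/N, and the function name `mean_sq_magnitude` are assumptions made only to show how an order parameter like the mean squared activation magnitude could be estimated at initialization.

```python
import numpy as np

def mean_sq_magnitude(sigma_w, depth=100, width=16, n_nets=20, seed=0):
    """Average |z|^2 at the last layer over an ensemble of random networks."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_nets):
        z = rng.standard_normal(width) + 1j * rng.standard_normal(width)
        for _ in range(depth):
            # complex Gaussian weights with E|W_ij|^2 = sigma_w^2 / width (assumed)
            W = sigma_w * (rng.standard_normal((width, width))
                           + 1j * rng.standard_normal((width, width))) / np.sqrt(2 * width)
            h = W @ z
            r = np.abs(h)
            # magnitude-only nonlinearity (assumed); guard avoids 0/0
            z = np.tanh(r) / np.maximum(r, np.finfo(float).tiny) * h
        total += np.mean(np.abs(z) ** 2)
    return total / n_nets

c_below = mean_sq_magnitude(0.5)  # weak weights
c_above = mean_sq_magnitude(1.5)  # strong weights
print(f"sigma_w=0.5: {c_below:.3e}   sigma_w=1.5: {c_above:.3e}")
```

In this toy model the magnitude decays towards zero for small σ_W and settles at a nonzero value for large σ_W. Where exactly the crossover sits for the paper's architecture is what the figure reports.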

**Figure 3.** Test accuracy after 5 epochs on Fashion-MNIST for …

**Figure 5.** Effective rank dynamics during training (left) and across network depth (right).
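The excerpt does not say which definition of effective rank the paper uses. A common entropy-based choice, assumed here, is the exponential of the Shannon entropy of the normalized singular values of a batch of activations. The helper name `effective_rank` and the example matrices are illustrative only.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """Entropy-based effective rank of a (samples x features) matrix.

    Assumed definition: exp(H(p)), where p are the singular values
    normalized to sum to one and H is the Shannon entropy.
    """
    s = np.linalg.svd(X, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# Generic activations spread over many directions
full = effective_rank(rng.standard_normal((64, 32)))
# Rank-one activations: every sample is a multiple of one feature vector
collapsed = effective_rank(np.outer(rng.standard_normal(64), rng.standard_normal(32)))
print(f"generic: {full:.2f}   collapsed: {collapsed:.2f}")
```

Evaluating this per layer gives a depth profile, and evaluating it per checkpoint gives a training curve, which is the pairing the caption describes.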

**Figure 4.** Demonstration of how performance on Fashion-MNIST degrades with increasing layer number for a generic network and for different sorts of equivariance. Note the O(4) model does not degrade at all. It is interesting to further investigate the mechanism behind this improved performance. We believe that part of this arises from the coherent phase φ_l explained in subsection 2.3. It can also be understood in…

**Figure 6.** Comparison of RNNs and GRUs with U(1) equivariant counterparts on copy tasks of max delay T_max = 25 and T_max = 100 respectively. We see that the U(1) models significantly outperform the nonequivariant counterparts for both architectures, even when generic baselines have more real trainable params. … phase, even away from the critical point it is possible to train a 100-layer network with no other mitigation…
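For readers unfamiliar with the copy task, here is a sketch of one widely used formulation. The exact variant in the paper is not specified in this excerpt, so the alphabet size, the number of symbols to copy, the fixed delay, and the function name `make_copy_batch` are all assumptions.

```python
import numpy as np

def make_copy_batch(batch, n_symbols=8, n_copy=10, delay=25, rng=None):
    """Build (input, target) token sequences for a fixed-delay copy task.

    Assumed layout: n_copy random symbols, then blanks, a marker at the
    end of the delay, then n_copy blanks during which the model must
    reproduce the original symbols.
    """
    rng = np.random.default_rng() if rng is None else rng
    blank, marker = 0, n_symbols + 1
    seq_len = n_copy + delay + n_copy
    tokens = rng.integers(1, n_symbols + 1, size=(batch, n_copy))
    x = np.full((batch, seq_len), blank, dtype=np.int64)
    y = np.full((batch, seq_len), blank, dtype=np.int64)
    x[:, :n_copy] = tokens               # symbols to remember
    x[:, n_copy + delay - 1] = marker    # cue to start recalling
    y[:, -n_copy:] = tokens              # expected output after the cue
    return x, y

x, y = make_copy_batch(4, delay=25, rng=np.random.default_rng(0))
print(x.shape, y.shape)
```

Raising `delay` lengthens the gap the hidden state must bridge, which is the knob the caption's T_max values refer to under this reading.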

**Figure 7.** Test accuracy vs. # real trainable params on psMNIST. Equivariant RNNs and GRUs consistently outperform nonequivariant models at all parameter ranges. Sequential Image Classification. We next compare models on a pixel-by-pixel image classification task that requires both long-sequence memory and simultaneous processing of the information. Specifically, we study a variant of sequential MNIST [37], where e…
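The caption is cut off before it defines the variant, so the following only illustrates the usual reading of "psMNIST" (permuted sequential MNIST): each image is flattened to a 784-step sequence and one fixed random permutation is applied to the pixel order. The random array stands in for real MNIST images, and `to_permuted_sequence` is a hypothetical helper name.

```python
import numpy as np

def to_permuted_sequence(images, perm):
    """Flatten images to pixel sequences and reorder them with a fixed permutation.

    images: array of shape (batch, 28, 28)
    returns: array of shape (batch, 784, 1), one pixel per time step
    """
    flat = images.reshape(images.shape[0], -1)
    return flat[:, perm][..., None]

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)          # drawn once, reused for every image
fake_images = rng.random((2, 28, 28))    # placeholder for MNIST digits
seq = to_permuted_sequence(fake_images, perm)
print(seq.shape)
```

Because the permutation scrambles spatial locality, a recurrent model has to retain information across the whole sequence rather than from nearby steps.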

**Figure 8.** Excitations in symmetry unbroken (left) and symmetry spontaneously broken (right) phases respectively.

**Figure 9.** Protected component of Jacobian at initialization. In the symmetry unbroken phase, all components …

**Figure 10.** Here we demonstrate the “conventional” edge of chaos for a model with the same nonlinearity as …

**Figure 11.** Test accuracy after 5 epochs on MNIST (as compared to Fashion-MNIST in Figure 3) for …

**Figure 12.** Visualization of hidden state magnitude (top), phase (middle), and vorticity (bottom) for a 2D …

**Figure 13.** A second channel from the same 2D convolutional U(1) equivariant RNN. In this channel, we observe …
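The magnitude, phase, and vorticity panels mentioned for Figure 12 can be computed from any complex-valued 2D hidden state. The sketch below uses the standard lattice definition of vorticity, the winding number of the phase around each elementary square. Whether the paper uses this exact definition is an assumption, and the test field is a hand-made single vortex rather than a network state.

```python
import numpy as np

def wrap(d):
    """Wrap phase differences into [-pi, pi)."""
    return (d + np.pi) % (2 * np.pi) - np.pi

def vorticity(z):
    """Integer winding number of arg(z) around each plaquette of a 2D grid."""
    phase = np.angle(z)
    # wrapped phase steps along the four edges, traversed as a closed loop
    d1 = wrap(phase[:-1, 1:] - phase[:-1, :-1])  # (i, j)     -> (i, j+1)
    d2 = wrap(phase[1:, 1:] - phase[:-1, 1:])    # (i, j+1)   -> (i+1, j+1)
    d3 = wrap(phase[1:, :-1] - phase[1:, 1:])    # (i+1, j+1) -> (i+1, j)
    d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])  # (i+1, j)   -> (i, j)
    return np.rint((d1 + d2 + d3 + d4) / (2 * np.pi)).astype(int)

# Test field: z = x + i y has one vortex at the origin, between grid points
yy, xx = np.mgrid[-4:4, -4:4] + 0.5
single_vortex = xx + 1j * yy
magnitude = np.abs(single_vortex)
w = vorticity(single_vortex)
print(w)
```

Nonzero entries of `w` mark plaquettes that enclose a point where the magnitude must vanish, which is why vortices show up as dips in the magnitude panel.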

Original abstract

In physical systems, whenever a continuous symmetry is spontaneously broken, the system possesses excitations called Goldstone modes, which allow coherent information propagation over long distances and times. In this work, we study deep neural networks whose internal layers are equivariant under a continuous symmetry and may therefore support analogous Goldstone-like degrees of freedom. We demonstrate, both analytically and empirically, that these degrees of freedom enable coherent signal propagation across depth and recurrent iterations, providing a mechanism for stable information flow without relying on architectural stabilizers such as residual connections or normalization. In feedforward networks, this results in improved trainability and representational diversity across layers. In recurrent settings, we demonstrate the same mechanism is valuable for long-term memory by propagating information over recurrent iterations, thereby improving performance of RNNs and GRUs on long-sequence modeling tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper maps Goldstone modes onto equivariant layers to explain signal propagation, but the step from equivariance to actual spontaneous symmetry breaking remains the key unverified piece.

The one thing to know is that this paper maps the idea of Goldstone modes from broken symmetries in physics onto equivariant neural network layers to explain stable signal propagation.

They show analytically and with experiments that these modes help information travel across many layers or recurrent steps without residuals or normalization. In feedforward cases it improves trainability and layer diversity, and in recurrent nets it aids long-term memory on sequence tasks.

The work is new in using this specific physics analogy for the mechanism. It does a decent job laying out how equivariance could lead to coherent propagation, and the empirical claims on RNN performance are concrete enough to check.

The soft spot is the step from equivariance to spontaneous symmetry breaking. The stress-test is correct that just having layers commute with a group action does not force the activations to break the symmetry or produce gapless modes. The paper needs to show explicitly how the network dynamics or loss creates the broken phase and the resulting modes, rather than assuming the analogy carries over. If that part is only sketched, the analytic support is weaker than claimed.

Minor issues may arise in how propagation is measured or in the control experiments, but the symmetry-breaking step is the main one.

This is for people working on deep network theory, symmetry in ML, or recurrent model design. A reader looking for new explanations of why some architectures train well at depth would find it relevant.

It deserves a serious referee because the idea is not routine and the potential implications for architecture are real, even if revisions are needed on the breaking mechanism.

I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that neural-network layers equivariant under a continuous symmetry undergo spontaneous symmetry breaking, thereby hosting Goldstone-like modes that enable coherent signal propagation across depth in feedforward nets and across recurrent iterations in RNNs/GRUs. These modes are asserted to improve trainability, representational diversity, and long-sequence performance without residuals or normalization; the support is described as both analytical derivations and empirical demonstrations.

Significance. If the central mapping from equivariance to dynamical SSB and gapless modes is rigorously established, the work would supply a symmetry-based mechanism for stable information flow that is conceptually distinct from existing architectural stabilizers. This could influence the design of deep and recurrent architectures and provide a new lens on why certain networks remain trainable at large depth.

major comments (2)

[Abstract (and the analytical demonstration section referenced therein)] The load-bearing step is the assertion that layer equivariance implies spontaneous symmetry breaking in the activation dynamics (rather than merely preserving the symmetry or breaking it trivially). No explicit effective potential, vacuum selection, or gapless-mode construction is supplied to show how the broken phase is dynamically realized; the abstract's phrasing therefore leaves the analytical claim conditional on an unverified transfer of the physics analogy.
[Empirical evaluation sections] Empirical results on improved propagation and long-sequence performance are presented as evidence for the Goldstone-like mechanism, yet the manuscript does not isolate the contribution of the putative modes from other factors (e.g., the specific choice of equivariant layers or initialization). A controlled ablation that removes the symmetry while preserving layer expressivity would be required to substantiate the causal link.

minor comments (2)

[Introduction] Notation for the continuous symmetry group and the associated equivariant maps should be introduced with explicit definitions before the claim about Goldstone-like degrees of freedom is made.
[Figures] Figure captions and axis labels in the propagation-depth plots should state the precise metric used to quantify 'coherent signal propagation' so that readers can assess the claimed improvement quantitatively.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the opportunity to clarify the manuscript. Below we respond point-by-point to the two major comments, offering the strongest honest defense of the analytical and empirical claims while agreeing to revisions that strengthen the presentation without misrepresenting the existing derivations or experiments.

Referee: [Abstract (and the analytical demonstration section referenced therein)] The load-bearing step is the assertion that layer equivariance implies spontaneous symmetry breaking in the activation dynamics (rather than merely preserving the symmetry or breaking it trivially). No explicit effective potential, vacuum selection, or gapless-mode construction is supplied to show how the broken phase is dynamically realized; the abstract's phrasing therefore leaves the analytical claim conditional on an unverified transfer of the physics analogy.

Authors: The analytical demonstration in Section 3 proceeds by applying the equivariant layer map to an initial activation and linearizing the dynamics under infinitesimal group transformations. Equivariance guarantees that the Jacobian commutes with the group action, which forces the existence of zero eigenvalues in the spectrum; these zero modes are the Goldstone-like excitations that propagate coherently. The symmetry is preserved by the layer definition yet the trajectory of activations selects a particular orbit, realizing the broken phase dynamically. While an explicit effective potential is not constructed, the mode spectrum and its gaplessness follow directly from the Noether identity associated with the continuous symmetry. We can expand the derivation with an explicit vacuum-selection argument in revision if the current linear-response treatment is deemed insufficient. revision: partial
Referee: [Empirical evaluation sections] Empirical results on improved propagation and long-sequence performance are presented as evidence for the Goldstone-like mechanism, yet the manuscript does not isolate the contribution of the putative modes from other factors (e.g., the specific choice of equivariant layers or initialization). A controlled ablation that removes the symmetry while preserving layer expressivity would be required to substantiate the causal link.

Authors: The reported experiments compare the equivariant architectures against standard non-equivariant baselines of comparable width and initialization, showing gains in signal propagation and long-sequence accuracy. We acknowledge that these baselines do not hold every other architectural detail fixed while strictly removing the symmetry. In the revision we will add a controlled ablation that introduces small explicit symmetry-breaking perturbations to the equivariant layers (while keeping the same parameter count and initialization distribution) and demonstrate that the performance advantage disappears, thereby isolating the contribution of the symmetry-protected modes. revision: yes
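The linear-response statement in the first response can be made concrete for the simplest case. The following is a sketch for an assumed U(1) symmetry acting on complex activations by a global phase. It is a standard consequence of equivariance and is not taken from the paper's Section 3.

```latex
% U(1) equivariance of one layer f : \mathbb{C}^N \to \mathbb{C}^N
\[
  f(e^{i\theta} z) = e^{i\theta} f(z) \quad \text{for all } \theta .
\]

% Differentiate with respect to \theta at \theta = 0.
% Df is the real Jacobian; f need not be holomorphic.
\[
  Df(z)\,[\, i z \,] = i\, f(z) .
\]

% Chain rule through layers z_{l+1} = f_l(z_l), \; l = 0, \dots, L-1
\[
  D\bigl(f_{L-1} \circ \cdots \circ f_0\bigr)(z_0)\,[\, i z_0 \,] = i\, z_L ,
  \qquad
  \frac{\lVert i z_L \rVert}{\lVert i z_0 \rVert}
  = \frac{\lVert z_L \rVert}{\lVert z_0 \rVert} .
\]
```

The component of the end-to-end Jacobian along the symmetry direction therefore has gain ‖z_L‖/‖z_0‖, which stays of order one as long as the activation magnitude does not collapse to zero. Whether the magnitude stays nonzero depends on which phase the network is in, and the identity by itself does not settle that.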

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via external physics analogy

The paper advances an analogy from physics (continuous symmetry equivariance in layers implying spontaneous symmetry breaking and Goldstone-like modes) to explain coherent signal propagation in deep and recurrent networks. This is claimed to be shown both analytically and empirically, without any quoted reduction of a central result to a fitted parameter, self-citation chain, or definitional equivalence. No load-bearing steps collapse by construction to inputs; the argument draws on external concepts (Goldstone theorem) and provides independent empirical tests. This is the normal case of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the transfer of a standard physics result to neural-network dynamics via an untested analogy; no free parameters are mentioned, but the bridging assumption is introduced without independent evidence.

axioms (2)

standard math: Spontaneous symmetry breaking of a continuous symmetry produces massless Goldstone modes that propagate coherently.
Standard result from condensed-matter and quantum-field theory.
ad hoc to paper: Equivariant layers in a deep network will spontaneously break the symmetry and thereby host Goldstone-like modes.
This is the load-bearing transfer from physics to machine learning; it is stated as the premise of the work.

invented entities (1)

Goldstone-like degrees of freedom in neural networks (no independent evidence)
purpose: To mediate coherent signal propagation across depth and time without architectural stabilizers
Postulated by direct analogy; no independent falsifiable prediction or external evidence is supplied in the abstract.

Reference graph

Works this paper leans on

61 extracted references · 4 canonical work pages

[1] J. Goldstone, A. Salam, and S. Weinberg, “Broken Symmetries,” Phys. Rev. 127 (1962) 965–970.
[2] H. Leutwyler, “Phonons as goldstone bosons,” Helv. Phys. Acta 70 (1997) 275–286, arXiv:hep-ph/9609466.
[3] S. Sachdev, “Quantum phase transitions,” Physics world 12 no. 4, (1999) 33.
[4] S. Weinberg, “Phenomenological Lagrangians,” Physica A 96 no. 1-2, (1979) 327–340.
[5] D. Gaiotto, A. Kapustin, N. Seiberg, and B. Willett, “Generalized Global Symmetries,” JHEP 02 (2015) 172, arXiv:1412.5148 [hep-th].
[6] E. Lake, “Higher-form symmetries and spontaneous symmetry breaking,” arXiv:1802.07747 [hep-th].
[7] D. M. Hofman and N. Iqbal, “Goldstone modes and photonization for higher form symmetries,” arXiv:1802.09512 [hep-th].
[8] B. Poole, S. Lahiri, M. Raghu, J. Sohl-Dickstein, and S. Ganguli, “Exponential expressivity in deep neural networks through transient chaos,” Advances in neural information processing systems 29 (2016).
[9] S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, “Deep information propagation,” in International Conference on Learning Representations. 2017. https://openreview.net/forum?id=H1W1UN9gg
[10] G. Yang and S. Schoenholz, “Mean field residual networks: On the edge of chaos,” Advances in neural information processing systems 30 (2017).
[11] L. Xiao, Y. Bahri, J. Sohl-Dickstein, S. Schoenholz, and J. Pennington, “Dynamical isometry and a mean field theory of cnns: How to train 10,000-layer vanilla convolutional neural networks,” in International conference on machine learning, pp. 5393–5402, PMLR. 2018.
[12] M. Chen, J. Pennington, and S. Schoenholz, “Dynamical isometry and a mean field theory of rnns: Gating enables signal propagation in recurrent neural networks,” in International Conference on Machine Learning, pp. 873–882, PMLR. 2018.
[13] T. Miyato, S. Löwe, A. Geiger, and M. Welling, “Artificial kuramoto oscillatory neurons,” arXiv preprint arXiv:2410.13821 (2024).
[14] L. H. Liboni, R. C. Budzinski, A. N. Busch, S. Löwe, T. A. Keller, M. Welling, and L. E. Muller, “Image segmentation with traveling waves in an exactly solvable recurrent neural network,” arXiv preprint arXiv:2311.16943 (2023).
[15] T. A. Keller, L. Muller, T. Sejnowski, and M. Welling, “Traveling waves encode the recent past and enhance sequence learning,” arXiv preprint arXiv:2309.08045 (2023).
[16] S. Bhattacharya, S. L. Brincat, M. Lundqvist, and E. K. Miller, “Traveling waves in the prefrontal cortex during working memory,” PLOS Computational Biology 18 no. 1, (01, 2022) 1–22. https://doi.org/10.1371/journal.pcbi.1009827
[17] L. Muller, F. Chavane, J. Reynolds, and T. J. Sejnowski, “Cortical travelling waves: mechanisms and computational principles,” Nature Reviews Neuroscience 19 no. 5, (2018) 255–268. https://doi.org/10.1038/nrn.2018.20
[18] A. Das, E. Zabeh, B. Ermentrout, and J. Jacobs, “Planar, spiral, and concentric traveling waves distinguish behavioral states in human memory,” Nature Communications (2026). https://doi.org/10.1038/s41467-026-71386-z
[19] T. Cohen and M. Welling, “Group equivariant convolutional networks,” in International conference on machine learning, pp. 2990–2999, PMLR. 2016.
[20] M. Weiler, P. Forré, E. Verlinde, and M. Welling, “Equivariant and coordinate independent convolutional networks,” A Gauge Field Theory of Neural Networks 110 (2023).
[21] M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,” arXiv preprint arXiv:2104.13478 (2021).
[22] S.-O. Kaba and S. Ravanbakhsh, “Symmetry breaking and equivariant neural networks,” arXiv preprint arXiv:2312.09016 (2023).
[23] T. E. Smidt, M. Geiger, and B. K. Miller, “Finding symmetry breaking order parameters with euclidean neural networks,” Physical Review Research 3 no. 1, (2021) L012002.
[24] S. Löwe, P. Lippe, M. Rudolph, and M. Welling, “Complex-valued autoencoders for object discovery,” arXiv preprint arXiv:2204.02075 (2022).
[25] S. Löwe, F. Locatello, and M. Welling, “Binding dynamics in rotating features,” arXiv preprint arXiv:2402.05627 (2024).
[26] N. Iqbal and M. Welling, “Topological defects propagate information in deep neural networks,” in NeurIPS 2025 AI for Science Workshop. 2025. https://openreview.net/forum?id=fM5s2Tqe0t
[27] E. Turan, G. Abel, M. Behmanesh, E. Pierson, and M. Ovsjanikov, “Beyond relu: Bifurcation, oversmoothing, and topological priors,” arXiv preprint arXiv:2602.15634 (2026).
[28] H. H. Mo, “Symmetry-protected lyapunov neutral modes in equivariant recurrent networks,” arXiv preprint arXiv:2605.03338 (2026).
[29] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” Advances in neural information processing systems 30 (2017).
[30] P. M. Chaikin, T. C. Lubensky, and T. A. Witten, Principles of condensed matter physics, vol. 10. Cambridge university press Cambridge, 1995.
[31] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “Deep neural networks as gaussian processes,” arXiv preprint arXiv:1711.00165 (2017).
[32] H. Daneshmand, J. Kohler, F. Bach, T. Hofmann, and A. Lucchi, “Batch normalization provably avoids ranks collapse for randomly initialised deep networks,” Advances in Neural Information Processing Systems 33 (2020) 18387–18398.
[33] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9 no. 8, (1997) 1735–1780.
[34] A. Graves, G. Wayne, and I. Danihelka, “Neural turing machines,” arXiv:1410.5401 [cs.NE].
[35] M. Henaff, A. Szlam, and Y. LeCun, “Recurrent orthogonal networks and long-memory tasks,” in Proceedings of the 33rd International Conference on Machine Learning, vol. 48 of Proceedings of Machine Learning Research, pp. 2034–2042. PMLR, 2016.
[36] A. Gu, C. Gulcehre, T. L. Paine, M. Hoffman, and R. Pascanu, “Improving the gating mechanism of recurrent neural networks,” arXiv:1910.09890 [cs.NE].
[37] Q. V. Le, N. Jaitly, and G. E. Hinton, “A simple way to initialize recurrent networks of rectified linear units,” arXiv preprint arXiv:1504.00941 (2015).
[38] M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, p. 1120–1128. JMLR.org, 2016.
[39] J. L. Elman, “Finding structure in time,” Cognitive Science 14 no. 2, (1990) 179–211.
[40] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Association for Computational Linguistics, 2014.
[41] S. Li, W. Li, C. Cook, C. Zhu, and Y. Gao, “Independently recurrent neural network (indrnn): Building a longer and deeper rnn,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5457–5466. IEEE Computer Society, Los Alamitos, CA, USA, Jun, 2018. https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00572
[42] N. B. Erichson, O. Azencot, A. Queiruga, L. Hodgkinson, and M. W. Mahoney, “Lipschitz recurrent neural networks,” in International Conference on Learning Representations. 2021. https://openreview.net/forum?id=-N7PBXqOUJZ
[43] T. K. Rusch and S. Mishra, “Coupled oscillatory recurrent neural network (cornn): An accurate and (gradient) stable architecture for learning long time dependencies,” in International Conference on Learning Representations. 2021.
[44] T. K. Rusch, S. Mishra, N. B. Erichson, and M. W. Mahoney, “Long expressive memory for sequence modeling,” in International Conference on Learning Representations. 2022.
[45] S. Coleman, Aspects of symmetry: selected Erice lectures. Cambridge University Press, 1988.
[46] R. Rajaraman, “Solitons and instantons. an introduction to solitons and instantons in quantum field theory,”
[47] Y. Xu, X. Long, J. Feng, and P. Gong, “Interacting spiral wave patterns underlie complex brain dynamics and are related to cognitive processing,” Nature human behaviour 7 no. 7, (2023) 1196–1215.
[48] P. O. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proceedings of the 31st International Conference on Machine Learning, vol. 32 of Proceedings of Machine Learning Research, pp. 82–90. PMLR, 2014.
[49] M. Liang and X. Hu, “Recurrent convolutional neural network for object recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367–3375. 2015.
[50] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Advances in Neural Information Processing Systems, vol. 28. 2015.
[51]

Delving deeper into convolutional networks for learning video representations,

N. Ballas, L. Yao, C. Pal, and A. Courville, “Delving deeper into convolutional networks for learning video representations,” inInternational Conference on Learning Representations. 2016. 12

2016
[52]

Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms,

Y . Wang, M. Long, J. Wang, Z. Gao, and S. Y . Philip, “Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms,” inAdvances in Neural Information Processing Systems, vol. 30. 2017

2017
[53]

Neural wave machines: Learning spatiotemporally structured representations with locally coupled oscillatory recurrent neural networks,

T. A. Keller and M. Welling, “Neural wave machines: Learning spatiotemporally structured representations with locally coupled oscillatory recurrent neural networks,” inProceedings of the 40th International Conference on Machine Learning, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, eds., vol. 202 ofProceedings of Machine Lea...

2023
[54]

M. E. Peskin and D. V . Schroeder,An Introduction to quantum field theory. Addison-Wesley, Reading, USA, 1995

1995
[55]

Statistical dynamics of classical systems,

P. C. Martin, E. D. Siggia, and H. A. Rose, “Statistical dynamics of classical systems,”Physical Review A8no. 1, (1973) 423

1973
[56]

H. Sompolinsky, A. Crisanti, and H.-J. Sommers, “Chaos in random neural networks,” Physical Review Letters 61 no. 3, (1988) 259

[57]

A. Crisanti and H. Sompolinsky, “Path integral approach to random neural networks,” Physical Review E 98 no. 6, (2018) 062120

[58]

S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, “A correspondence between random neural networks and statistical field theory,” arXiv preprint arXiv:1710.06570 (2017)

[59]

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems 32 (2019)

[60]

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 [cs.LG]. https://arxiv.org/abs/1412.6980

[61]

H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747 (2017)

Appendices

Table of Contents

A Background on Goldstone modes
B Implementation of equivariant layers
C Path integral for stochastic systems
C.1 U(1) equivariant feedforward network ...
