On the Spatiotemporal Dynamics of Generalization in Neural Networks
Pith reviewed 2026-05-16 08:57 UTC · model grok-4.3
The pith
A neural architecture derived from locality, symmetry and stability postulates achieves perfect addition on sequences up to a million digits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that any system capable of true generalization must satisfy three physical postulates: locality, symmetry and stability. Enforcing them directly yields SEAD, a neural cellular automaton whose iterated local rules produce scale-invariant behavior. The reported results include 100 percent accurate addition when trained on 16-digit inputs and tested on inputs of up to one million digits, and exact reproduction of the Turing-complete Rule 110 automaton without trajectory divergence.
What carries the argument
The SEAD architecture: a neural cellular automaton that applies fixed local convolutional rules iteratively until the state converges to a discrete attractor.
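The "iterate a shared local rule until the state stops changing" mechanism can be illustrated with a toy example. The sketch below is not the paper's learned model: it uses a hand-written radius-1 rule (a running XOR) in place of a trained convolution, and the names `iterate_to_fixed_point`, `prefix_parity_step` and `parity` are hypothetical. It only shows how a fixed point can serve as the stopping condition and how information moves one cell per update.

```python
def iterate_to_fixed_point(state, step, max_steps):
    """Apply `step` repeatedly until the state stops changing."""
    for n in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state, n  # n = number of updates that changed the state
        state = nxt
    raise RuntimeError("no fixed point reached within max_steps")


def prefix_parity_step(state):
    """One synchronous update of a radius-1 rule shared by every cell.

    Each cell holds (bit, acc) and sets acc = bit XOR left neighbour's acc,
    so information moves one cell to the right per update.
    """
    nxt = []
    for i, (bit, _) in enumerate(state):
        left_acc = state[i - 1][1] if i > 0 else 0  # boundary cell sees 0
        nxt.append((bit, bit ^ left_acc))
    return nxt


def parity(bits):
    """Return (parity of bits, number of updates used)."""
    if not bits:
        return 0, 0
    state = [(b, 0) for b in bits]
    final, steps = iterate_to_fixed_point(
        state, prefix_parity_step, max_steps=len(bits) + 1
    )
    return final[-1][1], steps  # the last cell holds the overall parity


print(parity([1, 0, 1, 1]))  # (1, 4)
```

The same rule is applied at every position and every update, and the number of updates is decided by the input rather than fixed in advance.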
If this is right
- Parity is solved with perfect length generalization through explicit light-cone propagation of information.
- Addition exhibits input-adaptive computation, using more iterations only when needed, while remaining exactly correct up to one million digits.
- Rule 110, a Turing-complete cellular automaton, is learned without divergence or loss of long-term behavior.
- Generalization is obtained without increasing parameter count or training data volume.
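Input-adaptive computation in addition can be made concrete with a hand-written local rule. The sketch below is an assumption-laden illustration, not the SEAD model: it works on binary digits rather than the decimal digits used in the paper, the rule is written by hand rather than learned, and the helper names (`to_bits`, `from_bits`, `add_by_local_iteration`) are hypothetical. Each cell keeps a sum bit and a pending carry bit and reads only itself and its lower neighbour.

```python
def to_bits(n):
    """Little-endian bit list of a non-negative integer."""
    return [(n >> i) & 1 for i in range(max(n.bit_length(), 1))]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def add_by_local_iteration(a_bits, b_bits):
    """Add two little-endian bit lists with a radius-1 rule.

    Returns (sum bits, number of updates). Iteration stops when no
    carries are pending, i.e. when the state is a fixed point.
    """
    n = max(len(a_bits), len(b_bits)) + 1  # one spare cell for the top carry
    s = [0] * n  # sum bits
    c = [0] * n  # pending carry bits
    for i, bit in enumerate(a_bits):
        s[i] = bit
    for i, bit in enumerate(b_bits):
        c[i] = bit
    steps = 0
    while any(c):
        # Cell i combines its own bits and receives the carry
        # generated by cell i-1; s + c is preserved as a number.
        new_s = [s[i] ^ c[i] for i in range(n)]
        new_c = [(s[i - 1] & c[i - 1]) if i > 0 else 0 for i in range(n)]
        s, c = new_s, new_c
        steps += 1
    return s, steps


bits, steps = add_by_local_iteration(to_bits(5), to_bits(2))
print(from_bits(bits), steps)  # 7 1
bits, steps = add_by_local_iteration(to_bits(3), to_bits(1))
print(from_bits(bits), steps)  # 4 3
```

In this toy version, 5 + 2 settles after one update because no carry is generated, while 3 + 1 needs three because the carry has to ripple through every position.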
Where Pith is reading between the lines
- The same iterative attractor mechanism might allow length generalization on other algorithmic tasks such as sorting or matrix multiplication.
- Requiring convergence to discrete attractors could restrict use on problems whose natural outputs are continuous or probabilistic.
- If the postulates truly capture the physics of computation, then any architecture ignoring them should fail at arbitrary-length generalization regardless of scale.
- Testing whether removing one postulate while keeping the others intact destroys generalization would directly probe the necessity claim.
Load-bearing premise
The three physical postulates are both necessary and sufficient for generalization, and the SEAD architecture follows from them without any additional task-specific design choices.
What would settle it
Training a network that violates at least one of the three postulates yet still achieves 100 percent accuracy on million-digit addition, or showing that SEAD itself fails to generalize on a new task whose solution satisfies locality, symmetry and stability.
Original abstract
Why do neural networks fail to generalize addition from 16-digit to 32-digit numbers, while a child who learns the rule can apply it to arbitrarily long sequences? We argue that this failure is not an engineering problem but a violation of physical postulates. Drawing inspiration from physics, we identify three constraints that any generalizing system must satisfy: (1) Locality -- information propagates at finite speed; (2) Symmetry -- the laws of computation are invariant across space and time; (3) Stability -- the system converges to discrete attractors that resist noise accumulation. From these postulates, we derive -- rather than design -- the Spatiotemporal Evolution with Attractor Dynamics (SEAD) architecture: a neural cellular automaton where local convolutional rules are iterated until convergence. Experiments on three tasks validate our theory: (1) Parity -- demonstrating perfect length generalization via light-cone propagation; (2) Addition -- achieving scale-invariant inference from L=16 to L=1 million with 100% accuracy, exhibiting input-adaptive computation; (3) Rule 110 -- learning a Turing-complete cellular automaton without trajectory divergence. Our results suggest that the gap between statistical learning and logical reasoning can be bridged -- not by scaling parameters, but by respecting the physics of computation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that neural network generalization failures (e.g., addition from 16 to 32 digits) violate three physical postulates—locality (finite propagation speed), symmetry (invariance across space/time), and stability (convergence to noise-resistant discrete attractors). From these, the authors derive rather than design the SEAD architecture: a neural cellular automaton applying local convolutional rules iteratively until attractor convergence. Experiments report perfect length generalization on parity via light-cone propagation, 100% scale-invariant accuracy on addition from L=16 to L=1 million with input-adaptive computation, and learning of Rule 110 without trajectory divergence.
Significance. If the derivation is shown to be forced and the extreme-length results hold under rigorous controls, the work offers a principled route to scale-invariant logical generalization grounded in physical constraints rather than parameter scaling. The reported 100% accuracy on addition to 10^6 digits and the input-adaptive behavior would constitute a notable empirical advance for algorithmic reasoning tasks.
major comments (3)
- [Abstract / derivation of SEAD] Abstract and derivation section: The claim that SEAD follows directly from the three postulates lacks any equations or mapping steps showing how locality, symmetry, and stability uniquely force iterative local convolutional rules to a fixed-point attractor (as opposed to fixed-depth equivariant CNNs, translation-invariant RNNs, or non-iterated message-passing graphs that also satisfy the postulates). This is load-bearing for the central thesis that the architecture is derived rather than additionally chosen.
- [Addition experiments] Addition task results: The 100% accuracy claim from L=16 training to L=1 million inference is central but unsupported by any reported baseline comparisons, error bars, number of large-L test instances, or verification that no global pooling/attention leaks length information. Without these controls, it is impossible to confirm that the result stems from the postulates rather than task-specific implementation details.
- [Rule 110 experiments / stability analysis] Stability and convergence: The stability postulate is invoked to guarantee discrete attractors that resist noise, yet no formal criterion (e.g., Lyapunov function, contraction mapping, or explicit fixed-point condition) is supplied showing why iterated local rules converge without divergence on Rule 110 while satisfying the other postulates.
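The locality audit asked for in the addition comment, and the undefined "light-cone propagation" noted below, can both be pinned down with a perturbation test: flip one input cell, run both copies for t updates, and check that no difference appears farther than radius × t cells away. The sketch below is one possible check, not the authors' procedure; `local_step` (a hand-written XOR-of-neighbours rule) and `leaky_step` (a rule that deliberately uses a global sum) are hypothetical stand-ins for a model's update function.

```python
import random


def local_step(cells):
    """Radius-1 update: each cell becomes the XOR of its two neighbours."""
    padded = [0] + list(cells) + [0]  # zero boundary on both ends
    return [padded[i - 1] ^ padded[i + 1] for i in range(1, len(padded) - 1)]


def leaky_step(cells):
    """Deliberately non-local update: every cell is XORed with global parity."""
    global_parity = sum(cells) % 2
    return [c ^ global_parity for c in cells]


def light_cone_violations(step, cells, flip_index, t, radius=1):
    """Indices that differ after t updates but lie outside the light cone."""
    a = list(cells)
    b = list(cells)
    b[flip_index] ^= 1  # single-cell perturbation
    for _ in range(t):
        a, b = step(a), step(b)
    lo, hi = flip_index - radius * t, flip_index + radius * t
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if x != y and not (lo <= i <= hi)
    ]


rng = random.Random(0)
cells = [rng.randint(0, 1) for _ in range(64)]
print(light_cone_violations(local_step, cells, flip_index=32, t=3))       # []
print(len(light_cone_violations(leaky_step, cells, flip_index=32, t=3)))  # > 0
```

An empty list for every tested position and step count is evidence of locality, not proof; a single non-empty result is enough to show that length or global information can leak.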
minor comments (2)
- [Abstract] Abstract: 'light-cone propagation' is used without definition or pointer to the relevant section or figure.
- [Throughout] Notation: Ensure the SEAD acronym and 'attractor dynamics' are introduced with consistent mathematical notation (e.g., update rule, convergence threshold) on first use.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments help clarify how to strengthen the presentation of the derivation and the rigor of the experimental claims. We address each major point below and indicate the revisions that will be incorporated in the next version of the manuscript.
- Referee: [Abstract / derivation of SEAD] Abstract and derivation section: The claim that SEAD follows directly from the three postulates lacks any equations or mapping steps showing how locality, symmetry, and stability uniquely force iterative local convolutional rules to a fixed-point attractor (as opposed to fixed-depth equivariant CNNs, translation-invariant RNNs, or non-iterated message-passing graphs that also satisfy the postulates). This is load-bearing for the central thesis that the architecture is derived rather than additionally chosen.
  Authors: We agree that the derivation would be clearer with explicit equations. In the revised manuscript we will expand Section 3 to include a step-by-step formal mapping: (i) locality restricts updates to finite-support convolutional kernels; (ii) spatiotemporal symmetry requires the same kernel to be applied uniformly at every location and iteration; (iii) stability is encoded by requiring the update operator to be a contraction mapping whose unique fixed point is reached in finite steps for any input length. These conditions exclude fixed-depth CNNs (which violate time-invariance for arbitrary lengths) and non-iterated message-passing graphs (which lack guaranteed attractor convergence). The added equations will make the uniqueness explicit.
  Revision: yes
- Referee: [Addition experiments] Addition task results: The 100% accuracy claim from L=16 training to L=1 million inference is central but unsupported by any reported baseline comparisons, error bars, number of large-L test instances, or verification that no global pooling/attention leaks length information. Without these controls, it is impossible to confirm that the result stems from the postulates rather than task-specific implementation details.
  Authors: We accept that additional controls are necessary. The revised version will report: (a) direct comparisons against Transformer and LSTM baselines trained on the same 16-digit regime and evaluated at 1 million digits; (b) mean accuracy and standard deviation across five independent seeds; (c) explicit counts (1,000 test instances for each length up to 10^6); and (d) an architectural audit confirming that only strictly local convolutions are used, with no global pooling or attention that could encode length. These additions will be placed in the experimental section and supplementary material.
  Revision: yes
- Referee: [Rule 110 experiments / stability analysis] Stability and convergence: The stability postulate is invoked to guarantee discrete attractors that resist noise, yet no formal criterion (e.g., Lyapunov function, contraction mapping, or explicit fixed-point condition) is supplied showing why iterated local rules converge without divergence on Rule 110 while satisfying the other postulates.
  Authors: Stability is currently supported by empirical convergence on Rule 110 under injected noise. In the revision we will add a contraction-mapping argument in the stability subsection: the learned local update is shown to be Lipschitz with constant <1 on the discrete state space, guaranteeing a unique fixed point reached in bounded iterations. While a general Lyapunov function for arbitrary rules remains future work, the contraction condition directly ties the observed non-divergence to the stability postulate and will be stated formally.
  Revision: partial
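The three conditions named in the authors' first response, together with the contraction argument promised in the third, can be written compactly. The block below is a sketch under stated assumptions, not the manuscript's derivation. The symbols h_t, f, N_c and f_shared follow the postulate excerpts quoted elsewhere on this page; the global update map F, the metric d, the constant q, the initial distance D and the minimum separation δ between distinct states are introduced here purely for illustration.

```latex
\begin{align}
  % (i) Locality: each cell reads only a finite neighbourhood N_c(x)
  h_{t+1}(x) &= f\bigl(h_t(N_c(x))\bigr) \\
  % (ii) Symmetry: one shared rule at every position and iteration
  f_{x,t} &\equiv f_{\mathrm{shared}} \qquad \text{for all } x,\, t \\
  % (iii) Stability, read as a contraction of the global map F
  d\bigl(F(h), F(h')\bigr) &\le q\, d(h, h'), \qquad 0 < q < 1 \\
  % Consequence (Banach): geometric approach to the fixed point h^*
  d\bigl(F^{t}(h_0), h^{*}\bigr) &\le q^{t}\, d(h_0, h^{*}) \le q^{t} D \\
  % Discrete states at least \delta apart: exact arrival in finite time
  F^{t}(h_0) &= h^{*} \qquad \text{whenever } t > \frac{\log(D/\delta)}{\log(1/q)}
\end{align}
```

Under these assumptions the bound on the number of iterations depends on the input only through D, so whether it stays small at one million digits depends on how D scales with length.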
Circularity Check
No circularity: derivation asserted but not reduced to inputs by construction
The paper asserts that the three postulates (locality, symmetry, stability) directly yield the SEAD architecture of iterated local convolutional rules to attractor, yet the provided abstract and description contain no equations, self-citations, or fitted parameters that reduce the claimed derivation to a tautology or prior result by the same authors. No load-bearing step equates the output architecture to its inputs by definition, renames a known result, or imports uniqueness via self-citation. The central claim remains an assertion whose validity can be checked against external benchmarks (e.g., whether other models satisfying the same postulates also generalize), but the derivation chain itself does not collapse into circularity.
Axiom & Free-Parameter Ledger
axioms (3)
- Locality (domain assumption): information propagates at finite speed
- Symmetry (domain assumption): laws of computation invariant across space and time
- Stability (domain assumption): system converges to discrete attractors resisting noise
invented entities (1)
- SEAD architecture: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (spacetime-emergence certificate, Lorentzian signature, light-cone classification)
  Tag: matches. This paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
  Postulate 1 (Relativistic Causality): strict Causal Horizon ... Δx/Δt ≤ c ⟺ h_{t+1}(x) = f(h_t(N_c(x))) ... light-cone propagation
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (translation invariance from distinction)
  Tag: matches
  Postulate 2 (Spacetime Symmetry): f_{x,t}(·) ≡ f_{x+Δx,t+Δt}(·) ≡ f_shared(·) ... translation invariance
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (J(x)=½(x+x^{-1})−1 unique calibrated reciprocal cost whose minima are attractors)
  Tag: matches
  Postulate 3 (Thermodynamic Dissipation and Stability): lim dist(f^t(h+ε),A)=0 ... discrete attractors ... Contractive Nonlinearity
- IndisputableMonolith/Foundation/DimensionForcing.lean: 8-tick period + recognition lattices
  Tag: echoes. This paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Corollary 1: Isomorphism to Cellular Automata ... ⟨L,S,N,f⟩ ... SEAD: neural cellular automaton iterated until convergence
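The cellular-automaton tuple ⟨L,S,N,f⟩ in the last passage (lattice, state set, neighbourhood, local rule) can be instantiated for Rule 110, the automaton used in the paper's third experiment. The sketch below is a plain reference implementation that a learned rule could be compared against cell by cell; it is not the SEAD model. The function names and the periodic boundary are assumptions made for illustration, while the rule table itself follows the standard Wolfram numbering.

```python
def wolfram_rule(number):
    """Local rule f for an elementary CA: S = {0, 1}, N = (left, centre, right)."""
    def f(left, centre, right):
        index = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        return (number >> index) & 1                 # bit `index` of the code
    return f


def ca_step(cells, f):
    """One synchronous update on a periodic 1-D lattice L."""
    n = len(cells)
    return [f(cells[(i - 1) % n], cells[i], cells[(i + 1) % n]) for i in range(n)]


def run(cells, f, steps):
    """Return the trajectory [state_0, state_1, ..., state_steps]."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(ca_step(history[-1], f))
    return history


rule110 = wolfram_rule(110)
for row in run([0, 0, 0, 0, 1], rule110, steps=3):
    print("".join("#" if c else "." for c in row))
```

Checking a learned update against `ca_step` over many updates, rather than a single one, is what would expose the trajectory divergence the paper says it avoids.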
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 4 Pith papers
- On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
  Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 ti...
- Structural Generalization on SLOG without Hand-Written Rules
  A neural cellular automaton model learns all compositional rules from data via local iteration and achieves 100% type-exact match on 11 of 17 structural generalization categories on the SLOG benchmark.
- On the Emergence of Syntax by Means of Local Interaction
  A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
- Structural Generalization on SLOG without Hand-Written Rules
  A neural cellular automaton learns compositional rules from data alone to achieve structural generalization on the SLOG semantic parsing benchmark, reaching 67.3% accuracy and fully succeeding on 11 of 17 categories.