Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than training while Transformers fail.
On the Ability and Limitations of Transformers to Recognize Formal Languages
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
Linear RNNs track states from REPL code traces of permutations better than Transformers, but non-linear RNNs outperform them in partially observable probabilistic automata.
Deriving a neural cellular automaton from locality, symmetry, and stability postulates produces 100% accurate addition generalization from 16-digit to 1-million-digit inputs.
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
citing papers explorer
-
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than training while Transformers fail.
-
On the Emergence of Syntax by Means of Local Interaction
A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.
-
Learning State-Tracking from Code Using Linear RNNs
Linear RNNs track states from REPL code traces of permutations better than Transformers, but non-linear RNNs outperform them in partially observable probabilistic automata.
-
On the Spatiotemporal Dynamics of Generalization in Neural Networks
Deriving a neural cellular automaton from locality, symmetry, and stability postulates produces 100% accurate addition generalization from 16-digit to 1-million-digit inputs.
-
The Serial Scaling Hypothesis
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.