On the Ability and Limitations of Transformers to Recognize Formal Languages

Bhattamishra, S · 2020 · arXiv 2009.11264

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication

cs.LG · 2026-03-30 · unverdicted · novelty 8.0

Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than those seen in training while Transformers fail.

On the Emergence of Syntax by Means of Local Interaction

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

Learning State-Tracking from Code Using Linear RNNs

cs.LG · 2026-02-16 · unverdicted · novelty 7.0

Linear RNNs track states from REPL code traces of permutations better than Transformers, but non-linear RNNs outperform them in partially observable probabilistic automata.

On the Spatiotemporal Dynamics of Generalization in Neural Networks

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

Deriving a neural cellular automaton from locality, symmetry, and stability postulates produces 100% accurate addition generalization from 16-digit to 1-million-digit inputs.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

citing papers explorer

Showing 5 of 5 citing papers.

On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication cs.LG · 2026-03-30 · unverdicted · none · ref 7
On the Emergence of Syntax by Means of Local Interaction cs.CL · 2026-04-20 · unverdicted · none · ref 26
Learning State-Tracking from Code Using Linear RNNs cs.LG · 2026-02-16 · unverdicted · none · ref 1
On the Spatiotemporal Dynamics of Generalization in Neural Networks cs.LG · 2026-02-02 · unverdicted · none · ref 4
The Serial Scaling Hypothesis cs.LG · 2025-07-16 · unverdicted · none · ref 9
