On the Ability and Limitations of Transformers to Recognize Formal Languages

5 Pith papers cite this work. Polarity classification is still being indexed.

fields

cs.LG 4 cs.CL 1

years

2026 4 2025 1

verdicts

UNVERDICTED 5

roles

background 1

polarities

background 1

representative citing papers

On the Emergence of Syntax by Means of Local Interaction

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

Learning State-Tracking from Code Using Linear RNNs

cs.LG · 2026-02-16 · unverdicted · novelty 7.0

Linear RNNs track states from REPL code traces of permutations better than Transformers, but non-linear RNNs outperform linear RNNs on partially observable probabilistic automata.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

citing papers explorer

Showing 5 of 5 citing papers.

  • On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication cs.LG · 2026-03-30 · unverdicted · none · ref 7

    Long-range dependency in integer multiplication is a mirage arising from the 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than those seen in training, while Transformers fail.

  • On the Emergence of Syntax by Means of Local Interaction cs.CL · 2026-04-20 · unverdicted · none · ref 26

    A 2D neural cellular automaton spontaneously self-organizes into a Proto-CKY representation that exhibits syntactic processing capabilities for context-free grammars when trained on membership problems.

  • Learning State-Tracking from Code Using Linear RNNs cs.LG · 2026-02-16 · unverdicted · none · ref 1

    Linear RNNs track states from REPL code traces of permutations better than Transformers, but non-linear RNNs outperform linear RNNs on partially observable probabilistic automata.

  • On the Spatiotemporal Dynamics of Generalization in Neural Networks cs.LG · 2026-02-02 · unverdicted · none · ref 4

    Deriving a neural cellular automaton from locality, symmetry, and stability postulates yields addition that generalizes with 100% accuracy from 16-digit to 1-million-digit inputs.

  • The Serial Scaling Hypothesis cs.LG · 2025-07-16 · unverdicted · none · ref 9

    The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.