An Algebraic View of the Expressivity of Recurrent Language Models

Franz Nowak; Reda Boumasmoud; Ryan Cotterell

arxiv: 2606.01765 · v2 · pith:7AYOBDKYnew · submitted 2026-06-01 · 💻 cs.FL · cs.CL· cs.LG

An Algebraic View of the Expressivity of Recurrent Language Models

Franz Nowak , Ryan Cotterell , Reda Boumasmoud This is my paper

Pith reviewed 2026-06-28 12:05 UTC · model grok-4.3

classification 💻 cs.FL cs.CLcs.LG

keywords recurrent neural networksexpressivityformal languagessyntactic monoidwreath productarithmetic modelsstate-space models

0 comments

The pith

Recurrent neural language model expressivity reduces to whether its syntactic monoid divides a wreath product under the arithmetic model in use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper supplies a single algebraic account that explains why prior work reached conflicting conclusions about what languages recurrent models can recognize. Results that appeared to show Turing-completeness versus results that showed only regular languages are traced to the choice of floating-point versus integer arithmetic. Under this account, expressivity becomes the question of whether the syntactic monoid realized by a network divides a wreath product associated with the arithmetic. The same diagonal state-space architecture is shown to be unable to count even moduli once floating-point recurrences are required, yet able to realize every even-modulus counter once the arithmetic is switched to unsigned-integer quantization.

Core claim

Expressivity of recurrent neural language models is governed by the arithmetic model; it reduces to the algebraic question of whether a network's syntactic monoid divides a certain wreath product. The identical diagonal state-space architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.

What carries the argument

Syntactic monoid of the network and its division relation to a wreath product determined by the arithmetic model.

If this is right

Conflicting expressivity claims in the literature are reconciled once arithmetic is made explicit.
Diagonal state-space models lose the ability to count even moduli under floating-point arithmetic.
Switching to unsigned-integer quantization restores every even-modulus counter to the same architecture.
Expressivity of any recurrent model can be decided by checking monoid division rather than by simulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same monoid-division test may classify other neural sequence models once their arithmetic is formalized.
Quantization choices in deployed models directly affect which formal languages they can recognize.
Mapping common floating-point precisions to their corresponding wreath products would make the algebraic test immediately usable for practitioners.

Load-bearing premise

That the discrepancy between prior Turing-completeness and regular-language results is caused entirely by differences in the underlying arithmetic model.

What would settle it

A concrete floating-point implementation of a diagonal state-space model that successfully recognizes an even-modulus counter language, or an unsigned-integer implementation that fails to do so.

Figures

Figures reproduced from arXiv: 2606.01765 by Franz Nowak, Reda Boumasmoud, Ryan Cotterell.

**Figure 1.** Figure 1: Abstraction of a deep RNN (left) to an algebraic RNN (center) to a wreath product of transformation monoids. rigorous statements about the resulting monoid and its structure, recovering or refining many results in the literature as straightforward consequences. If any of the underlying assumptions change, such as the numerical semantics or the possible recurrence parametrization of a core, only this part … view at source ↗

**Figure 2.** Figure 2: A structural perspective: from continuous idealizations (top) to algebraic structures (bottom). Definition 3.19. Let A = (Σ, R+, q◦ , e) be an RNN acceptor. For w = σ1 · · · σn ∈ Σ ∗ , write Fb R+ (q ◦ , w) ∈ QR+ for the global state obtained by iterating R+ from q ◦ on the encoded sequence e(σ1), . . . , e(σn), with the convention that the empty word acts as idQR+ , so that in particular Fb R+ (q ◦ , ε) … view at source ↗

read the original abstract

What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper unifies conflicting RNN expressivity results through syntactic monoid division but its floating-point model risks not matching real IEEE 754 behavior closely enough for the SSM separation to hold.

read the letter

The main thing to know is that the authors reduce expressivity of recurrent models to whether the syntactic monoid divides a wreath product, and they use this to separate the same diagonal state-space architecture on even-modulus counters under floating-point versus unsigned-integer arithmetic.

What is new is the single algebraic reduction that pulls together the prior conflicting reports on Turing completeness versus regular languages, plus the concrete case study that pins the difference on the arithmetic regime. The paper does well by laying out formal accounts of the arithmetic models first and then applying the reduction directly to a modern architecture.

The soft spot is the fidelity of those arithmetic models to actual implementations. The floating-point version must capture rounding, overflow, and denormals exactly as they occur in hardware or libraries; if it deviates, the claimed inability to realize even-modulus counters does not transfer to deployed networks. The unsigned-integer side needs to match modular wrap-around without saturation. The abstract states the outcome but supplies no derivation steps, so the strength of the separation is still open.

This paper is for people working at the intersection of automata theory and neural language models who want a uniform lens on architecture and quantization choices. A reader already comfortable with syntactic monoids would get the most out of it.

It deserves a serious referee because the central algebraic claim is coherent and the case study is falsifiable in principle, even though the arithmetic modeling needs close checking.

Referee Report

1 major / 1 minor

Summary. The paper develops a unified algebraic framework for the expressivity of recurrent neural language models. It begins with formal definitions of arithmetic models (including floating-point and integer quantization), then reduces the question of which formal languages a recurrent network can recognize to whether the network's syntactic monoid divides a particular wreath product. As a case study on diagonal state-space models, the same recurrence is shown to be unable to realize even-modulus counters under the paper's floating-point model yet able to realize every even-modulus counter under its unsigned-integer model.

Significance. If the algebraic reduction and the arithmetic-model formalizations are correct, the work supplies a precise explanation for the conflicting Turing-completeness versus regular-language results in the literature and demonstrates that arithmetic semantics are load-bearing for expressivity claims. The explicit reduction to monoid division and the concrete separation for diagonal SSMs are the main contributions.

major comments (1)

[Case study section (diagonal SSMs)] The central separation for diagonal SSMs rests on the claim that the floating-point recurrence cannot implement even-modulus counters while the integer model can. The paper must supply an explicit verification that its floating-point model (including rounding and overflow rules) matches the behavior of actual IEEE 754 implementations used in deployed networks; otherwise the claimed discrepancy does not transfer to real systems.

minor comments (1)

[Algebraic framework] Notation for the syntactic monoid and wreath-product construction should be introduced with a short self-contained example before the general theorem.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the constructive suggestion regarding the floating-point model. We address the single major comment below.

read point-by-point responses

Referee: [Case study section (diagonal SSMs)] The central separation for diagonal SSMs rests on the claim that the floating-point recurrence cannot implement even-modulus counters while the integer model can. The paper must supply an explicit verification that its floating-point model (including rounding and overflow rules) matches the behavior of actual IEEE 754 implementations used in deployed networks; otherwise the claimed discrepancy does not transfer to real systems.

Authors: We agree that an explicit verification is needed for the results to transfer directly to deployed IEEE 754 systems. Our formal floating-point model was constructed to capture the standard rounding-to-nearest and overflow-to-infinity behaviors of IEEE 754 double precision, but we did not include a side-by-side mapping. In the revised manuscript we will add an appendix that (i) states the precise rounding and overflow rules used, (ii) proves equivalence to IEEE 754 for the arithmetic operations appearing in the diagonal SSM recurrence, and (iii) supplies concrete numerical examples confirming that the even-modulus counter separation holds under those rules. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses independent automata-theoretic primitives

full rationale

The paper reduces expressivity questions to whether a network's syntactic monoid divides a wreath product, invoking standard constructions from automata theory that pre-exist the paper and are not defined in terms of its own results or fitted values. The diagonal SSM case study then applies this external algebraic criterion separately to the paper's formal floating-point recurrence model versus its unsigned-integer model; neither model is constructed by reference to the desired counter realizability outcome. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or smuggled ansatzes appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on standard algebraic automata theory; the abstract introduces no new free parameters, invented entities, or ad-hoc axioms beyond the domain assumption that expressivity reduces to monoid division.

axioms (1)

domain assumption Expressivity of recurrent language models reduces to whether the syntactic monoid divides a certain wreath product
This is the central reduction stated in the abstract as the unified account.

pith-pipeline@v0.9.1-grok · 5651 in / 1295 out tokens · 31282 ms · 2026-06-28T12:05:04.800926+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 19 canonical work pages

[1]

The Expressive Capacity of State Space Models:

Sarrof, Yash and Veitsman, Yana and Hahn, Michael , booktitle=. The Expressive Capacity of State Space Models:. 2024 , bdsk-url-1=

2024
[2]

Unlocking State-Tracking in Linear

Grazzi, Riccardo and Siems, Julien and Zela, Arber and Franke, J\". Unlocking State-Tracking in Linear. International Conference on Learning Representations , editor =. 2025 , Bdsk-Url-1 =

2025
[3]

Finite Automata, Formal Logic, and Circuit Complexity , year=

Howard Straubing , editor=. Finite Automata, Formal Logic, and Circuit Complexity , year=
[4]

Algebraic Theory of Machines

Kenneth Krohn and John Rhodes , journal=. Algebraic Theory of Machines
[5]

Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:

Andrew Hryniowski and Alexander Wong , year=. Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:. 1905.10901 , archiveprefix=

Pith/arXiv arXiv 1905
[6]

and Sontag, Eduardo D

Siegelmann, Hava T. and Sontag, Eduardo D. , title=. 1992 , isbn=. doi:10.1145/130385.130432 , booktitle=

work page doi:10.1145/130385.130432 1992
[7]

arXiv preprint arXiv:2310.12942 , url=

On the Representational Capacity of Recurrent Neural Language Models , author=. arXiv preprint arXiv:2310.12942 , url=. 2023 , eprint=

arXiv 2023
[8]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month=

Lower Bounds on the Expressivity of Recurrent Neural Language Models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.naacl-long.380 , pages=

work page doi:10.18653/v1/2024.naacl-long.380 2024
[9]

On the Practical Computational Power of Finite Precision

Weiss, Gail and Goldberg, Yoav and Yahav, Eran , editor=. On the Practical Computational Power of Finite Precision. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , month=. 2018 , address=. doi:10.18653/v1/P18-2117 , pages=

work page doi:10.18653/v1/p18-2117 2018
[10]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=

Recurrent Neural Language Models as Probabilistic Finite-state Automata , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=. 2023 , address=. doi:10.18653/v1/2023.emnlp-main.502 , pages=

work page doi:10.18653/v1/2023.emnlp-main.502 2023
[11]

, booktitle=

Indyk, P. , booktitle=. Optimal simulation of automata by neural nets , year=
[12]

Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=

Sequential Neural Networks as Automata , author=. Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=. 2019 , address=. doi:10.18653/v1/W19-3901 , pages=

work page doi:10.18653/v1/w19-3901 2019
[13]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=

Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and ichter, brian and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny , booktitle=. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=
[14]

Turing, A. M. , title=. Proceedings of the London Mathematical Society , volume=. doi:https://doi.org/10.1112/plms/s2-42.1.230 , url=. https://londmathsoc.onlinelibrary.wiley.com/doi/pdf/10.1112/plms/s2-42.1.230 , year=

work page doi:10.1112/plms/s2-42.1.230
[15]

2021 , eprint=

Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=

2021
[16]

2303.08774 , archiveprefix=

OpenAI , year=. 2303.08774 , archiveprefix=

Pith/arXiv arXiv
[17]

Advances in Neural Information Processing Systems , editor=

Solving Quantitative Reasoning Problems with Language Models , author=. Advances in Neural Information Processing Systems , editor=. 2022 , url=

2022
[18]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.acl-long.676 , pages=

work page doi:10.18653/v1/2024.acl-long.676 2024
[19]

Proceedings of the 41st International Conference on Machine Learning , articleno=

Merrill, William and Petty, Jackson and Sabharwal, Ashish , title=. Proceedings of the 41st International Conference on Machine Learning , articleno=. 2024 , publisher=

2024
[20]

Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J

Karuvally, Arjun and Nowak, Franz and Keller, T. Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J. and Siegelmann, Hava T. , booktitle=. Bridging Expressivity and Scalability with Adaptive Unitary. 2025 , url=. 2507.05238 , archiveprefix=

arXiv 2025
[21]

Wang, Maolin and Rasoulinezhad, Seyedramin and Leong, Philip H. W. and So, Hayden K.-H. , journal=. 2022 , volume=. doi:10.1109/TPDS.2022.3149787 , url=

work page doi:10.1109/tpds.2022.3149787 2022
[22]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Quantization and training of neural networks for efficient integer-arithmetic-only inference , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2018 , url=

2018
[23]

and Keutzer, Kurt , booktitle=

Kim, Sehoon and Gholami, Amir and Yao, Zhewei and Mahoney, Michael W. and Keutzer, Kurt , booktitle=
[24]

arXiv preprint arXiv:2109.09828 , year=

Sari, Eyy. arXiv preprint arXiv:2109.09828 , year=

arXiv
[25]

Training Integer-Only Deep Recurrent Neural Networks , year=

Nia, Vahid Partovi and Sari, Eyy\". Training Integer-Only Deep Recurrent Neural Networks , year=. doi:10.1007/s42979-023-01920-z , journal=

work page doi:10.1007/s42979-023-01920-z
[26]

Transactions of the Association for Computational Linguistics , volume=

Theoretical Limitations of Self-Attention in Neural Sequence Models , author=. Transactions of the Association for Computational Linguistics , volume=. 2020 , address=. doi:10.1162/tacl_a_00306 , pages=

work page doi:10.1162/tacl_a_00306 2020
[27]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=

Overcoming a Theoretical Limitation of Self-Attention , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2022 , address=. doi:10.18653/v1/2022.acl-long.527 , pages=

work page doi:10.18653/v1/2022.acl-long.527 2022
[28]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month=

On the Ability and Limitations of Transformers to Recognize Formal Languages , author=. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month=. 2020 , address=. doi:10.18653/v1/2020.emnlp-main.576 , pages=

work page doi:10.18653/v1/2020.emnlp-main.576 2020
[29]

Journal of Machine Learning Research , year=

Jorge Pérez and Pablo Barceló and Javier Marinkovic , title=. Journal of Machine Learning Research , year=
[30]

Transactions of the Association for Computational Linguistics , volume=

Saturated Transformers are Constant-Depth Threshold Circuits , author=. Transactions of the Association for Computational Linguistics , volume=. 2022 , address=. doi:10.1162/tacl_a_00493 , pages=

work page doi:10.1162/tacl_a_00493 2022
[31]

What Formal Languages Can Transformers Express?

Strobl, Lena and Merrill, William and Weiss, Gail and Chiang, David and Angluin, Dana , journal=. What Formal Languages Can Transformers Express?. 2024 , address=. doi:10.1162/tacl_a_00663 , pages=

work page doi:10.1162/tacl_a_00663 2024
[32]

Proceedings of the 2018 Conference of the North

Recurrent Neural Networks as Weighted Language Recognizers , author=. Proceedings of the 2018 Conference of the North. 2018 , address=. doi:10.18653/v1/N18-1205 , pages=

work page doi:10.18653/v1/n18-1205 2018
[33]

The Parallelism Tradeoff:

Merrill, William and Sabharwal, Ashish , journal=. The Parallelism Tradeoff:. 2023 , address=. doi:10.1162/tacl_a_00562 , pages=

work page doi:10.1162/tacl_a_00562 2023
[34]

Proceedings of the 38th International Conference on Machine Learning , series=

Linear Transformers are Secretly Fast Weight Programmers , author=. Proceedings of the 38th International Conference on Machine Learning , series=. 2021 , publisher=

2021
[35]

International Conference on Learning Representations , year=

Efficiently Modeling Long Sequences with Structured State Spaces , author=. International Conference on Learning Representations , year=
[36]

International Conference on Learning Representations , year=

Simplified State Space Layers for Sequence Modeling , author=. International Conference on Learning Representations , year=
[37]

First Conference on Language Modeling , year=

Mamba: Linear-Time Sequence Modeling with Selective State Spaces , author=. First Conference on Language Modeling , year=
[38]

URL https://aclanthology.org/2023

Peng, Bo and Alcaide, Eric and Anthony, Quentin and Albalak, Alon and Arcadinho, Samuel and Biderman, Stella and Cao, Huanqi and Cheng, Xin and Chung, Michael and Derczynski, Leon and Du, Xingjian and Grella, Matteo and Gv, Kranthi and He, Xuzheng and Hou, Haowen and Kazienko, Przemyslaw and Kocon, Jan and Kong, Jiaming and Koptyra, Bart. Findings of the ...

work page doi:10.18653/v1/2023.findings-emnlp.936 2023
[39]

and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=

Peng, Bo and Zhang, Ruichong and Goldstein, Daniel and Alcaide, Eric and Hou, Haowen and Lu, Janna and Merrill, William and Song, Guangyu and Tan, Kaifeng and Utpala, Saiteja and Wilce, Nathan and Wind, Johan S. and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=. 2503.14456 , archiveprefix=

arXiv
[40]

Transformers are

Katharopoulos, Angelos and Vyas, Apoorv and Pappas, Nikolaos and Fleuret, Fran. Transformers are. Proceedings of the 37th International Conference on Machine Learning , series=. 2020 , publisher=

2020
[41]

2023 , publisher=

Modeling sequences with structured state spaces , author=. 2023 , publisher=

2023
[42]

2025 , eprint=

Grokking at the Edge of Numerical Stability , author=. 2025 , eprint=

2025
[43]

1976 , publisher=

Automata, Languages, and Machines , author=. 1976 , publisher=

1976
[44]

Moore , title =

Edward F. Moore , title =. Annals of Mathematics Studies , volume =. 1956 , pages =

1956
[45]

Mealy , title =

George H. Mealy , title =. Bell System Technical Journal , volume =. 1955 , url =

1955
[46]

Sch. Une th. S. 1955 , url=

1955
[47]

ACM computing surveys (CSUR) , volume=

What every computer scientist should know about floating-point arithmetic , author=. ACM computing surveys (CSUR) , volume=. 1991 , url=

1991
[48]

Neural computation , volume=

Long short-term memory , author=. Neural computation , volume=. 1997 , url=

1997
[49]

Cognitive science , volume=

Finding structure in time , author=. Cognitive science , volume=. 1990 , url=

1990
[50]

Learning phrase representations using

Cho, Kyunghyun and Van Merri. Learning phrase representations using. arXiv preprint arXiv:1406.1078 , year=

Pith/arXiv arXiv
[51]

IEEE Transactions on Computers , volume=

Monotonicity of multi-term floating-point adders , author=. IEEE Transactions on Computers , volume=. 2024 , url=

2024
[52]

Lecture notes LIAFA, Universit

Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit. 2010 , url=

2010
[53]

Korsky and Robert C

Samuel A. Korsky and Robert C. Berwick , year=. On the Computational Power of. 1906.06349 , archivePrefix=

Pith/arXiv arXiv 1906
[54]

Lee Giles , doi =

John Stogin and Ankur Mali and C. Lee Giles , doi =. A provably stable neural network. Information Sciences , keywords =. 2024 , Bdsk-Url-1 =

2024
[55]

and Giles, C

Omlin, Christian W. and Giles, C. Lee , title =. 1996 , issue_date =. doi:10.1145/235809.235811 , journal =

work page doi:10.1145/235809.235811 1996
[56]

On pseudovarieties , url =

Samuel Eilenberg and Marcel-Paul Sch. On pseudovarieties , url =. Advances in Mathematics , number =. 1976 , Bdsk-Url-1 =. doi:https://doi.org/10.1016/0001-8708(76)90029-3 , issn =

work page doi:10.1016/0001-8708(76)90029-3 1976

[1] [1]

The Expressive Capacity of State Space Models:

Sarrof, Yash and Veitsman, Yana and Hahn, Michael , booktitle=. The Expressive Capacity of State Space Models:. 2024 , bdsk-url-1=

2024

[2] [2]

Unlocking State-Tracking in Linear

Grazzi, Riccardo and Siems, Julien and Zela, Arber and Franke, J\". Unlocking State-Tracking in Linear. International Conference on Learning Representations , editor =. 2025 , Bdsk-Url-1 =

2025

[3] [3]

Finite Automata, Formal Logic, and Circuit Complexity , year=

Howard Straubing , editor=. Finite Automata, Formal Logic, and Circuit Complexity , year=

[4] [4]

Algebraic Theory of Machines

Kenneth Krohn and John Rhodes , journal=. Algebraic Theory of Machines

[5] [5]

Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:

Andrew Hryniowski and Alexander Wong , year=. Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:. 1905.10901 , archiveprefix=

Pith/arXiv arXiv 1905

[6] [6]

and Sontag, Eduardo D

Siegelmann, Hava T. and Sontag, Eduardo D. , title=. 1992 , isbn=. doi:10.1145/130385.130432 , booktitle=

work page doi:10.1145/130385.130432 1992

[7] [7]

arXiv preprint arXiv:2310.12942 , url=

On the Representational Capacity of Recurrent Neural Language Models , author=. arXiv preprint arXiv:2310.12942 , url=. 2023 , eprint=

arXiv 2023

[8] [8]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month=

Lower Bounds on the Expressivity of Recurrent Neural Language Models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.naacl-long.380 , pages=

work page doi:10.18653/v1/2024.naacl-long.380 2024

[9] [9]

On the Practical Computational Power of Finite Precision

Weiss, Gail and Goldberg, Yoav and Yahav, Eran , editor=. On the Practical Computational Power of Finite Precision. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , month=. 2018 , address=. doi:10.18653/v1/P18-2117 , pages=

work page doi:10.18653/v1/p18-2117 2018

[10] [10]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=

Recurrent Neural Language Models as Probabilistic Finite-state Automata , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=. 2023 , address=. doi:10.18653/v1/2023.emnlp-main.502 , pages=

work page doi:10.18653/v1/2023.emnlp-main.502 2023

[11] [11]

, booktitle=

Indyk, P. , booktitle=. Optimal simulation of automata by neural nets , year=

[12] [12]

Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=

Sequential Neural Networks as Automata , author=. Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=. 2019 , address=. doi:10.18653/v1/W19-3901 , pages=

work page doi:10.18653/v1/w19-3901 2019

[13] [13]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=

Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and ichter, brian and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny , booktitle=. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=

[14] [14]

Turing, A. M. , title=. Proceedings of the London Mathematical Society , volume=. doi:https://doi.org/10.1112/plms/s2-42.1.230 , url=. https://londmathsoc.onlinelibrary.wiley.com/doi/pdf/10.1112/plms/s2-42.1.230 , year=

work page doi:10.1112/plms/s2-42.1.230

[15] [15]

2021 , eprint=

Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=

2021

[16] [16]

2303.08774 , archiveprefix=

OpenAI , year=. 2303.08774 , archiveprefix=

Pith/arXiv arXiv

[17] [17]

Advances in Neural Information Processing Systems , editor=

Solving Quantitative Reasoning Problems with Language Models , author=. Advances in Neural Information Processing Systems , editor=. 2022 , url=

2022

[18] [18]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.acl-long.676 , pages=

work page doi:10.18653/v1/2024.acl-long.676 2024

[19] [19]

Proceedings of the 41st International Conference on Machine Learning , articleno=

Merrill, William and Petty, Jackson and Sabharwal, Ashish , title=. Proceedings of the 41st International Conference on Machine Learning , articleno=. 2024 , publisher=

2024

[20] [20]

Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J

Karuvally, Arjun and Nowak, Franz and Keller, T. Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J. and Siegelmann, Hava T. , booktitle=. Bridging Expressivity and Scalability with Adaptive Unitary. 2025 , url=. 2507.05238 , archiveprefix=

arXiv 2025

[21] [21]

Wang, Maolin and Rasoulinezhad, Seyedramin and Leong, Philip H. W. and So, Hayden K.-H. , journal=. 2022 , volume=. doi:10.1109/TPDS.2022.3149787 , url=

work page doi:10.1109/tpds.2022.3149787 2022

[22] [22]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Quantization and training of neural networks for efficient integer-arithmetic-only inference , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2018 , url=

2018

[23] [23]

and Keutzer, Kurt , booktitle=

Kim, Sehoon and Gholami, Amir and Yao, Zhewei and Mahoney, Michael W. and Keutzer, Kurt , booktitle=

[24] [24]

arXiv preprint arXiv:2109.09828 , year=

Sari, Eyy. arXiv preprint arXiv:2109.09828 , year=

arXiv

[25] [25]

Training Integer-Only Deep Recurrent Neural Networks , year=

Nia, Vahid Partovi and Sari, Eyy\". Training Integer-Only Deep Recurrent Neural Networks , year=. doi:10.1007/s42979-023-01920-z , journal=

work page doi:10.1007/s42979-023-01920-z

[26] [26]

Transactions of the Association for Computational Linguistics , volume=

Theoretical Limitations of Self-Attention in Neural Sequence Models , author=. Transactions of the Association for Computational Linguistics , volume=. 2020 , address=. doi:10.1162/tacl_a_00306 , pages=

work page doi:10.1162/tacl_a_00306 2020

[27] [27]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=

Overcoming a Theoretical Limitation of Self-Attention , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2022 , address=. doi:10.18653/v1/2022.acl-long.527 , pages=

work page doi:10.18653/v1/2022.acl-long.527 2022

[28] [28]

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month=

On the Ability and Limitations of Transformers to Recognize Formal Languages , author=. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month=. 2020 , address=. doi:10.18653/v1/2020.emnlp-main.576 , pages=

work page doi:10.18653/v1/2020.emnlp-main.576 2020

[29] [29]

Journal of Machine Learning Research , year=

Jorge Pérez and Pablo Barceló and Javier Marinkovic , title=. Journal of Machine Learning Research , year=

[30] [30]

Transactions of the Association for Computational Linguistics , volume=

Saturated Transformers are Constant-Depth Threshold Circuits , author=. Transactions of the Association for Computational Linguistics , volume=. 2022 , address=. doi:10.1162/tacl_a_00493 , pages=

work page doi:10.1162/tacl_a_00493 2022

[31] [31]

What Formal Languages Can Transformers Express?

Strobl, Lena and Merrill, William and Weiss, Gail and Chiang, David and Angluin, Dana , journal=. What Formal Languages Can Transformers Express?. 2024 , address=. doi:10.1162/tacl_a_00663 , pages=

work page doi:10.1162/tacl_a_00663 2024

[32] [32]

Proceedings of the 2018 Conference of the North

Recurrent Neural Networks as Weighted Language Recognizers , author=. Proceedings of the 2018 Conference of the North. 2018 , address=. doi:10.18653/v1/N18-1205 , pages=

work page doi:10.18653/v1/n18-1205 2018

[33] [33]

The Parallelism Tradeoff:

Merrill, William and Sabharwal, Ashish , journal=. The Parallelism Tradeoff:. 2023 , address=. doi:10.1162/tacl_a_00562 , pages=

work page doi:10.1162/tacl_a_00562 2023

[34] [34]

Proceedings of the 38th International Conference on Machine Learning , series=

Linear Transformers are Secretly Fast Weight Programmers , author=. Proceedings of the 38th International Conference on Machine Learning , series=. 2021 , publisher=

2021

[35] [35]

International Conference on Learning Representations , year=

Efficiently Modeling Long Sequences with Structured State Spaces , author=. International Conference on Learning Representations , year=

[36] [36]

International Conference on Learning Representations , year=

Simplified State Space Layers for Sequence Modeling , author=. International Conference on Learning Representations , year=

[37] [37]

First Conference on Language Modeling , year=

Mamba: Linear-Time Sequence Modeling with Selective State Spaces , author=. First Conference on Language Modeling , year=

[38] [38]

URL https://aclanthology.org/2023

Peng, Bo and Alcaide, Eric and Anthony, Quentin and Albalak, Alon and Arcadinho, Samuel and Biderman, Stella and Cao, Huanqi and Cheng, Xin and Chung, Michael and Derczynski, Leon and Du, Xingjian and Grella, Matteo and Gv, Kranthi and He, Xuzheng and Hou, Haowen and Kazienko, Przemyslaw and Kocon, Jan and Kong, Jiaming and Koptyra, Bart. Findings of the ...

work page doi:10.18653/v1/2023.findings-emnlp.936 2023

[39] [39]

and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=

Peng, Bo and Zhang, Ruichong and Goldstein, Daniel and Alcaide, Eric and Hou, Haowen and Lu, Janna and Merrill, William and Song, Guangyu and Tan, Kaifeng and Utpala, Saiteja and Wilce, Nathan and Wind, Johan S. and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=. 2503.14456 , archiveprefix=

arXiv

[40] [40]

Transformers are

Katharopoulos, Angelos and Vyas, Apoorv and Pappas, Nikolaos and Fleuret, Fran. Transformers are. Proceedings of the 37th International Conference on Machine Learning , series=. 2020 , publisher=

2020

[41] [41]

2023 , publisher=

Modeling sequences with structured state spaces , author=. 2023 , publisher=

2023

[42] [42]

2025 , eprint=

Grokking at the Edge of Numerical Stability , author=. 2025 , eprint=

2025

[43] [43]

1976 , publisher=

Automata, Languages, and Machines , author=. 1976 , publisher=

1976

[44] [44]

Moore , title =

Edward F. Moore , title =. Annals of Mathematics Studies , volume =. 1956 , pages =

1956

[45] [45]

Mealy , title =

George H. Mealy , title =. Bell System Technical Journal , volume =. 1955 , url =

1955

[46] [46]

Sch. Une th. S. 1955 , url=

1955

[47] [47]

ACM computing surveys (CSUR) , volume=

What every computer scientist should know about floating-point arithmetic , author=. ACM computing surveys (CSUR) , volume=. 1991 , url=

1991

[48] [48]

Neural computation , volume=

Long short-term memory , author=. Neural computation , volume=. 1997 , url=

1997

[49] [49]

Cognitive science , volume=

Finding structure in time , author=. Cognitive science , volume=. 1990 , url=

1990

[50] [50]

Learning phrase representations using

Cho, Kyunghyun and Van Merri. Learning phrase representations using. arXiv preprint arXiv:1406.1078 , year=

Pith/arXiv arXiv

[51] [51]

IEEE Transactions on Computers , volume=

Monotonicity of multi-term floating-point adders , author=. IEEE Transactions on Computers , volume=. 2024 , url=

2024

[52] [52]

Lecture notes LIAFA, Universit

Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit. 2010 , url=

2010

[53] [53]

Korsky and Robert C

Samuel A. Korsky and Robert C. Berwick , year=. On the Computational Power of. 1906.06349 , archivePrefix=

Pith/arXiv arXiv 1906

[54] [54]

Lee Giles , doi =

John Stogin and Ankur Mali and C. Lee Giles , doi =. A provably stable neural network. Information Sciences , keywords =. 2024 , Bdsk-Url-1 =

2024

[55] [55]

and Giles, C

Omlin, Christian W. and Giles, C. Lee , title =. 1996 , issue_date =. doi:10.1145/235809.235811 , journal =

work page doi:10.1145/235809.235811 1996

[56] [56]

On pseudovarieties , url =

Samuel Eilenberg and Marcel-Paul Sch. On pseudovarieties , url =. Advances in Mathematics , number =. 1976 , Bdsk-Url-1 =. doi:https://doi.org/10.1016/0001-8708(76)90029-3 , issn =

work page doi:10.1016/0001-8708(76)90029-3 1976