An Algebraic View of the Expressivity of Recurrent Language Models
Pith reviewed 2026-06-28 12:05 UTC · model grok-4.3
The pith
Recurrent neural language model expressivity reduces to whether its syntactic monoid divides a wreath product under the arithmetic model in use.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Expressivity of recurrent neural language models is governed by the arithmetic model; it reduces to the algebraic question of whether a network's syntactic monoid divides a certain wreath product. The identical diagonal state-space architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.
What carries the argument
Syntactic monoid of the network and its division relation to a wreath product determined by the arithmetic model.
If this is right
- Conflicting expressivity claims in the literature are reconciled once arithmetic is made explicit.
- Diagonal state-space models lose the ability to count even moduli under floating-point arithmetic.
- Switching to unsigned-integer quantization restores every even-modulus counter to the same architecture.
- Expressivity of any recurrent model can be decided by checking monoid division rather than by simulation.
Where Pith is reading between the lines
- The same monoid-division test may classify other neural sequence models once their arithmetic is formalized.
- Quantization choices in deployed models directly affect which formal languages they can recognize.
- Mapping common floating-point precisions to their corresponding wreath products would make the algebraic test immediately usable for practitioners.
Load-bearing premise
That the discrepancy between prior Turing-completeness and regular-language results is caused entirely by differences in the underlying arithmetic model.
What would settle it
A concrete floating-point implementation of a diagonal state-space model that successfully recognizes an even-modulus counter language, or an unsigned-integer implementation that fails to do so.
Figures
read the original abstract
What formal languages can a recurrent neural language model recognize? Formal results in the literature conflict: some authors report Turing-completeness, while others show equivalence to regular languages. The reason for this discrepancy is that the underlying arithmetic model differs. The paper develops a unified algebraic account of the expressivity of recurrent neural networks, starting with a formal account of various arithmetic models. This account reduces expressivity to an algebraic question, e.g., whether a network's syntactic monoid divides a certain wreath product. As a case study, the paper revisits diagonal state-space models: the same architecture cannot implement an even-modulus counter once floating-point recurrences are enforced, yet realizes every even-modulus counter under unsigned-integer quantization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a unified algebraic framework for the expressivity of recurrent neural language models. It begins with formal definitions of arithmetic models (including floating-point and integer quantization), then reduces the question of which formal languages a recurrent network can recognize to whether the network's syntactic monoid divides a particular wreath product. As a case study on diagonal state-space models, the same recurrence is shown to be unable to realize even-modulus counters under the paper's floating-point model yet able to realize every even-modulus counter under its unsigned-integer model.
Significance. If the algebraic reduction and the arithmetic-model formalizations are correct, the work supplies a precise explanation for the conflicting Turing-completeness versus regular-language results in the literature and demonstrates that arithmetic semantics are load-bearing for expressivity claims. The explicit reduction to monoid division and the concrete separation for diagonal SSMs are the main contributions.
major comments (1)
- [Case study section (diagonal SSMs)] The central separation for diagonal SSMs rests on the claim that the floating-point recurrence cannot implement even-modulus counters while the integer model can. The paper must supply an explicit verification that its floating-point model (including rounding and overflow rules) matches the behavior of actual IEEE 754 implementations used in deployed networks; otherwise the claimed discrepancy does not transfer to real systems.
minor comments (1)
- [Algebraic framework] Notation for the syntactic monoid and wreath-product construction should be introduced with a short self-contained example before the general theorem.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the constructive suggestion regarding the floating-point model. We address the single major comment below.
read point-by-point responses
-
Referee: [Case study section (diagonal SSMs)] The central separation for diagonal SSMs rests on the claim that the floating-point recurrence cannot implement even-modulus counters while the integer model can. The paper must supply an explicit verification that its floating-point model (including rounding and overflow rules) matches the behavior of actual IEEE 754 implementations used in deployed networks; otherwise the claimed discrepancy does not transfer to real systems.
Authors: We agree that an explicit verification is needed for the results to transfer directly to deployed IEEE 754 systems. Our formal floating-point model was constructed to capture the standard rounding-to-nearest and overflow-to-infinity behaviors of IEEE 754 double precision, but we did not include a side-by-side mapping. In the revised manuscript we will add an appendix that (i) states the precise rounding and overflow rules used, (ii) proves equivalence to IEEE 754 for the arithmetic operations appearing in the diagonal SSM recurrence, and (iii) supplies concrete numerical examples confirming that the even-modulus counter separation holds under those rules. revision: yes
Circularity Check
No circularity: derivation uses independent automata-theoretic primitives
full rationale
The paper reduces expressivity questions to whether a network's syntactic monoid divides a wreath product, invoking standard constructions from automata theory that pre-exist the paper and are not defined in terms of its own results or fitted values. The diagonal SSM case study then applies this external algebraic criterion separately to the paper's formal floating-point recurrence model versus its unsigned-integer model; neither model is constructed by reference to the desired counter realizability outcome. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or smuggled ansatzes appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Expressivity of recurrent language models reduces to whether the syntactic monoid divides a certain wreath product
Reference graph
Works this paper leans on
-
[1]
The Expressive Capacity of State Space Models:
Sarrof, Yash and Veitsman, Yana and Hahn, Michael , booktitle=. The Expressive Capacity of State Space Models:. 2024 , bdsk-url-1=
2024
-
[2]
Unlocking State-Tracking in Linear
Grazzi, Riccardo and Siems, Julien and Zela, Arber and Franke, J\". Unlocking State-Tracking in Linear. International Conference on Learning Representations , editor =. 2025 , Bdsk-Url-1 =
2025
-
[3]
Finite Automata, Formal Logic, and Circuit Complexity , year=
Howard Straubing , editor=. Finite Automata, Formal Logic, and Circuit Complexity , year=
-
[4]
Algebraic Theory of Machines
Kenneth Krohn and John Rhodes , journal=. Algebraic Theory of Machines
-
[5]
Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:
Andrew Hryniowski and Alexander Wong , year=. Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory:. 1905.10901 , archiveprefix=
Pith/arXiv arXiv 1905
-
[6]
Siegelmann, Hava T. and Sontag, Eduardo D. , title=. 1992 , isbn=. doi:10.1145/130385.130432 , booktitle=
-
[7]
arXiv preprint arXiv:2310.12942 , url=
On the Representational Capacity of Recurrent Neural Language Models , author=. arXiv preprint arXiv:2310.12942 , url=. 2023 , eprint=
arXiv 2023
-
[8]
Lower Bounds on the Expressivity of Recurrent Neural Language Models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.naacl-long.380 , pages=
-
[9]
On the Practical Computational Power of Finite Precision
Weiss, Gail and Goldberg, Yoav and Yahav, Eran , editor=. On the Practical Computational Power of Finite Precision. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , month=. 2018 , address=. doi:10.18653/v1/P18-2117 , pages=
-
[10]
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=
Recurrent Neural Language Models as Probabilistic Finite-state Automata , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , month=. 2023 , address=. doi:10.18653/v1/2023.emnlp-main.502 , pages=
-
[11]
, booktitle=
Indyk, P. , booktitle=. Optimal simulation of automata by neural nets , year=
-
[12]
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=
Sequential Neural Networks as Automata , author=. Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges , month=. 2019 , address=. doi:10.18653/v1/W19-3901 , pages=
-
[13]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=
Wei, Jason and Wang, Xuezhi and Schuurmans, Dale and Bosma, Maarten and ichter, brian and Xia, Fei and Chi, Ed and Le, Quoc V and Zhou, Denny , booktitle=. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , url=
-
[14]
Turing, A. M. , title=. Proceedings of the London Mathematical Society , volume=. doi:https://doi.org/10.1112/plms/s2-42.1.230 , url=. https://londmathsoc.onlinelibrary.wiley.com/doi/pdf/10.1112/plms/s2-42.1.230 , year=
-
[15]
2021 , eprint=
Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=
2021
- [16]
-
[17]
Advances in Neural Information Processing Systems , editor=
Solving Quantitative Reasoning Problems with Language Models , author=. Advances in Neural Information Processing Systems , editor=. 2022 , url=
2022
-
[18]
On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2024 , address=. doi:10.18653/v1/2024.acl-long.676 , pages=
-
[19]
Proceedings of the 41st International Conference on Machine Learning , articleno=
Merrill, William and Petty, Jackson and Sabharwal, Ashish , title=. Proceedings of the 41st International Conference on Machine Learning , articleno=. 2024 , publisher=
2024
-
[20]
Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J
Karuvally, Arjun and Nowak, Franz and Keller, T. Anderson and Amo Alonso, Carmen and Sejnowski, Terrence J. and Siegelmann, Hava T. , booktitle=. Bridging Expressivity and Scalability with Adaptive Unitary. 2025 , url=. 2507.05238 , archiveprefix=
arXiv 2025
-
[21]
Wang, Maolin and Rasoulinezhad, Seyedramin and Leong, Philip H. W. and So, Hayden K.-H. , journal=. 2022 , volume=. doi:10.1109/TPDS.2022.3149787 , url=
-
[22]
Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
Quantization and training of neural networks for efficient integer-arithmetic-only inference , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2018 , url=
2018
-
[23]
and Keutzer, Kurt , booktitle=
Kim, Sehoon and Gholami, Amir and Yao, Zhewei and Mahoney, Michael W. and Keutzer, Kurt , booktitle=
- [24]
-
[25]
Training Integer-Only Deep Recurrent Neural Networks , year=
Nia, Vahid Partovi and Sari, Eyy\". Training Integer-Only Deep Recurrent Neural Networks , year=. doi:10.1007/s42979-023-01920-z , journal=
-
[26]
Transactions of the Association for Computational Linguistics , volume=
Theoretical Limitations of Self-Attention in Neural Sequence Models , author=. Transactions of the Association for Computational Linguistics , volume=. 2020 , address=. doi:10.1162/tacl_a_00306 , pages=
-
[27]
Overcoming a Theoretical Limitation of Self-Attention , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , month=. 2022 , address=. doi:10.18653/v1/2022.acl-long.527 , pages=
-
[28]
On the Ability and Limitations of Transformers to Recognize Formal Languages , author=. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) , month=. 2020 , address=. doi:10.18653/v1/2020.emnlp-main.576 , pages=
-
[29]
Journal of Machine Learning Research , year=
Jorge Pérez and Pablo Barceló and Javier Marinkovic , title=. Journal of Machine Learning Research , year=
-
[30]
Transactions of the Association for Computational Linguistics , volume=
Saturated Transformers are Constant-Depth Threshold Circuits , author=. Transactions of the Association for Computational Linguistics , volume=. 2022 , address=. doi:10.1162/tacl_a_00493 , pages=
-
[31]
What Formal Languages Can Transformers Express?
Strobl, Lena and Merrill, William and Weiss, Gail and Chiang, David and Angluin, Dana , journal=. What Formal Languages Can Transformers Express?. 2024 , address=. doi:10.1162/tacl_a_00663 , pages=
-
[32]
Proceedings of the 2018 Conference of the North
Recurrent Neural Networks as Weighted Language Recognizers , author=. Proceedings of the 2018 Conference of the North. 2018 , address=. doi:10.18653/v1/N18-1205 , pages=
-
[33]
Merrill, William and Sabharwal, Ashish , journal=. The Parallelism Tradeoff:. 2023 , address=. doi:10.1162/tacl_a_00562 , pages=
-
[34]
Proceedings of the 38th International Conference on Machine Learning , series=
Linear Transformers are Secretly Fast Weight Programmers , author=. Proceedings of the 38th International Conference on Machine Learning , series=. 2021 , publisher=
2021
-
[35]
International Conference on Learning Representations , year=
Efficiently Modeling Long Sequences with Structured State Spaces , author=. International Conference on Learning Representations , year=
-
[36]
International Conference on Learning Representations , year=
Simplified State Space Layers for Sequence Modeling , author=. International Conference on Learning Representations , year=
-
[37]
First Conference on Language Modeling , year=
Mamba: Linear-Time Sequence Modeling with Selective State Spaces , author=. First Conference on Language Modeling , year=
-
[38]
Findings of the Association for Computational Linguistics: EMNLP 2023 , month=
Peng, Bo and Alcaide, Eric and Anthony, Quentin and Albalak, Alon and Arcadinho, Samuel and Biderman, Stella and Cao, Huanqi and Cheng, Xin and Chung, Michael and Derczynski, Leon and Du, Xingjian and Grella, Matteo and Gv, Kranthi and He, Xuzheng and Hou, Haowen and Kazienko, Przemyslaw and Kocon, Jan and Kong, Jiaming and Koptyra, Bart. Findings of the ...
-
[39]
and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=
Peng, Bo and Zhang, Ruichong and Goldstein, Daniel and Alcaide, Eric and Hou, Haowen and Lu, Janna and Merrill, William and Song, Guangyu and Tan, Kaifeng and Utpala, Saiteja and Wilce, Nathan and Wind, Johan S. and Wu, Tianyi and Wuttke, Daniel and Zhou-Zheng, Christian , year=. 2503.14456 , archiveprefix=
-
[40]
Transformers are
Katharopoulos, Angelos and Vyas, Apoorv and Pappas, Nikolaos and Fleuret, Fran. Transformers are. Proceedings of the 37th International Conference on Machine Learning , series=. 2020 , publisher=
2020
-
[41]
2023 , publisher=
Modeling sequences with structured state spaces , author=. 2023 , publisher=
2023
-
[42]
2025 , eprint=
Grokking at the Edge of Numerical Stability , author=. 2025 , eprint=
2025
-
[43]
1976 , publisher=
Automata, Languages, and Machines , author=. 1976 , publisher=
1976
-
[44]
Moore , title =
Edward F. Moore , title =. Annals of Mathematics Studies , volume =. 1956 , pages =
1956
-
[45]
Mealy , title =
George H. Mealy , title =. Bell System Technical Journal , volume =. 1955 , url =
1955
-
[46]
Sch. Une th. S. 1955 , url=
1955
-
[47]
ACM computing surveys (CSUR) , volume=
What every computer scientist should know about floating-point arithmetic , author=. ACM computing surveys (CSUR) , volume=. 1991 , url=
1991
-
[48]
Neural computation , volume=
Long short-term memory , author=. Neural computation , volume=. 1997 , url=
1997
-
[49]
Cognitive science , volume=
Finding structure in time , author=. Cognitive science , volume=. 1990 , url=
1990
-
[50]
Learning phrase representations using
Cho, Kyunghyun and Van Merri. Learning phrase representations using. arXiv preprint arXiv:1406.1078 , year=
-
[51]
IEEE Transactions on Computers , volume=
Monotonicity of multi-term floating-point adders , author=. IEEE Transactions on Computers , volume=. 2024 , url=
2024
-
[52]
Lecture notes LIAFA, Universit
Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit. 2010 , url=
2010
-
[53]
Samuel A. Korsky and Robert C. Berwick , year=. On the Computational Power of. 1906.06349 , archivePrefix=
Pith/arXiv arXiv 1906
-
[54]
Lee Giles , doi =
John Stogin and Ankur Mali and C. Lee Giles , doi =. A provably stable neural network. Information Sciences , keywords =. 2024 , Bdsk-Url-1 =
2024
-
[55]
Omlin, Christian W. and Giles, C. Lee , title =. 1996 , issue_date =. doi:10.1145/235809.235811 , journal =
-
[56]
Samuel Eilenberg and Marcel-Paul Sch. On pseudovarieties , url =. Advances in Mathematics , number =. 1976 , Bdsk-Url-1 =. doi:https://doi.org/10.1016/0001-8708(76)90029-3 , issn =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.