Space-Efficient Language Generation in the Limit

Chirag Pabbaraju; Hristo Papazov; Miltiadis Stouras; Nicolas Flammarion; Ola Svensson

arxiv: 2606.25777 · v1 · pith:CXZU6S6Znew · submitted 2026-06-24 · 💻 cs.DS · cs.AI· cs.CL· cs.LG

Space-Efficient Language Generation in the Limit

Nicolas Flammarion , Chirag Pabbaraju , Hristo Papazov , Miltiadis Stouras , Ola Svensson This is my paper

Pith reviewed 2026-06-25 20:24 UTC · model grok-4.3

classification 💻 cs.DS cs.AIcs.CLcs.LG

keywords streaming algorithmslanguage learning in the limitDFA identificationspace complexitygeneration gapadversarial streamscommunication complexity

0 comments

The pith

A poly(s,k)-space streaming algorithm converges to a hypothesis for s-state DFA languages with generation gap O(k^{2s-2}) while capturing all strings of length at least 2s-1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a resource-bounded version of language learning in the limit where a learner sees an adversarial stream of strings from an unknown target language K and must output a hypothesis L contained in K that omits only a bounded number of strings from K. For the natural class of languages recognized by DFAs with s states over a k-symbol alphabet, it shows that polynomial space is enough to achieve a generation gap of O(k to the power 2s minus 2) and to match every sufficiently long string exactly. This sits between the exponential-space regime where exact identification is possible and weaker memory regimes where larger gaps are forced, establishing a concrete tradeoff between memory and the quality of the generated language.

Core claim

We present a streaming algorithm using poly(s,k) space that converges to a hypothesis with generation gap Δ = O(k^{2s-2}). Moreover, the learned hypothesis captures every string in K of length at least 2s-1. We complement this result with a near-matching lower bound through a reduction from a standard communication complexity problem: achieving generation gap Δ ≤ k^{(1-ε)s} requires k^{Ω(ε s)} memory.

What carries the argument

the poly(s,k)-space streaming algorithm that processes a positive adversarial stream and outputs a hypothesis language L ⊆ K with bounded omissions

If this is right

Polynomial space is sufficient to guarantee that all strings of length 2s-1 or longer are included in the hypothesis.
Exact identification of the target DFA language requires exponential space.
The generation gap cannot be improved to k^{(1-ε)s} without increasing memory to k^{Ω(ε s)}.
The algorithm works against an adversarial ordering of the positive examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The exponential dependence on s in the gap size implies that the approach is most useful when the underlying automaton is very small.
One could instantiate the learner on concrete small DFAs to measure the actual number of missed strings versus the O(k^{2s-2}) upper bound.
The communication-complexity reduction technique may apply to other hypothesis classes beyond DFAs.

Load-bearing premise

The target language belongs to the class of languages recognized by deterministic finite automata with at most s states.

What would settle it

An explicit DFA with s states and alphabet size k together with an adversarial stream on which every poly(s,k)-space learner outputs a hypothesis that either misses more than O(k^{2s-2}) strings or misses some string of length at least 2s-1.

Figures

Figures reproduced from arXiv: 2606.25777 by Chirag Pabbaraju, Hristo Papazov, Miltiadis Stouras, Nicolas Flammarion, Ola Svensson.

read the original abstract

We initiate a resource-aware theory of \textit{language generation in the limit} under the minimal constraint of space efficiency. In our framework, a learner observes an adversarial positive stream from a target language $K$ and must eventually output a hallucination-free hypothesis language $L \subseteq K$ while omitting at most $\Delta$ strings of $K$. We focus on $\mathcal{C}_{s,k}$, the collection of languages recognized by DFAs with at most $s$ states over an alphabet of size $k$, as the natural hypothesis class for memory-bounded learners. In the exponential-space regime, we prove that a learner can exactly identify the target $K$. Under a stricter memory budget, we characterize the strongest possible generation guarantees. In particular, we present a streaming algorithm using $\mathrm{poly}(s,k)$ space that converges to a hypothesis with generation gap $\Delta = O(k^{2s-2})$. Moreover, the learned hypothesis captures every string in $K$ of length at least $2s-1$. We complement this result with a near-matching lower bound through a reduction from a standard communication complexity problem. Specifically, achieving generation gap $\Delta \le k^{(1-\varepsilon)s}$ requires $k^{\Omega(\varepsilon s)}$ memory. Together, these results reveal a sharp transition between polynomial-space generation and exponential-space exact identification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sets up a resource-bounded version of language generation in the limit for DFA languages and supplies matching poly-space upper and lower bounds on the generation gap.

read the letter

The main takeaway is that the authors give a streaming algorithm using poly(s,k) space that, for languages accepted by s-state DFAs over k symbols, eventually outputs a hypothesis missing only O(k^{2s-2}) strings from the target while including all strings of length at least 2s-1. They also show a lower bound that smaller gaps require more space.

What stands out is the explicit characterization of the space-accuracy tradeoff. The upper bound construction seems to leverage the limited number of short strings in an s-state automaton, and the lower bound comes from a standard communication complexity reduction. This organizes the transition from polynomial space (approximate generation) to exponential space (exact identification) in a clean way.

The results look internally consistent. The claimed gap matches the maximum number of strings shorter than 2s-1 that an s-state DFA can have, so the guarantee reduces to correctly handling the infinite tail. No obvious circularity or hidden fitting in the bounds.

One soft spot is that the abstract only sketches the algorithm and reduction; without the full proofs it's hard to check for edge cases in the streaming model or the exact communication problem used. But the stress test notes no inconsistency from what's stated.

This work is aimed at researchers in automata learning and streaming algorithms who care about resource bounds. It should interest people thinking about memory-limited learners in the limit.

I would send it to peer review. The framework is new enough and the bounds are sharp enough to warrant referee attention, even if the proofs need careful verification.

Referee Report

0 major / 0 minor

Summary. The paper initiates a resource-aware theory of language generation in the limit. For the class C_{s,k} of languages recognized by DFAs with at most s states over a k-letter alphabet, it gives a poly(s,k)-space streaming algorithm that outputs a hypothesis L ⊆ K with generation gap Δ = O(k^{2s-2}) while capturing every string of K of length at least 2s-1. It also shows that exact identification of K is possible with exponential space and proves a near-matching lower bound: achieving Δ ≤ k^{(1-ε)s} requires k^{Ω(ε s)} space via reduction from a communication-complexity problem.

Significance. If the stated results hold, the work cleanly separates the polynomial-space regime (approximate generation with gap polynomial in k of degree linear in s) from the exponential-space regime (exact identification). The upper bound is consistent with the fact that any s-state DFA accepts at most O(k^{2s-2}) strings of length < 2s-1, and the lower bound is obtained by standard communication-complexity techniques. This supplies quantitative, falsifiable guarantees on the best achievable generation gap under memory constraints.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript, accurate summary of the results, and recommendation to accept. The referee's assessment correctly identifies the separation between the polynomial-space regime (approximate generation) and the exponential-space regime (exact identification).

Circularity Check

0 steps flagged

No significant circularity; self-contained algorithmic result

full rationale

The paper states direct algorithmic constructions achieving the stated poly(s,k)-space streaming learner and the Δ = O(k^{2s-2}) generation gap, together with a lower bound obtained via reduction from a standard communication-complexity problem. No load-bearing step reduces by definition or by self-citation to the claimed output; the short-string bound follows from the known size of the language of an s-state DFA rather than from any fitted parameter or renamed input. The derivation therefore remains independent of its own conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the standard definition of DFA-recognizable languages and the adversarial streaming model; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The target language belongs to C_{s,k}, the class of languages recognized by DFAs with ≤ s states over alphabet size k.
All stated guarantees are parameterized by membership in this class.

pith-pipeline@v0.9.1-grok · 5793 in / 1264 out tokens · 16782 ms · 2026-06-25T20:24:17.526623+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

57 extracted references · 14 canonical work pages · 1 internal anchor

[1]

2026 , eprint=

On Language Generation in the Limit with Bounded Memory , author=. 2026 , eprint=

2026
[2]

Advances in Neural Information Processing Systems , volume=

Language generation in the limit , author=. Advances in Neural Information Processing Systems , volume=
[3]

arXiv preprint arXiv:2412.18530 , pages=

Characterizations of language generation with breadth , author=. arXiv preprint arXiv:2412.18530 , pages=

work page arXiv
[4]

1985 , publisher=

Systems that learn: An introduction to learning theory for cognitive and computer scientists , author=. 1985 , publisher=

1985
[5]

arXiv preprint arXiv:2506.15543 , year=

Learning Algorithms in the Limit , author=. arXiv preprint arXiv:2506.15543 , year=

work page arXiv
[6]

arXiv preprint arXiv:2411.15364 , year=

Exploring facets of language generation in the limit , author=. arXiv preprint arXiv:2411.15364 , year=

work page arXiv
[7]

arXiv preprint arXiv:2504.14370 , year=

Density Measures for Language Generation , author=. arXiv preprint arXiv:2504.14370 , year=

work page arXiv
[8]

arXiv preprint arXiv:2410.13714 , year=

Generation through the lens of learning theory , author=. arXiv preprint arXiv:2410.13714 , year=

work page arXiv
[9]

arXiv preprint arXiv:2510.02795 , year=

Pareto-optimal Non-uniform Language Generation , author=. arXiv preprint arXiv:2510.02795 , year=

work page arXiv
[10]

arXiv preprint arXiv:2501.04179 , year=

Generation from Noisy Examples , author=. arXiv preprint arXiv:2501.04179 , year=

work page arXiv
[11]

arXiv preprint arXiv:2505.21819 , year=

Representative Language Generation , author=. arXiv preprint arXiv:2505.21819 , year=

work page arXiv
[12]

arXiv preprint arXiv:2506.18642 , year=

On Union-Closedness of Language Generation , author=. arXiv preprint arXiv:2506.18642 , year=

work page arXiv
[13]

arXiv preprint arXiv:2511.04103 , year=

A Characterization of List Language Identification in the Limit , author=. arXiv preprint arXiv:2511.04103 , year=

work page arXiv
[14]

Machine Learning , volume=

Characteristic sets for polynomial grammatical inference , author=. Machine Learning , volume=. 1997 , publisher=

1997
[15]

Information and control , volume=

Inductive inference of formal languages from positive data , author=. Information and control , volume=. 1980 , publisher=

1980
[16]

Information and control , volume=

Language identification in the limit , author=. Information and control , volume=. 1967 , publisher=

1967
[17]

Information and control , volume=

Complexity of automaton identification from given data , author=. Information and control , volume=. 1978 , publisher=

1978
[18]

Machine Learning , volume=

Learning DFA from simple examples , author=. Machine Learning , volume=. 2001 , publisher=

2001
[19]

Annual Symposium on Theoretical Aspects of Computer Science , pages=

PAC learning with simple examples , author=. Annual Symposium on Theoretical Aspects of Computer Science , pages=. 1996 , organization=

1996
[20]

International Workshop on Algorithmic Learning Theory , pages=

PAC learning under helpful distributions , author=. International Workshop on Algorithmic Learning Theory , pages=. 1997 , organization=

1997
[21]

2023 , publisher=

Automata theory: An algorithmic approach , author=. 2023 , publisher=

2023
[22]

ACM Sigact News , volume=

Introduction to the Theory of Computation , author=. ACM Sigact News , volume=. 1996 , publisher=

1996
[23]

Acm Sigact News , volume=

Introduction to automata theory, languages, and computation , author=. Acm Sigact News , volume=. 2001 , publisher=

2001
[24]

Theory of machines and computations , pages=

An n log n algorithm for minimizing states in a finite automaton , author=. Theory of machines and computations , pages=. 1971 , publisher=

1971
[25]

Acta Informatica , volume=

Describing an algorithm by Hopcroft , author=. Acta Informatica , volume=. 1973 , publisher=

1973
[26]

RAIRO-Theoretical Informatics and Applications , volume=

Hyper-minimizing minimized deterministic finite state automata , author=. RAIRO-Theoretical Informatics and Applications , volume=. 2009 , publisher=

2009
[27]

Theoretical computer science , volume=

An nlogn algorithm for hyper-minimizing a (minimized) deterministic automaton , author=. Theoretical computer science , volume=. 2010 , publisher=

2010
[28]

International Symposium on Mathematical Foundations of Computer Science , pages=

Hyper-minimisation made efficient , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2009 , organization=

2009
[29]

International Symposium on Mathematical Foundations of Computer Science , pages=

On minimising automata with errors , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2011 , organization=

2011
[30]

International Journal of Foundations of Computer Science , volume=

Optimal hyper-minimization , author=. International Journal of Foundations of Computer Science , volume=. 2011 , publisher=

2011
[31]

Problemy Kibernetiki , volume=

Enumeration of finite automata , author=. Problemy Kibernetiki , volume=
[32]

Theoretical Computer Science , volume=

Enumeration and random generation of accessible automata , author=. Theoretical Computer Science , volume=. 2007 , publisher=

2007
[33]

Discrete Mathematics & Theoretical Computer Science , volume=

On the asymptotic enumeration of accessible automata , author=. Discrete Mathematics & Theoretical Computer Science , volume=. 2010 , publisher=

2010
[34]

Asymptotic enumeration of Minimal Automata

Asymptotic enumeration of minimal automata , author=. arXiv preprint arXiv:1109.5683 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[35]

International Symposium on Mathematical Foundations of Computer Science , pages=

Random deterministic automata , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2014 , organization=

2014
[36]

Stirling numbers of the second kind , author=
[37]

IRE Transactions on information theory , volume=

Three models for the description of language , author=. IRE Transactions on information theory , volume=. 1956 , publisher=

1956
[38]

Finitary models of language users , author=
[39]

Cognition , volume=

Negative evidence in language acquisition , author=. Cognition , volume=. 1993 , publisher=

1993
[40]

Journal of child language , volume=

Adult reformulations of child errors as negative evidence , author=. Journal of child language , volume=. 2003 , publisher=

2003
[41]

arXiv preprint arXiv:2310.05161 , year=

Recurrent neural language models as probabilistic finite-state automata , author=. arXiv preprint arXiv:2310.05161 , year=

work page arXiv
[42]

Trends in cognitive sciences , volume=

Dissociating language and thought in large language models , author=. Trends in cognitive sciences , volume=. 2024 , publisher=

2024
[43]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=
[44]

OpenAI blog , volume=

Language models are unsupervised multitask learners , author=. OpenAI blog , volume=
[45]

Coding for Interactive Communication: A Survey , volume=

Communication complexity (for algorithm designers) , author=. Coding for Interactive Communication: A Survey , volume=. 2016 , publisher=

2016
[46]

Journal of computer and system sciences , volume=

Relationships between nondeterministic and deterministic tape complexities , author=. Journal of computer and system sciences , volume=. 1970 , publisher=

1970
[47]

arXiv preprint arXiv:2404.08819 , year=

The illusion of state in state-space models , author=. arXiv preprint arXiv:2404.08819 , year=

work page arXiv
[48]

arXiv preprint arXiv:2404.14994 , year=

Transformers Can Represent n -gram Language Models , author=. arXiv preprint arXiv:2404.14994 , year=

work page arXiv
[49]

Jack , title =

Copeland, B. Jack , title =. The. 2026 , edition =

2026
[50]

Communications of the ACM , volume=

A theory of the learnable , author=. Communications of the ACM , volume=. 1984 , publisher=

1984
[51]

1994 , publisher=

An introduction to computational learning theory , author=. 1994 , publisher=

1994
[52]

2010 , publisher=

Grammatical inference: learning automata and grammars , author=. 2010 , publisher=

2010
[53]

1999 , publisher=

Systems that learn: an introduction to learning theory , author=. 1999 , publisher=

1999
[54]

2007 , publisher=

Data streams: models and algorithms , author=. 2007 , publisher=

2007
[55]

Information Theory Methods in Communication Complexity , booktitle =

Ziv Bar. Information Theory Methods in Communication Complexity , booktitle =. 2002 , doi =

2002
[56]

T. S. Jayram and Ravi Kumar and D. Sivakumar , title =. Theory Comput. , volume =. 2008 , doi =

2008
[57]

Advances in Computers , volume=

Communication complexity , author=. Advances in Computers , volume=. 1997 , publisher=

1997

[1] [1]

2026 , eprint=

On Language Generation in the Limit with Bounded Memory , author=. 2026 , eprint=

2026

[2] [2]

Advances in Neural Information Processing Systems , volume=

Language generation in the limit , author=. Advances in Neural Information Processing Systems , volume=

[3] [3]

arXiv preprint arXiv:2412.18530 , pages=

Characterizations of language generation with breadth , author=. arXiv preprint arXiv:2412.18530 , pages=

work page arXiv

[4] [4]

1985 , publisher=

Systems that learn: An introduction to learning theory for cognitive and computer scientists , author=. 1985 , publisher=

1985

[5] [5]

arXiv preprint arXiv:2506.15543 , year=

Learning Algorithms in the Limit , author=. arXiv preprint arXiv:2506.15543 , year=

work page arXiv

[6] [6]

arXiv preprint arXiv:2411.15364 , year=

Exploring facets of language generation in the limit , author=. arXiv preprint arXiv:2411.15364 , year=

work page arXiv

[7] [7]

arXiv preprint arXiv:2504.14370 , year=

Density Measures for Language Generation , author=. arXiv preprint arXiv:2504.14370 , year=

work page arXiv

[8] [8]

arXiv preprint arXiv:2410.13714 , year=

Generation through the lens of learning theory , author=. arXiv preprint arXiv:2410.13714 , year=

work page arXiv

[9] [9]

arXiv preprint arXiv:2510.02795 , year=

Pareto-optimal Non-uniform Language Generation , author=. arXiv preprint arXiv:2510.02795 , year=

work page arXiv

[10] [10]

arXiv preprint arXiv:2501.04179 , year=

Generation from Noisy Examples , author=. arXiv preprint arXiv:2501.04179 , year=

work page arXiv

[11] [11]

arXiv preprint arXiv:2505.21819 , year=

Representative Language Generation , author=. arXiv preprint arXiv:2505.21819 , year=

work page arXiv

[12] [12]

arXiv preprint arXiv:2506.18642 , year=

On Union-Closedness of Language Generation , author=. arXiv preprint arXiv:2506.18642 , year=

work page arXiv

[13] [13]

arXiv preprint arXiv:2511.04103 , year=

A Characterization of List Language Identification in the Limit , author=. arXiv preprint arXiv:2511.04103 , year=

work page arXiv

[14] [14]

Machine Learning , volume=

Characteristic sets for polynomial grammatical inference , author=. Machine Learning , volume=. 1997 , publisher=

1997

[15] [15]

Information and control , volume=

Inductive inference of formal languages from positive data , author=. Information and control , volume=. 1980 , publisher=

1980

[16] [16]

Information and control , volume=

Language identification in the limit , author=. Information and control , volume=. 1967 , publisher=

1967

[17] [17]

Information and control , volume=

Complexity of automaton identification from given data , author=. Information and control , volume=. 1978 , publisher=

1978

[18] [18]

Machine Learning , volume=

Learning DFA from simple examples , author=. Machine Learning , volume=. 2001 , publisher=

2001

[19] [19]

Annual Symposium on Theoretical Aspects of Computer Science , pages=

PAC learning with simple examples , author=. Annual Symposium on Theoretical Aspects of Computer Science , pages=. 1996 , organization=

1996

[20] [20]

International Workshop on Algorithmic Learning Theory , pages=

PAC learning under helpful distributions , author=. International Workshop on Algorithmic Learning Theory , pages=. 1997 , organization=

1997

[21] [21]

2023 , publisher=

Automata theory: An algorithmic approach , author=. 2023 , publisher=

2023

[22] [22]

ACM Sigact News , volume=

Introduction to the Theory of Computation , author=. ACM Sigact News , volume=. 1996 , publisher=

1996

[23] [23]

Acm Sigact News , volume=

Introduction to automata theory, languages, and computation , author=. Acm Sigact News , volume=. 2001 , publisher=

2001

[24] [24]

Theory of machines and computations , pages=

An n log n algorithm for minimizing states in a finite automaton , author=. Theory of machines and computations , pages=. 1971 , publisher=

1971

[25] [25]

Acta Informatica , volume=

Describing an algorithm by Hopcroft , author=. Acta Informatica , volume=. 1973 , publisher=

1973

[26] [26]

RAIRO-Theoretical Informatics and Applications , volume=

Hyper-minimizing minimized deterministic finite state automata , author=. RAIRO-Theoretical Informatics and Applications , volume=. 2009 , publisher=

2009

[27] [27]

Theoretical computer science , volume=

An nlogn algorithm for hyper-minimizing a (minimized) deterministic automaton , author=. Theoretical computer science , volume=. 2010 , publisher=

2010

[28] [28]

International Symposium on Mathematical Foundations of Computer Science , pages=

Hyper-minimisation made efficient , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2009 , organization=

2009

[29] [29]

International Symposium on Mathematical Foundations of Computer Science , pages=

On minimising automata with errors , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2011 , organization=

2011

[30] [30]

International Journal of Foundations of Computer Science , volume=

Optimal hyper-minimization , author=. International Journal of Foundations of Computer Science , volume=. 2011 , publisher=

2011

[31] [31]

Problemy Kibernetiki , volume=

Enumeration of finite automata , author=. Problemy Kibernetiki , volume=

[32] [32]

Theoretical Computer Science , volume=

Enumeration and random generation of accessible automata , author=. Theoretical Computer Science , volume=. 2007 , publisher=

2007

[33] [33]

Discrete Mathematics & Theoretical Computer Science , volume=

On the asymptotic enumeration of accessible automata , author=. Discrete Mathematics & Theoretical Computer Science , volume=. 2010 , publisher=

2010

[34] [34]

Asymptotic enumeration of Minimal Automata

Asymptotic enumeration of minimal automata , author=. arXiv preprint arXiv:1109.5683 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[35] [35]

International Symposium on Mathematical Foundations of Computer Science , pages=

Random deterministic automata , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 2014 , organization=

2014

[36] [36]

Stirling numbers of the second kind , author=

[37] [37]

IRE Transactions on information theory , volume=

Three models for the description of language , author=. IRE Transactions on information theory , volume=. 1956 , publisher=

1956

[38] [38]

Finitary models of language users , author=

[39] [39]

Cognition , volume=

Negative evidence in language acquisition , author=. Cognition , volume=. 1993 , publisher=

1993

[40] [40]

Journal of child language , volume=

Adult reformulations of child errors as negative evidence , author=. Journal of child language , volume=. 2003 , publisher=

2003

[41] [41]

arXiv preprint arXiv:2310.05161 , year=

Recurrent neural language models as probabilistic finite-state automata , author=. arXiv preprint arXiv:2310.05161 , year=

work page arXiv

[42] [42]

Trends in cognitive sciences , volume=

Dissociating language and thought in large language models , author=. Trends in cognitive sciences , volume=. 2024 , publisher=

2024

[43] [43]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

[44] [44]

OpenAI blog , volume=

Language models are unsupervised multitask learners , author=. OpenAI blog , volume=

[45] [45]

Coding for Interactive Communication: A Survey , volume=

Communication complexity (for algorithm designers) , author=. Coding for Interactive Communication: A Survey , volume=. 2016 , publisher=

2016

[46] [46]

Journal of computer and system sciences , volume=

Relationships between nondeterministic and deterministic tape complexities , author=. Journal of computer and system sciences , volume=. 1970 , publisher=

1970

[47] [47]

arXiv preprint arXiv:2404.08819 , year=

The illusion of state in state-space models , author=. arXiv preprint arXiv:2404.08819 , year=

work page arXiv

[48] [48]

arXiv preprint arXiv:2404.14994 , year=

Transformers Can Represent n -gram Language Models , author=. arXiv preprint arXiv:2404.14994 , year=

work page arXiv

[49] [49]

Jack , title =

Copeland, B. Jack , title =. The. 2026 , edition =

2026

[50] [50]

Communications of the ACM , volume=

A theory of the learnable , author=. Communications of the ACM , volume=. 1984 , publisher=

1984

[51] [51]

1994 , publisher=

An introduction to computational learning theory , author=. 1994 , publisher=

1994

[52] [52]

2010 , publisher=

Grammatical inference: learning automata and grammars , author=. 2010 , publisher=

2010

[53] [53]

1999 , publisher=

Systems that learn: an introduction to learning theory , author=. 1999 , publisher=

1999

[54] [54]

2007 , publisher=

Data streams: models and algorithms , author=. 2007 , publisher=

2007

[55] [55]

Information Theory Methods in Communication Complexity , booktitle =

Ziv Bar. Information Theory Methods in Communication Complexity , booktitle =. 2002 , doi =

2002

[56] [56]

T. S. Jayram and Ravi Kumar and D. Sivakumar , title =. Theory Comput. , volume =. 2008 , doi =

2008

[57] [57]

Advances in Computers , volume=

Communication complexity , author=. Advances in Computers , volume=. 1997 , publisher=

1997