Hierarchical synthetic languages require Ω(n) context length for faithful autoregressive sampling but only Θ(log n) working memory with reasoning for exact generation from the true distribution.
There ared·(d−1)choices for them (where the first choice is taken twice), and given this choice, there are three permutations
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning
Hierarchical synthetic languages require Ω(n) context length for faithful autoregressive sampling but only Θ(log n) working memory with reasoning for exact generation from the true distribution.