What Formal Languages Can Transformers Express? A Survey
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary: background 2
citing papers explorer
- MinMax Recurrent Neural Cascades
  MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
- Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization
  Encoder-decoder transformers are characterized by a temporal logic extending propositional logic with a counting global modality on the encoder and a past modality on the decoder, equivalently via distributed automata.
- A framework for analyzing concept representations in neural models
  A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
  CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
- How Many Different Outputs Can a Transformer Generate?
  Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
- A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits
  Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
  Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.