What Formal Languages Can Transformers Express? A Survey

Strobl, Lena, Merrill, William, Weiss, Gail, Chiang, David, Angluin, Dana · 2024 · DOI 10.1162/tacl_a_00663

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

open at publisher browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MinMax Recurrent Neural Cascades

cs.LG · 2026-05-07 · conditional · novelty 8.0 · 2 refs

MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

cs.LO · 2026-05-08 · unverdicted · novelty 7.0

Encoder-decoder transformers are characterized by a temporal logic extending propositional logic with a counting global modality on the encoder and a past modality on the decoder, equivalently via distributed automata.

A framework for analyzing concept representations in neural models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

cs.CL · 2025-02-28 · unverdicted · novelty 7.0

CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.

How Many Different Outputs Can a Transformer Generate?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

cs.CL · 2023-05-30 · conditional · novelty 5.0

Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.

citing papers explorer

Showing 7 of 7 citing papers.

MinMax Recurrent Neural Cascades cs.LG · 2026-05-07 · conditional · none · ref 7 · 2 links
MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization cs.LO · 2026-05-08 · unverdicted · none · ref 18
Encoder-decoder transformers are characterized by a temporal logic extending propositional logic with a counting global modality on the encoder and a past modality on the decoder, equivalently via distributed automata.
A framework for analyzing concept representations in neural models cs.CL · 2026-05-02 · unverdicted · none · ref 192
A new framework shows concept subspaces are not unique, estimator choice affects containment and disentanglement, LEACE works well but generalizes poorly, and HuBERT encodes phone info as contained and disentangled from speaker info while speaker info resists compact containment.
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation cs.CL · 2025-02-28 · unverdicted · none · ref 125
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
How Many Different Outputs Can a Transformer Generate? cs.LG · 2026-05-21 · unverdicted · none · ref 96
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits cs.LG · 2026-05-19 · unverdicted · none · ref 25
Applies optimal transport to bound OOD generalization error in Transformers via Lipschitz continuity and TC^0 circuit depth lower bounds for Dyck-k backtracking, supported by evaluations on 54 configurations.
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate cs.CL · 2023-05-30 · conditional · none · ref 212
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.

What Formal Languages Can Transformers Express? A Survey

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer