pith. sign in

arxiv: 2405.18512 · v1 · pith:HFHG6TPHnew · submitted 2024-05-28 · 💻 cs.LG · cs.AI

Understanding Transformer Reasoning Capabilities via Graph Algorithms

classification 💻 cs.LG cs.AI
keywords graphreasoningalgorithmicregimestaskstransformerscapabilitiesclasses
0
0 comments X
read the original abstract

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers

    cs.LG 2026-05 accept novelty 8.0

    Graph tokenizations for Transformers induce distinct depth regimes with proven separations and impossibility results for converting between them at limited depth.

  2. Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

    cs.LG 2026-05 unverdicted novelty 7.0

    EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.

  3. Deep sequence models tend to memorize geometrically; it is unclear why

    cs.LG 2025-10 unverdicted novelty 6.0

    Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.