Understanding Transformer Reasoning Capabilities via Graph Algorithms

Anton Tsitsulin; Bahare Fatemi; Bryan Perozzi; Clayton Sanford; Ethan Hall; Jonathan Halcrow; Mehran Kazemi; Vahab Mirrokni

arxiv: 2405.18512 · v1 · pith:HFHG6TPHnew · submitted 2024-05-28 · 💻 cs.LG · cs.AI

Understanding Transformer Reasoning Capabilities via Graph Algorithms

Clayton Sanford , Bahare Fatemi , Ethan Hall , Anton Tsitsulin , Mehran Kazemi , Jonathan Halcrow , Bryan Perozzi , Vahab Mirrokni This is my paper

classification 💻 cs.LG cs.AI

keywords graphreasoningalgorithmicregimestaskstransformerscapabilitiesclasses

0 comments

read the original abstract

Which transformer scaling regimes are able to perfectly solve different classes of algorithmic problems? While tremendous empirical advances have been attained by transformer-based neural networks, a theoretical understanding of their algorithmic reasoning capabilities in realistic parameter regimes is lacking. We investigate this question in terms of the network's depth, width, and number of extra tokens for algorithm execution. Our novel representational hierarchy separates 9 algorithmic reasoning problems into classes solvable by transformers in different realistic parameter scaling regimes. We prove that logarithmic depth is necessary and sufficient for tasks like graph connectivity, while single-layer transformers with small embedding dimensions can solve contextual retrieval tasks. We also support our theoretical analysis with ample empirical evidence using the GraphQA benchmark. These results show that transformers excel at many graph reasoning tasks, even outperforming specialized graph neural networks.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
cs.LG 2026-05 accept novelty 8.0

Graph tokenizations for Transformers induce distinct depth regimes with proven separations and impossibility results for converting between them at limited depth.
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
cs.LG 2026-05 unverdicted novelty 7.0

EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
Deep sequence models tend to memorize geometrically; it is unclear why
cs.LG 2025-10 unverdicted novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.