Large Language Models as Markov Chains

Abdelhakim Benechehab; Ambroise Odonnat; Ievgen Redko; Linus Bleistein; Nicolas Boull\'e; Oussama Zekri

arxiv: 2410.02724 · v2 · pith:6UYD4QFCnew · submitted 2024-10-03 · 📊 stat.ML · cs.AI· cs.CL· cs.LG

Large Language Models as Markov Chains

Oussama Zekri , Ambroise Odonnat , Abdelhakim Benechehab , Linus Bleistein , Nicolas Boull\'e , Ievgen Redko This is my paper

classification 📊 stat.ML cs.AIcs.CLcs.LG

keywords llmslanguagemodelsbehaviorchainsgeneralizationlargemarkov

0 comments

read the original abstract

Large language models (LLMs) are remarkably efficient across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the LLMs' generalization capabilities remains elusive. In our paper, we approach this task by drawing an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. This allows us to study the multi-step inference mechanism of LLMs from first principles. We relate the obtained results to the pathological behavior observed with LLMs such as repetitions and incoherent replies with high temperature. Finally, we leverage the proposed formalization to derive pre-training and in-context learning generalization bounds for LLMs under realistic data and model assumptions. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
Thermodynamic coprocessor for linear operations with input-size-independent calculation time based on open quantum system
quant-ph 2025-09 unverdicted novelty 7.0

An open quantum system with bosonic modes and reservoirs implements parallel stochastic matrix-vector multiplications whose computation time is independent of input dimension, with results given by stationary energy f...
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 6.0

REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-sou...
Supervising the search process produces reliable and generalizable information-seeking agents
cs.CL 2025-02 unverdicted novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.
Generalization Bounds for Transformer-Based Next-Token Prediction in a Language Model
math.ST 2026-06 unverdicted novelty 5.0

Derives generalization bounds for transformer next-token prediction under an extended log-bilinear text data model, depending on architecture, vocabulary size, document count and length.
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance
cs.AI 2026-05 unverdicted novelty 5.0

The authors propose creating data probes—synthetic sequences from defined random processes—to reveal how data properties drive LLM behavior across workflow stages.