pith. sign in

arxiv: 2410.02724 · v2 · pith:6UYD4QFCnew · submitted 2024-10-03 · 📊 stat.ML · cs.AI· cs.CL· cs.LG

Large Language Models as Markov Chains

classification 📊 stat.ML cs.AIcs.CLcs.LG
keywords llmslanguagemodelsbehaviorchainsgeneralizationlargemarkov
0
0 comments X
read the original abstract

Large language models (LLMs) are remarkably efficient across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of the LLMs' generalization capabilities remains elusive. In our paper, we approach this task by drawing an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. This allows us to study the multi-step inference mechanism of LLMs from first principles. We relate the obtained results to the pathological behavior observed with LLMs such as repetitions and incoherent replies with high temperature. Finally, we leverage the proposed formalization to derive pre-training and in-context learning generalization bounds for LLMs under realistic data and model assumptions. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 8.0

    REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

  2. Thermodynamic coprocessor for linear operations with input-size-independent calculation time based on open quantum system

    quant-ph 2025-09 unverdicted novelty 7.0

    An open quantum system with bosonic modes and reservoirs implements parallel stochastic matrix-vector multiplications whose computation time is independent of input dimension, with results given by stationary energy f...

  3. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 6.0

    REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-sou...

  4. Supervising the search process produces reliable and generalizable information-seeking agents

    cs.CL 2025-02 unverdicted novelty 6.0

    Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

  5. Generalization Bounds for Transformer-Based Next-Token Prediction in a Language Model

    math.ST 2026-06 unverdicted novelty 5.0

    Derives generalization bounds for transformer next-token prediction under an extended log-bilinear text data model, depending on architecture, vocabulary size, document count and length.

  6. Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

    cs.AI 2026-05 unverdicted novelty 5.0

    The authors propose creating data probes—synthetic sequences from defined random processes—to reveal how data properties drive LLM behavior across workflow stages.