pith. sign in

arxiv: 2507.11166 · v2 · pith:NGQ6JNLUnew · submitted 2025-07-15 · 🧮 math.PR

Large deviations for possibly reducible Markov chains on discrete state spaces

Pith reviewed 2026-05-19 04:57 UTC · model grok-4.3

classification 🧮 math.PR
keywords large deviationsMarkov chainsreducible chainslevel-2 LDPlevel-3 LDPDonsker-Varadhan entropysubadditive argumentsrate functions
0
0 comments X

The pith

Markov chains on discrete spaces satisfy large deviation principles even when reducible, with possibly nonconvex rate functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves level-2 and level-3 large deviation principles for the empirical measures and processes of Markov chains on any discrete state space. It requires only that the transition kernel is Markovian, without assuming irreducibility or exponential tightness of the measures. The proofs use subadditive arguments to bound the probabilities of atypical empirical behavior in an elementary, self-contained way. Because the chain can have multiple communicating classes or transient states, the resulting rate functions need not be convex and can differ from the Donsker-Varadhan entropy outside the set of invariant measures.

Core claim

The central claim is that both the level-2 large deviation principle for the empirical occupation measure and the level-3 principle for the empirical process hold for any Markov chain on a discrete state space. The associated rate functions are lower-semicontinuous and coincide with the Donsker-Varadhan entropy precisely on the set of invariant measures, but may be strictly larger and nonconvex elsewhere. These statements follow from subadditive inequalities applied to the logarithms of the probabilities of cylinder sets generated by the chain.

What carries the argument

Subadditive arguments applied to the sequence of log-probabilities of the empirical occupation measures and processes.

Load-bearing premise

The subadditive ergodic theorem applies to the relevant sequences built from the empirical measures even when the chain is reducible.

What would settle it

A concrete finite-state reducible Markov chain in which the observed decay rate of the probability of a particular empirical measure falls outside the interval predicted by the claimed rate function.

Figures

Figures reproduced from arXiv: 2507.11166 by L\'eo Daures.

Figure 1
Figure 1. Figure 1: The 7-letter Markov chain of Example 2.10. The transition probabilities of each arrow do not matter for this example, as long as they are positive on each edge of the graph and zero otherwise. and thus, the existence of linking words ξ as in Example (2.9) is not guaranteed. The two challenges brought by reducibility are the following. First, there is a notion of irreversibility. Since it is impossible to ‘… view at source ↗
Figure 2
Figure 2. Figure 2: Definition of the coupling map. The definition of the coupling map as the composition of several maps is sketched in [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of use of the coupling map, with two words [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The definition of the decoupling map as composition of slicing and stitching maps. [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Let (Xn) be the Markov chain on S = {1, 2, 3, 4} whose allowed transitions are given by the graph on the right-hand side of the figure, and whose initial measure is β = δ1. The values of transition probabilities do not matter for this example as long as they are positive on each edge of the graph and null otherwise. The set P(S) is a (finite-dimensional) simplex with extreme points {δ1, δ2, δ3, δ4}. The se… view at source ↗
Figure 6
Figure 6. Figure 6: The 3-state Markov chain of Example D.1. Example D.1. Let S = {1, 2, 3}, β = δ1, and p =   1 − ρ ρ2 ρ3 0 1 0 0 0 1   , where ρ2, ρ3 ∈ [0, 1] are such that ρ = ρ2 + ρ3 ⩽ 1. Although very simple and easily solvable by hand, the problem in this example does not fit into the irreducibility framework. We get A (1) = {λδ1 + (1 − λ)δ2 | λ ∈ [0, 1]} ∪ {λδ1 + (1 − λ)δ3 | λ ∈ [0, 1]}, and L (1) n satisfies the f… view at source ↗
Figure 7
Figure 7. Figure 7: The right-only one-dimensional random walk on [PITH_FULL_IMAGE:figures/full_fig_p056_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The Markov chain of Example D.3. The state 0 is reachable from every other state in one step and the state x is reachable from 0 in x steps, thus the Markov chain is matrix-irreducible. 53Yet it is 1-Lipschitz, thus continuous with respect to the uniform convergence topology. The weak topology is defined in Appendix B. 56 [PITH_FULL_IMAGE:figures/full_fig_p056_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visual representation of the Markov chain of example [PITH_FULL_IMAGE:figures/full_fig_p057_9.png] view at source ↗
read the original abstract

We study the large deviations of Markov chains under the sole assumption that the state space is discrete. In particular, we do not require any of the usual irreducibility and exponential tightness assumptions. Using subadditive arguments, we provide an elementary and self-contained proof of the level-2 and level-3 large deviation principles. Due to the possible reducibility of the Markov chain, the rate functions may be nonconvex and may differ, outside a specific set, from the Donsker-Varadhan entropy and other classical rate functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes level-2 and level-3 large deviation principles for Markov chains on discrete (possibly countably infinite) state spaces under the sole assumption that the transition kernel is a Markov kernel, without requiring irreducibility or exponential tightness. It employs subadditive arguments to obtain an elementary, self-contained proof. The resulting rate functions are permitted to be nonconvex and are shown to coincide with the Donsker-Varadhan entropy only on a specific subset of the space of measures.

Significance. If the arguments close, the work meaningfully relaxes standard hypotheses in large-deviation theory for Markov processes and supplies an accessible proof technique that applies directly to reducible chains. The explicit treatment of nonconvexity and the departure from classical rate functions outside a distinguished set constitute a genuine contribution. The self-contained character of the derivation is a clear strength.

major comments (2)
  1. [§3] §3 (upper bound for level-2 LDP): the subadditive control on limsup (1/n) log P(μ_n ∈ F) for closed F in the weak topology is stated without a separate argument that rules out mass escape to infinity on countable spaces; for reducible chains containing transient states, this leaves open whether the bound holds uniformly or requires an auxiliary tightness estimate that the paper claims to avoid.
  2. [Theorem 4.3] Theorem 4.3 (rate-function identification): the assertion that the constructed rate function equals the Donsker-Varadhan functional only on a specific set is load-bearing for the novelty claim, yet the proof sketch does not exhibit an explicit example of a reducible chain where the two functionals differ on a positive-measure set, making verification of the identification step difficult.
minor comments (2)
  1. [§2.1] The notation for the empirical occupation measure in the reducible case could be made more explicit by distinguishing the absorbing and transient components.
  2. [§3.1] A short remark on the applicability of the subadditive ergodic theorem to the sequence of log-probabilities for non-irreducible kernels would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below with clarifications and indicate revisions where they will strengthen the exposition without altering the core arguments.

read point-by-point responses
  1. Referee: [§3] §3 (upper bound for level-2 LDP): the subadditive control on limsup (1/n) log P(μ_n ∈ F) for closed F in the weak topology is stated without a separate argument that rules out mass escape to infinity on countable spaces; for reducible chains containing transient states, this leaves open whether the bound holds uniformly or requires an auxiliary tightness estimate that the paper claims to avoid.

    Authors: We appreciate the referee highlighting this potential gap in the presentation. The subadditive upper bound in §3 is obtained directly from the Markov property applied to cylinder sets and the fact that the empirical measure is always a probability measure on the discrete space; any putative mass escape would violate the normalization of μ_n, which is enforced by construction. Consequently, the limsup bound holds for every closed F in the weak topology without a separate tightness step. Nevertheless, to make this reasoning fully explicit for readers concerned with transient states, we will insert a short clarifying paragraph in the revised §3. This constitutes a partial revision. revision: partial

  2. Referee: [Theorem 4.3] Theorem 4.3 (rate-function identification): the assertion that the constructed rate function equals the Donsker-Varadhan functional only on a specific set is load-bearing for the novelty claim, yet the proof sketch does not exhibit an explicit example of a reducible chain where the two functionals differ on a positive-measure set, making verification of the identification step difficult.

    Authors: We agree that an explicit example would facilitate verification of the identification result. Theorem 4.3 establishes equality between the subadditive rate function and the Donsker-Varadhan functional precisely on the set of measures whose support lies in recurrent classes (or, more generally, on measures invariant under the transition kernel restricted to communicating classes). Outside this set the functionals differ because the subadditive construction captures the cost of visiting transient states. In the revision we will add a concrete example: a three-state chain with two absorbing states and one transient state. For this chain the rate functions differ on any measure that places positive mass on the transient state, which has positive Lebesgue measure in the simplex. This addition will be placed immediately after the statement of Theorem 4.3. revision: yes

Circularity Check

0 steps flagged

No circularity; self-contained subadditive proof

full rationale

The paper explicitly presents its level-2/3 LDP proofs as elementary and self-contained, relying on subadditive arguments applied directly to the empirical measures of a Markov chain on a discrete space without invoking irreducibility or exponential tightness. No load-bearing self-citations, fitted parameters renamed as predictions, or reductions of the rate function to its own inputs appear in the derivation chain. The rate functions are constructed via the subadditive limit and shown to satisfy the LDP properties, with the nonconvexity and deviation from Donsker-Varadhan entropy arising as a consequence rather than an input. This is the standard case of an independent proof on a countable space.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The proof relies on the subadditive ergodic theorem applied to the empirical occupation measures of a possibly reducible chain; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Subadditive ergodic theorem applies to the sequence of empirical measures generated by the Markov chain
    Invoked to obtain the large-deviation upper and lower bounds without irreducibility

pith-pipeline@v0.9.0 · 5604 in / 1186 out tokens · 30135 ms · 2026-05-19T04:57:52.908505+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large deviations for non-irreducible Markov chains on Euclidean spaces

    math.PR 2026-04 unverdicted novelty 7.0

    The weak large deviations principle holds for empirical measures of non-irreducible Markov chains on R^d with arbitrary initial measure, proved via subadditivity, with the rate function not necessarily convex.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper

  1. [1]

    Azencott, R. (1980). Grandes d´ eviations et applications. In Eighth Saint Flour Probability Summer School—1978 (Saint Flour, 1978) , Volume 774 of Lecture Notes in Math. , pp. 1–176. Springer, Berlin

  2. [2]

    Bahadur, R. R. and S. L. Zabell (1979). Large deviations of the sample mean in general vector spaces. Ann. Probab. 7 (4), 587–621

  3. [3]

    Baxter, J. R., N. C. Jain, and S. R. S. Varadhan (1991). Some familiar examples for which the large deviation principle does not hold. Comm. Pure Appl. Math. 44 (8-9), 911–923

  4. [4]

    Bourbaki, N. (1971). ´El´ ements de math´ ematique. Topologie g´ en´ erale. Chapitres 1 ` a 4. Hermann, Paris

  5. [5]

    Bryc, W. and A. Dembo (1996). Large deviations and strong mixing. Ann. Inst. H. Poincar´ e Probab. Statist. 32 (4), 549–569

  6. [6]

    Jakˇ si´ c, C.-A

    Cuneo, N., V. Jakˇ si´ c, C.-A. Pillet, and A. Shirikyan (2019). Large deviations and fluctuation theorem for selectively decoupled measures on shift spaces. Rev. Math. Phys. 31 (10), 1950036, 54

  7. [7]

    Dawson, D. A. and J. G¨ artner (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics 20 (4), 247–308

  8. [8]

    de Acosta, A. (1990). Large deviations for empirical measures of Markov chains. J. Theoret. Probab. 3(3), 395–431

  9. [9]

    de Acosta, A. and P. Ney (1998). Large deviation lower bounds for arbitrary additive functionals of a Markov chain. Ann. Probab. 26 (4), 1660–1682

  10. [10]

    de Acosta, A. D. (2022). Large deviations for Markov chains , Volume 229 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge

  11. [11]

    Dembo, A. and O. Zeitouni (2010). Large deviations techniques and applications , Volume 38 of Stochastic Modelling and Applied Probability . Springer-Verlag, Berlin. Corrected reprint of the second (1998) edition

  12. [12]

    den Hollander, F. (2000). Large deviations, Volume 14 of Fields Institute Monographs . American Mathematical Society, Providence, RI

  13. [13]

    Deuschel, J.-D. and D. W. Stroock (1989). Large deviations, Volume 137 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA

  14. [14]

    Dinwoodie, I. H. (1993). Identifying a large deviation rate function. Ann. Probab. 21 (1), 216–231

  15. [15]

    Dinwoodie, I. H. and P. Ney (1995). Occupation measures for Markov chains. J. Theoret. Probab. 8(3), 679–691

  16. [16]

    Donsker, M. D. and S. R. S. Varadhan (1975a). Asymptotic evaluation of certain Markov process expectations for large time. I. Comm. Pure Appl. Math. 28 , 1–47

  17. [17]

    Donsker, M. D. and S. R. S. Varadhan (1975b). Asymptotic evaluation of certain Markov process expectations for large time. II. Comm. Pure Appl. Math. 28 , 279–301

  18. [18]

    Donsker, M. D. and S. R. S. Varadhan (1976). Asymptotic evaluation of certain Markov process expectations for large time. III. Comm. Pure Appl. Math. 29 (4), 389–461

  19. [19]

    Donsker, M. D. and S. R. S. Varadhan (1983). Asymptotic evaluation of certain Markov process expectations for large time. IV. Comm. Pure Appl. Math. 36 (2), 183–212. 59

  20. [20]

    Ellis, R. S. (2006). Entropy, large deviations, and statistical mechanics . Classics in Mathematics. Springer-Verlag, Berlin. Reprint of the 1985 original

  21. [21]

    Fayolle, G. and A. de La Fortelle (2002). Entropy and the principle of large deviations for discrete-time Markov chains. Problemy Peredachi Informatsii 38 (4), 121–135

  22. [22]

    Jiang, Y. W. and L. M. Wu (2005). Large deviations for empirical measures of not necessarily irreducible countable Markov chains with arbitrary initial measures. Acta Math. Sin. (Engl. Ser.) 21 (6), 1377–1390

  23. [23]

    Lanford, O. E. (1973). Entropy and equilibrium states in classical statistical mechanics. pp. 1–113

  24. [24]

    and C.-E

    Lewis, J. and C.-E. Pfister (1995). Thermodynamic probability theory: Some aspects of large deviations. Russian Mathematical Surveys 50

  25. [25]

    Parthasarathy, K. R. (2005). Probability measures on metric spaces / K. R. Parthasarathy . Providence (R.I.): AMS Chelsea Publishing, American Mathematical Society

  26. [26]

    Pfister, C.-E. (2002). Thermodynamical aspects of classical lattice systems. In In and out of equilibrium (Mambucaba, 2000), Volume 51 of Progr. Probab., pp. 393–472. Birkh¨ auser Boston, Boston, MA

  27. [27]

    Pinsker, M. S. (1964). Information and information stability of random variables and processes. Holden-Day, Inc., San Francisco, Calif.-London-Amsterdam

  28. [28]

    Rassoul-Agha, F. and T. Sepp¨ al¨ ainen (2015).A course on large deviations with an introduction to Gibbs measures, Volume 162 of Graduate Studies in Mathematics . American Mathematical Society, Providence, RI

  29. [29]

    Ruelle, D. (1965). Correlation functionals. J. Mathematical Phys. 6 , 201–220

  30. [30]

    Wu, L. (2000). Uniformly integrable operators and large deviations for Markov processes. J. Funct. Anal. 172 (2), 301–376

  31. [31]

    (2002).Convex analysis in general vector spaces

    Z˘ alinescu, C. (2002).Convex analysis in general vector spaces . World Scientific Publishing Co., Inc., River Edge, NJ. 60