arxiv: 2604.13259 · v1 · submitted 2026-04-14 · 🧮 math.DS

Global attractors and fast-slow reduction for finite-state actor-critic mean dynamics

Pith reviewed 2026-05-10 13:44 UTC · model grok-4.3

classification 🧮 math.DS
keywords: actor-critic, mean dynamics, global attractors, fast-slow systems, Markov chains, reinforcement learning, dynamical systems, Lean formalization

The pith

Finite-state actor-critic mean dynamics on policy, critic, and state-law variables admit compact global attractors for any positive separation parameter.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the joint evolution of an actor parameter, a critic state, and the induced state distribution in finite-state actor-critic learning. It enlarges the phase space to include the fast distribution variable, which follows the controlled-Markov equation δ μ̇ = Q_θ^* μ with separation parameter δ. Under a softmax actor with box confinement, a uniformly coercive linear critic, and a Lipschitz generator family, the resulting autonomous semiflow is shown to possess a compact global attractor for every δ > 0. Under uniform exponential mixing of the fast variable, the invariant-distribution map θ ↦ μ_θ is Lipschitz and the reduced slow system on the actor and critic coordinates is well-posed. With an additional pathwise exponential-stability estimate for the fast equation, the full flow tracks the reduced flow on finite time intervals after a short initial layer, and the attractors converge upper-semicontinuously to the lifted reduced attractor as δ tends to zero. All results are formalized in Lean 4 without custom axioms.

Core claim

For each δ > 0 the autonomous semiflow on the enlarged space (θ, w, μ) possesses a compact global attractor. Under a uniform exponential-mixing assumption the map θ ↦ μ_θ is Lipschitz and the reduced invariant-law system on (θ, w) is well-posed. Under an additional pathwise exponential-stability estimate for the non-autonomous fast equation, the exact flow tracks the reduced flow on every finite time interval up to the initial layer, and the exact attractors converge upper semicontinuously to the lifted reduced attractor as δ → 0. A concrete finite-state reference-state minorization condition is given that implies the pathwise hypothesis.

What carries the argument

The enlarged autonomous semiflow on the product space (θ, w, μ) whose fast coordinate obeys the exact controlled-Markov equation δ μ̇ = Q_θ^* μ, together with the compact global attractor of this semiflow and its reduction to the slow invariant-law system obtained by replacing μ with its θ-dependent stationary measure μ_θ.
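
As a reading aid, the structure described above can be written out schematically. The symbols $F$ and $G$ are placeholder names assumed here for the actor and critic vector fields; they are not taken from the paper. Only the fast equation and the substitution of $\mu_\theta$ for $\mu$ come from the text.

```latex
% Full system on (\theta, w, \mu); F and G are hypothetical placeholder names.
\begin{aligned}
\dot\theta &= F(\theta, w, \mu), \\
\dot w &= G(\theta, w, \mu), \\
\delta\,\dot\mu &= Q_\theta^{*}\mu .
\end{aligned}

% Reduced system on (\theta, w): \mu is replaced by the stationary law \mu_\theta,
% which satisfies Q_\theta^{*}\mu_\theta = 0.
\begin{aligned}
\dot\theta &= F(\theta, w, \mu_\theta), \\
\dot w &= G(\theta, w, \mu_\theta).
\end{aligned}
```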

If this is right

  • For every fixed positive separation δ the enlarged dynamics admit a nonempty compact global attractor.
  • The stationary distribution map θ ↦ μ_θ is Lipschitz continuous once uniform exponential mixing holds.
  • The reduced slow system on the actor and critic coordinates alone is well-posed and inherits existence of equilibria or limit sets from the full system.
  • After a short initial layer, solutions of the full system remain close to solutions of the reduced system on any finite time interval.
  • As the separation parameter tends to zero the attractors of the full system converge upper-semicontinuously onto the lift of the reduced attractor.
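
The last bullet can be stated with the standard Hausdorff semi-distance. The notation $\mathcal{A}_\delta$, $\mathcal{A}_0$, and $\widehat{\mathcal{A}}_0$ is assumed here for illustration, and reading the "lift" as the graph of $\theta \mapsto \mu_\theta$ over the reduced attractor is an interpretation rather than a quotation from the paper.

```latex
% \mathcal{A}_\delta: attractor of the full system; \mathcal{A}_0: attractor of the reduced system.
\widehat{\mathcal{A}}_0 = \{(\theta, w, \mu_\theta) : (\theta, w) \in \mathcal{A}_0\},
\qquad
\lim_{\delta \to 0}\ \sup_{x \in \mathcal{A}_\delta}\ \inf_{y \in \widehat{\mathcal{A}}_0} \|x - y\| = 0 .
```

Upper semicontinuity is one-sided: it prevents the full attractors from extending far beyond the lifted reduced attractor, but by itself it does not say that every point of the reduced attractor is approximated.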

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reduction supplies a rigorous justification for replacing the fast distribution dynamics by its stationary measure in long-run analyses of adaptive reinforcement-learning algorithms.
  • The explicit minorization condition on a reference state gives a directly checkable criterion that practitioners can verify on a given finite-state MDP to guarantee the pathwise stability hypothesis.
  • Because the entire argument is machine-checked in Lean 4, the same formalization style could be reused to obtain verified global-attractor statements for other mean-field learning models.
  • The upper-semicontinuous convergence of attractors suggests that limit sets computed on the reduced system remain valid approximations of the long-run behavior even when δ is only moderately small.
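
To illustrate the second bullet, here is a minimal sketch of what checking a reference-state minorization on a finite-state model might look like. It assumes a Doeblin-type reading of the condition (every state jumps into a fixed reference state at rate at least κ > 0, uniformly over the parameter); the paper's exact condition may differ. The function names and the two example generators are hypothetical.

```python
def reference_state_rate(Q, ref):
    """Smallest jump rate into state `ref` from any other state of generator Q."""
    return min(Q[s][ref] for s in range(len(Q)) if s != ref)


def minorization_constant(generators, ref):
    """Worst-case rate into `ref` over a finite sample of generators.

    A strictly positive value is the (assumed) Doeblin-type condition:
    every state reaches `ref` at rate at least kappa across the sample.
    """
    return min(reference_state_rate(Q, ref) for Q in generators)


# Hypothetical three-state generators (rows sum to zero) for two parameter values.
Q_a = [[-1.0, 0.6, 0.4],
       [0.5, -0.8, 0.3],
       [0.7, 0.2, -0.9]]
Q_b = [[-1.2, 0.9, 0.3],
       [0.4, -1.0, 0.6],
       [0.3, 0.5, -0.8]]

kappa = minorization_constant([Q_a, Q_b], ref=0)
```

A positive `kappa` on a finite sample of parameter values is only evidence, not a proof of uniformity over the whole parameter box; a rigorous check would need a lower bound valid for every θ.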

Load-bearing premise

The argument rests on two hypotheses: the uniform exponential-mixing assumption on the family of Markov chains, which makes the stationary map θ ↦ μ_θ Lipschitz, and the additional pathwise exponential-stability estimate for the fast equation, which is needed for the tracking and attractor-convergence statements.
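
A two-state toy chain shows what these properties look like numerically. The sketch below is illustrative only: the rate functions in `rates` are hypothetical, chosen so that the total rate a + b is constant, and θ is frozen while the fast equation relaxes.

```python
import math


def rates(theta):
    """Hypothetical two-state jump rates a(theta): 0 -> 1 and b(theta): 1 -> 0."""
    s = 1.0 / (1.0 + math.exp(-theta))  # logistic function
    return 1.0 + s, 2.0 - s


def stationary_mu0(theta):
    """Stationary mass of state 0, solving Q_theta^* mu = 0 for the two-state chain."""
    a, b = rates(theta)
    return b / (a + b)


def relax_mu0(theta, mu0, delta, t_end, dt):
    """Explicit Euler for delta * d(mu0)/dt = b - (a + b) * mu0 with theta frozen."""
    a, b = rates(theta)
    for _ in range(int(round(t_end / dt))):
        mu0 += dt * (b - (a + b) * mu0) / delta
    return mu0


theta, delta = 0.5, 0.05
mu_end = relax_mu0(theta, mu0=1.0, delta=delta, t_end=0.5, dt=0.001)
defect = abs(mu_end - stationary_mu0(theta))  # distance to the stationary law

# Crude finite-difference estimate of the Lipschitz constant of theta -> mu_theta.
grid = [-4.0 + 0.01 * k for k in range(801)]
lip = max(abs(stationary_mu0(grid[k + 1]) - stationary_mu0(grid[k])) / 0.01
          for k in range(800))
```

In this toy family the relaxation rate is (a + b)/δ = 3/δ for every θ, which is the kind of uniformity the mixing hypothesis asks for, and the stationary map has slope bounded by 1/12.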

What would settle it

An explicit finite-state MDP and choice of actor-critic updates satisfying the paper's hypotheses in which trajectories of the full three-variable system remain bounded away from every trajectory of the reduced two-variable system for arbitrarily small positive δ after the initial transient, or a concrete model in which the global attractor ceases to exist once the uniform coercivity of the critic or the Lipschitz condition on θ ↦ Q_θ is dropped.
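
A numerical probe of the first kind of test could be sketched as follows. Everything in it is a toy assumption: the two-state rates, the actor drift, the critic equation, and the reward values are hypothetical stand-ins, not the paper's model, and explicit Euler is used only for simplicity. The probe integrates the full and reduced systems from the same slow initial data and records the largest gap in the slow coordinates.

```python
import math


def b_rate(theta):
    """Hypothetical rate 1 -> 0; the rate 0 -> 1 is taken to be 3 - b_rate(theta)."""
    return 2.0 - 1.0 / (1.0 + math.exp(-theta))


def slow_field(theta, w, mu0, r0=1.0, r1=-1.0):
    """Toy actor and critic fields (illustrative only, not the paper's equations)."""
    dtheta = w - theta                          # confining actor drift
    dw = -w + r0 * mu0 + r1 * (1.0 - mu0)       # coercive linear critic
    return dtheta, dw


def tracking_gap(delta, t_end=5.0, dt=1e-3):
    """Max distance between slow coordinates of the full and reduced Euler flows."""
    th, w, mu0 = 1.0, 0.0, 1.0   # full system; mu0 starts away from its stationary value
    th_r, w_r = 1.0, 0.0         # reduced system, same slow initial data
    gap = 0.0
    for _ in range(int(round(t_end / dt))):
        dth, dw = slow_field(th, w, mu0)
        dmu = (b_rate(th) - 3.0 * mu0) / delta  # delta * mu0' = b - (a + b) * mu0
        dth_r, dw_r = slow_field(th_r, w_r, b_rate(th_r) / 3.0)
        th, w, mu0 = th + dt * dth, w + dt * dw, mu0 + dt * dmu
        th_r, w_r = th_r + dt * dth_r, w_r + dt * dw_r
        gap = max(gap, abs(th - th_r) + abs(w - w_r))
    return gap


gap_coarse = tracking_gap(delta=0.1)
gap_fine = tracking_gap(delta=0.01)
```

In a model that satisfies the hypotheses, one would expect the gap to shrink as δ decreases; a gap that stays bounded away from zero as δ shrinks would point toward a counterexample or a violated hypothesis. The step size `dt` must stay well below δ for the explicit scheme to remain stable.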

Figures

Figures reproduced from arXiv: 2604.13259 by Vladyslav Prytula (zooplus SE).

Figure 1: Two-state example with asymmetric rewards. (a) Phase portrait projected onto (…
Figure 2: Finite-time reduction in the two-state example. (a) State-law defect …
Original abstract

When a learning algorithm reshapes the data distribution it trains on, the long-run behavior depends on the joint evolution of the policy, the value estimate, and the data distribution. We study finite-state actor-critic mean dynamics on the enlarged phase space $(\theta,w,\mu)$, where $\theta$ is the actor parameter, $w$ is an auxiliary critic state, and $\mu$ is a state-law variable (the distribution over states induced by the current policy). The state-law coordinate follows the exact controlled-Markov equation $\delta \dot\mu = Q_\theta^*\mu$. Under a softmax actor with box confinement (a smooth proxy for parameter clipping), a uniformly coercive linear critic equation, and a Lipschitz generator family $\theta \mapsto Q_\theta$, we prove that for each $\delta > 0$ the resulting autonomous semiflow possesses a compact global attractor. Under a uniform exponential-mixing assumption, we prove that the invariant-law map $\theta \mapsto \mu_\theta$ is Lipschitz and that the reduced invariant-law system on $(\theta,w)$ is well posed. Under an additional pathwise exponential-stability estimate for the non-autonomous fast state equation, we show that the exact flow tracks the reduced flow on every finite time interval up to the initial layer, and that the exact attractors converge upper semicontinuously to the lifted reduced attractor as $\delta \to 0$. We also give a concrete finite-state reference-state minorization condition implying the pathwise hypothesis. All results are formalized in Lean 4 without custom axioms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript studies the mean dynamics of finite-state actor-critic algorithms on the enlarged phase space (θ, w, μ), where μ satisfies the controlled Markov equation δ μ̇ = Q_θ^* μ. Under a softmax actor with box confinement, a uniformly coercive linear critic, and Lipschitz continuity of θ ↦ Q_θ, it proves that the autonomous semiflow possesses a compact global attractor for each fixed δ > 0. Under uniform exponential mixing, the invariant-law map θ ↦ μ_θ is shown to be Lipschitz and the reduced system on (θ, w) is well-posed. With an additional pathwise exponential-stability estimate for the fast non-autonomous equation (implied by a concrete finite-state reference-state minorization condition), the exact flow tracks the reduced flow on finite time intervals up to an initial layer, and the exact attractors converge upper semicontinuously to the lifted reduced attractor as δ → 0. All results are formalized in Lean 4 without custom axioms.

Significance. If the claims hold, the work supplies a rigorous justification for fast-slow reductions and long-term behavior in actor-critic mean dynamics, including explicit conditions under which reduced models accurately capture attractor structure. The machine-checked Lean 4 formalization without ad-hoc axioms is a notable strength that enhances verifiability, and the concrete minorization condition provides a practical bridge from abstract hypotheses to finite-state models. These contributions strengthen the dynamical-systems analysis of reinforcement-learning algorithms.

minor comments (2)
  1. [Abstract] The abstract introduces 'box confinement (a smooth proxy for parameter clipping)' without stating its explicit functional form; adding the precise expression (e.g., the smoothing function and its derivative bounds) would improve immediate readability.
  2. The statement that the minorization condition implies the pathwise exponential-stability estimate is central; a short remark on the quantitative constants obtained from the minorization (e.g., the resulting mixing rate) would help readers assess the strength of the hypothesis.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment and recommendation to accept the manuscript. The referee's summary correctly captures the main theorems on compact global attractors for each fixed δ > 0, the Lipschitz continuity of the invariant-law map under uniform exponential mixing, well-posedness of the reduced system, finite-time tracking up to the initial layer, upper semicontinuous convergence of attractors as δ → 0, the concrete reference-state minorization condition, and the Lean 4 formalization without custom axioms.

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained mathematical proofs

full rationale

The paper establishes existence of global attractors for fixed δ>0, well-posedness of the reduced slow system, finite-time tracking, and upper-semicontinuous attractor convergence as δ→0. These rest on explicitly listed hypotheses (softmax actor with box confinement, uniformly coercive linear critic, Lipschitz θ↦Q_θ, uniform exponential mixing, pathwise exponential stability) plus a concrete finite-state minorization condition that implies the pathwise hypothesis. All steps are formalized in Lean 4 with no custom axioms, so the derivation chain is machine-verified and does not reduce any claimed result to a fitted parameter, self-definition, or load-bearing self-citation. No equations equate a prediction to its own input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 5 axioms · 0 invented entities

The central claims rest on several domain assumptions about mixing rates, stability, and Lipschitz continuity that are standard in dynamical-systems treatments of RL but must be verified for each concrete MDP.

axioms (5)
  • domain assumption Uniform exponential-mixing assumption on the controlled Markov chain
    Invoked to obtain Lipschitz continuity of the invariant-law map θ ↦ μ_θ.
  • domain assumption Pathwise exponential-stability estimate for the non-autonomous fast state equation
    Required to prove tracking of the reduced flow on finite time intervals.
  • domain assumption Lipschitz continuity of the generator family θ ↦ Q_θ
    Used to guarantee existence of the autonomous semiflow.
  • domain assumption Uniform coercivity of the linear critic equation
    Ensures well-posedness of the critic dynamics.
  • domain assumption Softmax actor with box confinement
    Smooth proxy for parameter clipping that keeps the semiflow inside a compact set.

pith-pipeline@v0.9.0 · 5584 in / 1811 out tokens · 25773 ms · 2026-05-10T13:44:33.312460+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. M. Benaim, A dynamical system approach to stochastic approximations, SIAM J. Control Optim. 34 (1996), 437–472.
  2. V. S. Borkar and S. P. Meyn, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim. 38 (2000), 447–469.
  3. V. R. Konda and V. S. Borkar, Actor-critic–type learning algorithms for Markov decision processes, SIAM J. Control Optim. 38 (1999), 94–123.
  4. S. D. Liu, S. Chen, and S. Zhang, The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise, J. Mach. Learn. Res. 26 (2025), Paper 24, 1–76.
  5. V. R. Konda and J. N. Tsitsiklis, On the convergence of actor-critic algorithms, SIAM J. Control Optim. 42 (2003), 1143–1166.
  6. A. Y. Mitrophanov, Sensitivity and convergence of uniformly ergodic Markov chains, J. Appl. Probab. 42 (2005), 1003–1014.
  7. S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, 2nd ed., Cambridge University Press, Cambridge, 2009.
  8. J. C. Perdomo, T. Zrnic, C. Mendler-Dünner, and M. Hardt, Performative prediction, in Proc. 37th International Conference on Machine Learning, PMLR 119, 2020, pp. 7599–7609.
  9. J. R. Norris, Markov Chains, Cambridge University Press, Cambridge, 1997.
  10. J. C. Robinson, Infinite-Dimensional Dynamical Systems, Cambridge University Press, Cambridge, 2001.
  11. P. J. Schweitzer, Perturbation theory and finite Markov chains, J. Appl. Probab. 5 (1968), 401–413.
  12. L. Truquet, A Perturbation Analysis of Markov Chains Models with Time-Varying Parameters, Bernoulli 26 (2020), 2876–2906.
  13. S. Yüksel, On Borkar and Young Relaxed Control Topologies and Continuous Dependence of Invariant Measures on Control Policy, SIAM J. Control Optim. 62 (2024), 2367–2386.
  14. J. K. Hale, X.-B. Lin, and G. Raugel, Upper semicontinuity of attractors for approximations of semigroups and partial differential equations, Math. Comp. 50 (1988), 89–123.
  15. R. Temam, Infinite-Dimensional Dynamical Systems in Mechanics and Physics, 2nd ed., Springer, New York, 1997.
  16. H. K. Khalil, Nonlinear Systems, 3rd ed., Prentice Hall, Upper Saddle River, NJ, 2002.
  17. F. Verhulst, Methods and Applications of Singular Perturbations, Springer, New York, 2005.

Acknowledgments

The Lean 4 formalization accompanying this paper was developed with assistance from Claude (Anthropic).