Propagation of Chaos in Contextual Flow Maps

1 Pith paper citing it
abstract

We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and the parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers as a special case. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and on stability estimates for the resulting forward--adjoint system, both of which may be of independent interest.
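
The finite- versus infinite-context comparison described in the abstract can be illustrated with a small numerical sketch. This is not the paper's construction: it assumes a toy softmax-attention update, a context drawn i.i.d. from a standard Gaussian that stays frozen across blocks (so the McKean--Vlasov coupling is dropped), and hypothetical names and constants (`STEP`, `DEPTH`, `attention_block`, `mean_deviation`). Under these assumptions the infinite-context update has a closed form, because reweighting $N(0, I)$ by $e^{\langle x, y \rangle}$ gives $N(x, I)$, whose mean is $x$.

```python
import numpy as np

STEP = 0.1   # hypothetical residual step size per attention block
DEPTH = 3    # hypothetical number of stacked attention blocks


def attention_block(x, context):
    """Toy softmax-attention update of the distinguished token x
    against the empirical measure of the n context tokens."""
    scores = context @ x                      # shape (n,)
    weights = np.exp(scores - scores.max())   # numerically stabilised softmax
    weights /= weights.sum()
    return x + STEP * (weights @ context)


def finite_context_flow(x0, context):
    """Push x0 through DEPTH blocks with a frozen finite context."""
    x = x0.copy()
    for _ in range(DEPTH):
        x = attention_block(x, context)
    return x


def infinite_context_flow(x0):
    """Population limit for context ~ N(0, I): the attention average
    equals x itself, so each block maps x to (1 + STEP) * x."""
    return (1.0 + STEP) ** DEPTH * x0


def mean_deviation(n, d=4, trials=50, seed=0):
    """Average Euclidean gap between the finite- and infinite-context
    outputs over independent draws of an n-token context."""
    rng = np.random.default_rng(seed)
    x0 = np.full(d, 0.25)
    target = infinite_context_flow(x0)
    errors = []
    for _ in range(trials):
        context = rng.standard_normal((n, d))
        errors.append(np.linalg.norm(finite_context_flow(x0, context) - target))
    return float(np.mean(errors))


if __name__ == "__main__":
    for n in (100, 400, 1600, 6400):
        print(n, mean_deviation(n))
```

In this toy setting the printed deviation should shrink as $n$ grows, roughly in line with a $n^{-1/2}$ scaling. That is only a sanity check on the statistical role of the context length, not a verification of the paper's bounds.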

fields

stat.ML 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

stat.ML · 2026-05-21 · unverdicted · novelty 7.0

Finite-width shallow networks remain within $\mathrm{poly}(d)\, m^{-\min(1, c/6)}$ of their mean-field limit uniformly in time when mean-field excess loss decays as $t^{-c}$ under standard regularity and an integral condition on the loss.

citing papers explorer

Showing 1 of 1 citing paper.

  • Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks stat.ML · 2026-05-21 · unverdicted · none · ref 4 · internal anchor