Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
Pith reviewed 2026-05-07 10:49 UTC · model grok-4.3
The pith
Finite transformer token evolution converges pathwise to a stochastic particle system with noise-driven synchronization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with multilayer perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift.
What carries the argument
The common noise term in the limiting stochastic interacting particle system, which drives synchronization by overpowering the deterministic self-attention drift when sufficiently coercive.
If this is right
- The layerwise token dynamics admit quantitative approximation by the continuous-time stochastic particle system.
- The distribution of tokens obeys a specific stochastic partial differential equation in the scaling limit.
- Propagation of chaos holds, so the tokens behave as independent copies of the limiting distribution for large token counts.
- Synchronization by noise occurs with exponential average dissipation of interaction energy when the coercivity condition holds.
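As a schematic illustration of the objects named in the list above, the display below writes the limiting particle system in Itô form together with the shape of the dissipation estimate. The drift $a$, the noise coefficients $\tilde\sigma_k$, and the Brownian motions $B^k$ follow the notation that appears in the paper's appendix bounds; the energy functional $\mathcal{E}$, the rate $\lambda$, and the constant $C$ are placeholder symbols assumed here for illustration, not the paper's stated definitions.

```latex
% Schematic only: \mathcal{E}, \lambda and C are assumed placeholder symbols.
\begin{aligned}
  dX^i_t &= a\bigl(X^i_t, X_t\bigr)\,dt
            + \sum_{k=1}^{\infty} \tilde\sigma_k\bigl(X^i_t\bigr)\,dB^k_t,
  \qquad i = 1, \dots, N, \\
  \mathbb{E}\bigl[\mathcal{E}(X_t)\bigr] &\le C\,e^{-\lambda t}\,\mathbb{E}\bigl[\mathcal{E}(X_0)\bigr],
  \qquad \lambda > 0 \ \text{under the coercivity condition.}
\end{aligned}
```

The Brownian motions $B^k$ carry no token index $i$: every token is driven by the same noise, which is what makes the noise "common".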
Where Pith is reading between the lines
- The scaling limit opens the possibility of analyzing very deep transformers using tools from stochastic analysis instead of discrete recursion.
- The synchronization phenomenon might motivate adding controlled shared noise to finite-width transformers to encourage alignment of token representations during training.
- The characterization of suitable activation functions supplies a concrete design criterion for nonlinearities that promote dissipative behavior in the continuous limit.
Load-bearing premise
The common noise must be sufficiently coercive relative to the deterministic self-attention drift, together with the specific conditions on activation functions; if this coercivity fails, the synchronization and exponential energy dissipation claims do not hold.
What would settle it
Numerical integration of the limiting stochastic particle system with noise intensity below the coercivity threshold set by the self-attention drift, showing that the expected interaction energy fails to decay exponentially.
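A minimal sketch of such a numerical experiment is given below. It is not the paper's model: the softmax attention drift, the single shared Brownian increment projected onto the tangent space, the interaction energy defined as the mean squared pairwise distance, and the Euler-Maruyama step with renormalization are all assumptions chosen for illustration. The names `simulate`, `attention_drift`, `interaction_energy`, `beta`, and `noise_scale` are hypothetical, and the coercivity threshold itself is not computed here.

```python
import numpy as np

def project_tangent(x, v):
    """Project rows of v onto the tangent space of the unit sphere at rows of x."""
    return v - np.sum(x * v, axis=1, keepdims=True) * x

def attention_drift(x, beta):
    """Softmax self-attention drift projected onto the sphere (assumed form)."""
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return project_tangent(x, weights @ x)

def interaction_energy(x):
    """Assumed energy: (1 / 2N^2) * sum_ij |x_i - x_j|^2."""
    n = x.shape[0]
    gram = x @ x.T
    sq_norms = np.diag(gram)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return sq_dists.sum() / (2.0 * n * n)

def simulate(n_tokens=32, dim=3, beta=1.0, noise_scale=0.5,
             dt=1e-2, n_steps=500, seed=0):
    """Euler-Maruyama with renormalisation; returns final tokens and energy path."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    energies = np.empty(n_steps + 1)
    energies[0] = interaction_energy(x)
    for step in range(n_steps):
        # One Brownian increment shared by all tokens (common noise).
        dw = rng.normal(size=dim) * np.sqrt(dt)
        noise = project_tangent(x, np.broadcast_to(dw, x.shape))
        x = x + attention_drift(x, beta) * dt + noise_scale * noise
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract to the sphere
        energies[step + 1] = interaction_energy(x)
    return x, energies
```

To probe the claim, one would average `energies` over many seeds for several values of `noise_scale` and inspect the logarithm of the mean energy against time; a straight line with negative slope would be consistent with exponential dissipation, whereas its absence at small `noise_scale` would be the kind of evidence described above.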
Original abstract
We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying the former condition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MLP blocks to a continuous-time stochastic interacting particle system. It identifies the SPDE for the limiting token distribution, establishes quantitative bounds with commuting limits, proves propagation of chaos for large token counts, and shows that the limiting model exhibits synchronization by noise with exponential dissipation of interaction energy on average, provided the common noise is sufficiently coercive relative to the deterministic self-attention drift. It also characterizes the activation functions satisfying this coercivity condition.
Significance. If the results hold, this work supplies a rigorous mathematical link between discrete transformer dynamics and continuous stochastic particle systems, with potential to explain synchronization phenomena in large models. The quantitative bounds, commuting limits, and explicit characterization of admissible activation functions are strengths that could support further analysis of scaling and emergent behavior in neural networks. The conditional nature of the synchronization result (tied to coercivity) is appropriately stated.
minor comments (3)
- The introduction could more explicitly state the precise scaling regime (e.g., how the layer step size and width enter the quantitative error bounds) to make the limit passage clearer to readers outside the immediate subfield.
- Notation for the interacting particle system and the common noise term should be introduced with a dedicated table or diagram in §2 to improve readability when tracking the passage from the discrete transformer to the SPDE.
- A brief remark on the well-posedness of the limiting SPDE under the stated coercivity assumption would help readers verify that the synchronization result applies to a unique solution.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our results on pathwise convergence of transformer token dynamics to stochastic particle systems and SPDEs, as well as the recognition of the quantitative bounds, commuting limits, propagation of chaos, and the conditional synchronization-by-noise result under coercivity. We appreciate the recommendation for minor revision and the acknowledgment that the conditional nature of the synchronization is appropriately stated. No major comments were raised in the report.
Circularity Check
No significant circularity; the derivation is a self-contained mathematical proof.
The paper establishes pathwise convergence of finite-depth finite-width transformer token dynamics (with MLP blocks) to a continuous-time stochastic interacting particle system, identifies the limiting SPDE, proves propagation of chaos, and shows conditional synchronization by noise with exponential energy dissipation. All steps are quantitative limit passages under explicitly stated assumptions on activation functions and a coercivity condition on common noise relative to self-attention drift. The final characterization of admissible activation functions is derived as part of the proof rather than presupposed. No parameter fitting, self-definitional reductions, load-bearing self-citations, or imported uniqueness theorems appear in the claimed chain; the argument remains conditional and externally falsifiable via the stated coercivity requirement.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the transformer has finite depth and width with standard self-attention and MLP blocks whose evolution can be tracked layerwise.
- Standard math: existence, uniqueness, and well-posedness of solutions to the limiting stochastic particle system and SPDE.
Forward citations
Cited by 2 Pith papers
- Uniform Scaling Limits in AdamW-Trained Transformers
  AdamW-trained transformer hidden states and backpropagated variables converge uniformly in L2 to a forward-backward ODE system (McKean-Vlasov when non-causal) at rate O(L^{-1}+L^{-1/3}H^{-1/2}) as depth L and heads H ...
- Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime
  In the low-temperature regime, the token distribution in mean-field transformers concentrates onto the push-forward under a key-query-value projection with Wasserstein distance scaling as √(log(β+1)/β) exp(Ct) + exp(-ct).
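To get a feel for the two rate expressions quoted in the blurbs above, the short sketch below evaluates them numerically. The hidden constant in the O(·) rate is set to 1, and the constants C and c are exposed as the hypothetical parameters `growth` and `decay` with default value 1; none of these values come from the cited papers.

```python
import math

def depth_heads_rate(depth, heads):
    """Evaluate L^{-1} + L^{-1/3} H^{-1/2}, with the O(.) constant assumed to be 1."""
    return 1.0 / depth + depth ** (-1.0 / 3.0) * heads ** (-0.5)

def low_temperature_bound(beta, t, growth=1.0, decay=1.0):
    """Evaluate sqrt(log(beta + 1) / beta) * exp(C t) + exp(-c t).

    `growth` and `decay` stand in for the unspecified constants C and c.
    """
    concentration = math.sqrt(math.log(beta + 1.0) / beta) * math.exp(growth * t)
    return concentration + math.exp(-decay * t)

if __name__ == "__main__":
    # Illustrative values only; compare how each term scales.
    for depth, heads in [(8, 4), (64, 4), (64, 64)]:
        print(depth, heads, depth_heads_rate(depth, heads))
    for beta in [1e2, 1e4, 1e6]:
        print(beta, low_temperature_bound(beta, t=1.0))
```

In the first expression the mixed term L^{-1/3} H^{-1/2} decays more slowly in depth than L^{-1}, so increasing the number of heads matters for deep models. In the second, the first summand grows with t while the second decays, so the bound is only informative when β is large relative to exp(2Ct).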
Reference graph
Works this paper leans on
- [1] Robert J. Adler and Jonathan E. Taylor. Orthogonal expansions. In Random Fields and Geometry, Springer Monographs in Mathematics, chapter 3, pages 65–71. Springer, 2007.
- [2] Antonio Álvarez-López, Borjan Geshkovski, and Domènec Ruiz-Balet. Perceptrons and localization of attention’s mean-field landscape. arXiv preprint arXiv:2601.21366, 2026.
- [3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- [4] Krishnakumar Balasubramanian, Sayan Banerjee, and Philippe Rigollet. On the structure of stationary solutions to McKean-Vlasov equations with applications to noisy transformers. arXiv preprint arXiv:2510.20094, 2025.
- [5] Andrea Basteri and Dario Trevisan. Quantitative Gaussian approximation of randomly initialized deep neural networks. Machine Learning, 113(9):6373–6393, Sep 2024.
- [6] Dmitriy Bilyk and Feng Dai. Geodesic distance Riesz energy on the sphere. Transactions of the American Mathematical Society, 372(5):3141–3166, 2019.
- [7] Giuseppe Bruno, Federico Pasqualotto, and Andrea Agazzi. Emergence of meta-stable clustering in mean-field transformer models. In International Conference on Learning Representations (ICLR 2025), 2025.
- [8] Giuseppe Bruno, Federico Pasqualotto, and Andrea Agazzi. A multiscale analysis of mean-field transformers in the moderate interaction regime. In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
- [9] Valérie Castin, Pierre Ablin, José Antonio Carrillo, and Gabriel Peyré. A unified perspective on the dynamics of deep transformers. arXiv preprint arXiv:2501.18322, 2025.
- [10] Louis-Pierre Chaintron, Lénaïc Chizat, and Javier Maass. ResNets of all shapes and sizes: Convergence of training dynamics in the large-scale limit. arXiv preprint arXiv:2603.18168, 2026.
- [11] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.
- [12] Shi Chen, Zhengjiang Lin, Yury Polyanskiy, and Philippe Rigollet. Quantitative clustering in mean-field transformer models. arXiv preprint arXiv:2504.14697, 2025.
- [13] Lénaïc Chizat. The hidden width of deep ResNets: Tight error bounds and phase diagrams. arXiv preprint arXiv:2509.10167, 2025.
- [14] Michele Coghi and Franco Flandoli. Propagation of chaos for interacting particles subject to environmental noise. The Annals of Applied Probability, 26(3):1407–1442, 2016.
- [15] Michael Cranston, Benjamin Gess, and Michael Scheutzow. Weak synchronization for isotropic flows. Discrete and Continuous Dynamical Systems - B, 21(9):3003–3014, 2016.
- [16] Christopher Criscitiello, Quentin Rebjock, Andrew D McRae, and Nicolas Boumal. Synchronization on circles and spheres with nonlinear interactions. arXiv preprint arXiv:2405.18273, 2024.
- [17]
- [18] Weinan E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 1(5):1–11, 2017.
- [19] Maximilian Engel and Anna Shalova. Random quadratic form on a sphere: Synchronization by common noise. arXiv preprint arXiv:2603.06187, 2026.
- [20] S. Favaro, B. Hanin, D. Marinucci, I. Nourdin, and G. Peccati. Quantitative CLTs in deep neural networks. Probability Theory and Related Fields, 191(3):933–977, Apr 2025.
- [21] Lev Fedorov, Michaël E Sander, Romuald Elie, Pierre Marion, and Mathieu Laurière. Clustering in deep stochastic transformers. arXiv preprint arXiv:2601.21942, 2026.
- [22] Franco Flandoli, Benjamin Gess, and Michael Scheutzow. Synchronization by noise for order-preserving random dynamical systems. The Annals of Probability, pages 1325–1350, 2017.
- [23] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707–738, 2015.
- [24]
- [25] Borjan Geshkovski, Hugo Koubbi, Yury Polyanskiy, and Philippe Rigollet. Dynamic metastability in the self-attention model. arXiv preprint arXiv:2410.06833, 2024.
- [26] Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, and Philippe Rigollet. The emergence of clusters in self-attention dynamics. Advances in Neural Information Processing Systems, 36:57026–57037, 2023.
- [27] Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, and Philippe Rigollet. A mathematical perspective on transformers. Bulletin of the American Mathematical Society, 62(3):427–479, 2025.
- [28] Borjan Geshkovski, Philippe Rigollet, and Domènec Ruiz-Balet. Measure-to-measure interpolation using transformers. arXiv preprint arXiv:2411.04551, 2024.
- [29] Soufiane Hayou. Commutative scaling of width and depth in deep neural networks. Journal of Machine Learning Research, 25(299):1–41, 2024.
- [30] Soufiane Hayou and Greg Yang. Width and depth limits commute in residual networks. In International Conference on Machine Learning, pages 12700–12723. PMLR, 2023.
- [31] Tomasz Hrycak and Sebastian Schmutzhard. Inequalities involving Gegenbauer polynomials and their tangent lines. Mathematical Inequalities and Applications, 22(1):353–360, 2019.
- [32] Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and generalization in neural networks. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
- [33] Nikita Karagodin, Shu Ge, Yury Polyanskiy, and Philippe Rigollet. Normalization in attention dynamics. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- [34] Nikita Karagodin, Yury Polyanskiy, and Philippe Rigollet. Clustering in causal attention masking. Advances in Neural Information Processing Systems, 37:115652–115681, 2024.
- [35] Hugo Koubbi, Borjan Geshkovski, and Philippe Rigollet. Homogenized transformers. arXiv preprint arXiv:2604.01978, 2026.
- [36] Jaehoon Lee, Jascha Sohl-dickstein, Jeffrey Pennington, Roman Novak, Sam Schoenholz, and Yasaman Bahri. Deep neural networks as Gaussian processes. In International Conference on Learning Representations, 2018.
- [37] Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, and Lexing Ying. A mean field analysis of deep ResNet and beyond: Towards provably optimization via overparameterization from depth. In International Conference on Machine Learning, pages 6426–6436. PMLR, 2020.
- [38] Pierre Marion, Adeline Fermanian, Gérard Biau, and Jean-Philippe Vert. Scaling ResNets in the large-depth regime. Journal of Machine Learning Research, 26(56):1–48, 2025.
- [39] Johan Markdahl, Johan Thunberg, and Jorge Gonçalves. Almost global consensus on the n-sphere. IEEE Transactions on Automatic Control, 63(6):1664–1675, 2017.
- [40] R. M. Neal. Bayesian Learning for Neural Networks, Vol. 118 of Lecture Notes in Statistics. Springer-Verlag, 1996.
- [41] Stefano Peluchetti and Stefano Favaro. Infinitely deep neural networks as diffusion processes. In International Conference on Artificial Intelligence and Statistics, pages 1126–1136. PMLR, 2020.
- [42] Yury Polyanskiy, Philippe Rigollet, and Andrew Yao. Synchronization of mean-field models on the circle. arXiv preprint arXiv:2507.22857, 2025.
- [43] Olivier Raimond. Flots browniens isotropes sur la sphère. Annales de l’Institut Henri Poincare (B) Probability and Statistics, 35(3):313–354, 1999.
- [44] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 11 2005.
- [45] Philippe Rigollet. The mean-field dynamics of transformers. arXiv preprint arXiv:2512.01868, 2025.
- [46] Michael E Sander, Pierre Ablin, Mathieu Blondel, and Gabriel Peyré. Sinkformers: Transformers with doubly stochastic attention. In International Conference on Artificial Intelligence and Statistics, pages 3515–3530. PMLR, 2022.
- [47] I. J. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 9(1):96–108, 1942.
- [48] Anna Shalova and André Schlichting. Solutions of stationary McKean-Vlasov equation on a high-dimensional sphere and other Riemannian manifolds. Advances in Nonlinear Analysis, 15(1):20250141, 2026.
- [49] Gábor Szegő. Orthogonal polynomials, volume 23. American Mathematical Soc., 1939.
- [50] Alain-Sol Sznitman. Topics in propagation of chaos. Ecole d’été de probabilités de Saint-Flour XIX–1989, 1464:165–251, 1991.
- [51] Dario Trevisan. Wide deep neural networks with Gaussian weights are very close to Gaussian processes. arXiv preprint arXiv:2312.11737, 2023.
- [52] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [53] Cédric Villani. Optimal transport – Old and new, volume 338, pages xxii+973. 01 2008.
- [54] Biao Zhang and Rico Sennrich. Root mean square layer normalization. Advances in Neural Information Processing Systems, 32, 2019.
- [55] Aleksandr Zimin, Yury Polyanskiy, and Philippe Rigollet. Yuriiformer: A suite of Nesterov-accelerated transformers. arXiv preprint arXiv:2601.23236, 2026.

A Technical Lemmas for Section 2
A.1 Proof of Lemma 2.4
We repeatedly use that the truncation map $T$, the cutoff $\rho$, and the truncated projection $x \mapsto P_\rho x$ are globally Lipschitz and bounded, and that $T(\mathbb{R}^d) \subset B(0...$

- Terms of $O(|u|^{-3})$ contribute $3|w|^3$.
- Terms of $O(|u|^{-5})$ contribute $18|w|^3$.
- Terms of $O(|u|^{-7})$ contribute $15|w|^3$.
Summing these gives $|D^3 \mathrm{LN}(u)| \le 36|w|^3/|u|^3$. The Lagrange remainder $|R_2| \le \frac{1}{3!} \sup |D^3 \mathrm{LN}|$ yields the constant $36/6 = 6$, completing the proof.

We now proceed to provide the proof of Lemma 2.6:
Proof of Lemma 2.6. First, we expand the intermediate step $Y^i_\ell$. Using the first-order Taylor expansion of the layer normalization...

- Using the projection bound $|P_u - P_v| \le 2|u - v|$,
  $$|P_{Y^i_\ell} w - P_{X^i_\ell} w| \le 2\,|Y^i_\ell - X^i_\ell|\,|w| \le 2|w|\,c\Delta t.$$
- Let $A(u) = \frac{1}{2}|P_u w|^2$. Using the decomposition $|A(Y)Y - A(X)X| \le |A(Y) - A(X)|\,|Y| + |A(X)|\,|Y - X|$ and recalling $|Y^i_\ell| = 1$:
  $$\Bigl|\tfrac{1}{2}|P_{Y^i_\ell} w|^2 - \tfrac{1}{2}|P_{X^i_\ell} w|^2\Bigr| = \tfrac{1}{2}\Bigl|\bigl((X^i_\ell)^\top w\bigr)^2 - \bigl((Y^i_\ell)^\top w\bigr)^2\Bigr| \le |w|^2 c\Delta t.$$
  Since $|A(X)| \le \frac{1}{2}|w|^2$, the total bound for this term is
  $$|w|^2 c\Delta t + \tfrac{1}{2}|w|^2 c\Delta t = \tfrac{3}{2}|w|^2 c\Delta t.$$
- Using the product rule on $(u^\top w) P_u w$:
  $$\bigl|((Y^i_\ell)^\top w) P_{Y^i_\ell} w - ((X^i_\ell)^\top w) P_{X^i_\ell} w\bigr| \le |(Y^i_\ell - X^i_\ell)^\top w|\,|P_{Y^i_\ell} w| + |(X^i_\ell)^\top w|\,|P_{Y^i_\ell} w - P_{X^i_\ell} w| \le (c\Delta t\,|w|)(|w|) + (|w|)(2|w|\,c\Delta t) = 3|w|^2 c\Delta t.$$
Summing these contributions yields $|E_{\mathrm{base}}| \le 2|w|\,c\Delta t + \frac{9}{2}|w|^2 c\Delta t$. Finally, the current expression still depends on $w = \sqrt{\Delta t}\, G^m_{\ell+1}(Y^i_\ell)$. We substitute this with the point-m...

- Linear term: $|P_{X^i_\ell} w - P_{X^i_\ell} w'| \le |\delta|$.
- First quadratic term: $\bigl|\tfrac{1}{2}|P_{X^i_\ell} w|^2 X^i_\ell - \tfrac{1}{2}|P_{X^i_\ell} w'|^2 X^i_\ell\bigr| \le |w|\,|\delta| + \tfrac{1}{2}|\delta|^2$.
- Second quadratic term: $\bigl|((X^i_\ell)^\top w) P_{X^i_\ell} w - ((X^i_\ell)^\top w') P_{X^i_\ell} w'\bigr| \le 2|w|\,|\delta| + |\delta|^2$.
Summing these yields $|E_{\mathrm{noise}}| \le (1 + 3|w|)|\delta| + \frac{3}{2}|\delta|^2$. Collecting all terms ($R^i_\ell = E_{\mathrm{att}} + E_{\mathrm{mlp}} + E_{\mathrm{base}} + E_{\mathrm{noise}}$) concludes the proof.

Lemma A.2. Let $w := \sqrt{\Delta t}\, G^m_{\ell+1}(Y^i_\ell)$ and let $E_{\mathrm{mlp}}$ be the corresponding MLP remainder from (A.2). Then $|E_{\mathrm{mlp}}| \le c|w|^3$ almost surely.
Proof. Let A...

- Bounding $A^1_t$ (spatial discretization error):
  $$A^1_t := \mathbb{E}\Biggl[\sup_{0 \le s \le t} \frac{1}{N}\sum_{i=1}^N \Bigl|\int_0^s \bigl(a(X^i_{t_{\ell_u}}, X_{t_{\ell_u}}) - a(\hat X^i_{t_{\ell_u}}, \hat X_{t_{\ell_u}})\bigr)\,du\Bigr|^2\Biggr].$$
  Using the Cauchy-Schwarz inequality ($\bigl|\int_0^s f\,du\bigr|^2 \le s\int_0^s |f|^2\,du \le T\int_0^t |f|^2\,du$) and the global Lipschitz assumption (A.4):
  $$A^1_t \le T\int_0^t \frac{1}{N}\sum_{i=1}^N \mathbb{E}\bigl|a(X^i_{t_{\ell_u}}, X_{t_{\ell_u}}) - a(\hat X^i_{t_{\ell_u}}, \hat X_{t_{\ell_u}})\bigr|^2\,du \le T K^2 \int_0^t \mathbb{E}\Bigl[\frac{1}{N}\sum ...$$
- Bounding $A^2_t$ (time regularity of the drift):
  $$A^2_t := \mathbb{E}\Biggl[\sup_{0 \le s \le t} \frac{1}{N}\sum_{i=1}^N \Bigl|\int_0^s \bigl(a(X^i_u, X_u) - a(X^i_{t_{\ell_u}}, X_{t_{\ell_u}})\bigr)\,du\Bigr|^2\Biggr].$$
  Using Cauchy-Schwarz and the Lipschitz assumption (A.4):
  $$A^2_t \le T K^2 \int_0^t \frac{1}{N}\sum_{i=1}^N \mathbb{E}\bigl[|X^i_u - X^i_{t_{\ell_u}}|^2\bigr]\,du.$$
  By Lemma A.3 and the assumption $\Delta t \le 1$, $\mathbb{E}[|X^i_u - X^i_{t_{\ell_u}}|^2] \le 2K_3\Delta t(1 + \Delta t) \le 4K_3\Delta t$. Thus:
  $$A^2_t \le T K^2 \int_0^t 4K_3\Delta t\,du = 4T^2K^2K_3\Delta t.$$
- Bounding $B^1_t$ (martingale spatial error):
  $$B^1_t := \mathbb{E}\Biggl[\sup_{0 \le s \le t} \frac{1}{N}\sum_{i=1}^N \Bigl|\sum_{k=1}^\infty \int_0^s \bigl(\tilde\sigma_k(X^i_{t_{\ell_u}}) - \tilde\sigma_k(\hat X^i_{t_{\ell_u}})\bigr)\,dB^k_u\Bigr|^2\Biggr].$$
  Using Doob’s maximal inequality and the Itô isometry:
  $$B^1_t \le 4\,\frac{1}{N}\sum_{i=1}^N \mathbb{E}\Biggl[\int_0^t \sum_{k=1}^\infty \bigl|\tilde\sigma_k(X^i_{t_{\ell_u}}) - \tilde\sigma_k(\hat X^i_{t_{\ell_u}})\bigr|^2\,du\Biggr] \le 4K^2\int_0^t \mathbb{E}\Biggl[\frac{1}{N}\sum_{i=1}^N |X^i_{t_{\ell_u}} - \hat X^i_{t_{\ell_u}}|^2\Biggr]\,du \le 4K^2\int_0^t Z(u)\,du.$$
- Bounding $B^2_t$ (time regularity of the diffusion):
  $$B^2_t := \mathbb{E}\Biggl[\sup_{0 \le s \le t} \frac{1}{N}\sum_{i=1}^N \Bigl|\sum_{k=1}^\infty \int_0^s \bigl(\tilde\sigma_k(X^i_u) - \tilde\sigma_k(X^i_{t_{\ell_u}})\bigr)\,dB^k_u\Bigr|^2\Biggr].$$
  Again using Doob’s inequality, the Itô isometry, and the Lipschitz condition (A.4):
  $$B^2_t \le 4K^2\int_0^t \frac{1}{N}\sum_{i=1}^N \mathbb{E}\bigl[|X^i_u - X^i_{t_{\ell_u}}|^2\bigr]\,du.$$
  Applying the bound from Lemma A.3 yields:
  $$B^2_t \le 4K^2\int_0^t 4K_3\Delta t\,du = 16TK^2K_3\Delta t.$$
  ...