From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space
Pith reviewed 2026-05-20 09:46 UTC · model grok-4.3
The pith
The Wasserstein Saddle-Free Newton method escapes saddle points in polynomial time and converges linearly to global minimizers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose Wasserstein Saddle-Free Newton (WSFN), a second-order method that preconditions the Wasserstein gradient by a regularized square root of the squared Wasserstein Hessian. This construction preserves attraction toward directions of positive curvature while inducing repulsion along directions of negative curvature, thereby overcoming the tendency of standard Wasserstein Newton dynamics to be attracted to saddles. We also establish second-order sufficient optimality conditions on Wasserstein space for strict local minimality. Under regularity and benign landscape assumptions, we prove that WSFN escapes saddle regions and reaches an α-neighborhood of a global minimizer in polynomial time.
What carries the argument
Wasserstein Saddle-Free Newton (WSFN), which preconditions the Wasserstein gradient by a regularized square root of the squared Wasserstein Hessian. The preconditioner keeps attraction along positive-curvature directions and induces repulsion along negative-curvature directions.
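As a finite-dimensional analogue, the preconditioner can be formed from an eigendecomposition of a symmetric Hessian. This is an illustrative Euclidean sketch, not the paper's Wasserstein-space construction: the function name `saddle_free_direction`, the toy objective f(x, y) = x² − y², and the value β = 1e-3 are all assumptions made for illustration.

```python
import numpy as np

def saddle_free_direction(hess, grad, beta):
    """Return -(H^2 + beta*I)^(-1/2) @ grad for a symmetric matrix H."""
    # Eigendecomposition of the symmetric Hessian: H = V diag(w) V^T.
    w, v = np.linalg.eigh(hess)
    # (H^2 + beta*I)^(-1/2) has eigenvalues 1/sqrt(w^2 + beta), all positive,
    # so the sign of the curvature no longer flips the descent direction.
    scale = 1.0 / np.sqrt(w**2 + beta)
    return -(v * scale) @ (v.T @ grad)

# Toy saddle f(x, y) = x^2 - y^2 with a saddle point at the origin (assumed example).
point = np.array([0.5, 0.1])
grad = np.array([2.0 * point[0], -2.0 * point[1]])
hess = np.diag([2.0, -2.0])

newton_step = -np.linalg.solve(hess, grad)          # plain Newton step
sfn_step = saddle_free_direction(hess, grad, beta=1e-3)

print("Newton lands at:     ", point + newton_step)
print("saddle-free lands at:", point + sfn_step)
```

In this toy case the plain Newton step jumps onto the saddle, while the preconditioned step shrinks the coordinate with positive curvature and pushes the coordinate with negative curvature away from the saddle.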
If this is right
- Escapes saddle regions in polynomial time with improved dependence on saddle parameters compared to perturbed first-order methods.
- Converges linearly in L2-Wasserstein distance to a non-degenerate global minimizer once inside the α-neighborhood.
- Second-order sufficient optimality conditions characterize strict local minimality on Wasserstein space.
- Admits a particle-based implementation that makes the dynamics practical.
Where Pith is reading between the lines
- The preconditioning construction could be adapted to other Riemannian structures arising in optimal transport problems.
- Applications in generative modeling or distribution learning might see faster escape from poor stationary points.
- Extensions to stochastic or discrete-particle versions could be tested on benchmark functionals.
- The same curvature-repulsion idea may combine with higher-order or quasi-Newton updates on Wasserstein space.
Load-bearing premise
The benign landscape assumptions and the regularity conditions on the functional, both of which are required for the polynomial-time escape guarantee and the linear convergence rate to hold.
What would settle it
A concrete non-convex functional on Wasserstein space that meets all stated regularity and benign landscape assumptions yet where WSFN iterates remain trapped near a saddle for super-polynomial time or fail to exhibit linear convergence once inside the α-neighborhood.
Original abstract
We study the minimization of non-convex functionals over the Wasserstein space. While recent work has shown that perturbed Wasserstein gradient methods can avoid saddle points for benign landscapes, existing approaches remain essentially first-order and do not provide fast local convergence once the iterates enter a neighborhood of a global minimizer. We propose Wasserstein Saddle-Free Newton (WSFN), a second-order method that preconditions the Wasserstein gradient by a regularized square root of the squared Wasserstein Hessian. This construction preserves attraction toward directions of positive curvature while inducing repulsion along directions of negative curvature, thereby overcoming the tendency of standard Wasserstein Newton dynamics to be attracted to saddles. We also establish second-order sufficient optimality conditions on Wasserstein space for strict local minimality. Under regularity and benign landscape assumptions, we prove that WSFN escapes saddle regions and reaches an $\alpha$-neighborhood of a global minimizer in polynomial time, with improved dependence on saddle parameters compared with prior perturbed first-order methods. Once inside this neighborhood, we show that WSFN converges linearly in $L^2$-Wasserstein distance to a non-degenerate global minimizer. Finally, we present a particle-based implementation of the method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Wasserstein Saddle-Free Newton (WSFN), a second-order method on the Wasserstein space for minimizing non-convex functionals. It preconditions the Wasserstein gradient using a regularized square root of the squared Wasserstein Hessian to repel from negative-curvature directions while attracting to positive-curvature ones. Under regularity and benign-landscape assumptions, the method is shown to escape saddle regions and reach an α-neighborhood of a global minimizer in polynomial time (with improved dependence on saddle parameters relative to perturbed first-order methods), followed by linear convergence in L²-Wasserstein distance to a non-degenerate global minimizer. Second-order sufficient optimality conditions are established, and a particle-based implementation is presented.
Significance. If the central claims hold, the work meaningfully extends saddle-free Newton ideas from Euclidean space to the infinite-dimensional Wasserstein setting, delivering both global escape guarantees and fast local rates that improve on existing first-order perturbed Wasserstein gradient methods. Explicit credit is due for the adaptation of second-order analysis to the Wasserstein tangent space, the derivation of second-order sufficient optimality conditions, and the internally consistent escape-plus-linear-convergence argument under the stated hypotheses. The particle implementation provides a concrete practical bridge, though it is presented as an approximation rather than part of the formal guarantees.
major comments (2)
- [§4] §4 (Escape analysis): the polynomial-time bound on escape from saddle regions improves the dependence on the saddle parameter relative to prior first-order work, but the proof sketch invokes the benign-landscape assumption without an explicit quantitative statement of how the regularization parameter for the Hessian square root enters the escape time; this constant must be tracked to confirm the claimed improvement is not absorbed into hidden factors.
- [Theorem 5.2] Theorem 5.2 (linear convergence): the contraction rate in L²-Wasserstein distance is stated to be linear once inside the α-neighborhood, yet the argument relies on the smallest eigenvalue of the Wasserstein Hessian at the non-degenerate minimizer; an explicit lower bound on this eigenvalue (or a concrete test for verifying non-degeneracy in applications) is needed to make the rate fully operational.
minor comments (3)
- [§3] Notation for the regularized square-root preconditioner is introduced in §3 but the precise functional-analytic setting (e.g., domain of the square-root operator on the tangent space) is only sketched; a short paragraph clarifying the Sobolev or L² regularity required would improve readability.
- [Figure 1] Figure 1 (particle trajectories) lacks axis labels on the Wasserstein-distance plot and does not indicate the value of the regularization parameter used; adding these would make the numerical illustration self-contained.
- [Abstract and Theorem 4.1] The abstract claims 'polynomial time' escape but does not specify the degree; the main theorem statement should include the explicit polynomial degree in the problem parameters.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the precise comments, which help clarify the presentation of our results. We respond to each major comment below.
-
Referee: [§4] §4 (Escape analysis): the polynomial-time bound on escape from saddle regions improves the dependence on the saddle parameter relative to prior first-order work, but the proof sketch invokes the benign-landscape assumption without an explicit quantitative statement of how the regularization parameter for the Hessian square root enters the escape time; this constant must be tracked to confirm the claimed improvement is not absorbed into hidden factors.
Authors: We agree that the dependence on the regularization parameter β in the square-root Hessian preconditioner should be made fully explicit. In the revised manuscript we will augment the escape-time analysis in Section 4 with an additional lemma that isolates the contribution of β, showing that the overall polynomial bound retains its improved scaling with respect to the negative-curvature threshold (relative to first-order perturbed methods) provided β is chosen smaller than a landscape-dependent constant that is independent of the saddle parameters. The revised proof sketch will track this factor explicitly.
revision: yes
-
Referee: [Theorem 5.2] Theorem 5.2 (linear convergence): the contraction rate in L²-Wasserstein distance is stated to be linear once inside the α-neighborhood, yet the argument relies on the smallest eigenvalue of the Wasserstein Hessian at the non-degenerate minimizer; an explicit lower bound on this eigenvalue (or a concrete test for verifying non-degeneracy in applications) is needed to make the rate fully operational.
Authors: The linear rate is indeed governed by the smallest eigenvalue λ_min of the Wasserstein Hessian at the target minimizer, which is necessarily functional-dependent and therefore cannot be bounded by a universal constant. In the revision we will add a short discussion after Theorem 5.2 that (i) states the contraction factor explicitly as 1−Θ(λ_min) and (ii) supplies a practical numerical test for non-degeneracy based on the particle discretization already introduced in the paper: the empirical Hessian spectrum can be computed from the particle system and checked for a positive lower bound. This renders the rate operational under the stated non-degeneracy hypothesis.
revision: partial
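The non-degeneracy test described in this response could look roughly like the following sketch. It assumes an empirical Hessian has already been assembled as a symmetric matrix from the particle system; how that matrix is built is not specified here, and the function name `check_non_degeneracy`, the tolerance, and the example matrix are hypothetical.

```python
import numpy as np

def check_non_degeneracy(empirical_hessian, tol=1e-8):
    """Return (is_non_degenerate, lambda_min) for a symmetric matrix."""
    # Symmetrize to guard against round-off asymmetry before eigvalsh.
    sym = 0.5 * (empirical_hessian + empirical_hessian.T)
    # eigvalsh returns eigenvalues in ascending order, so index 0 is the smallest.
    lambda_min = float(np.linalg.eigvalsh(sym)[0])
    return lambda_min > tol, lambda_min

# Hypothetical 3x3 empirical Hessian, used only to exercise the check.
example = np.array([[2.0, 0.3, 0.0],
                    [0.3, 1.5, 0.2],
                    [0.0, 0.2, 1.0]])
ok, lam = check_non_degeneracy(example)
print(ok, lam)
```

A positive `lambda_min` above the tolerance is what the stated contraction factor 1−Θ(λ_min) would rely on; a value at or below the tolerance signals that the non-degeneracy hypothesis cannot be confirmed numerically.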
Circularity Check
No significant circularity in the derivation chain
The paper introduces WSFN as a regularized square-root preconditioned Newton method on the Wasserstein tangent space and derives polynomial-time saddle escape plus linear convergence under explicitly stated regularity conditions and benign landscape assumptions. These hypotheses are invoked as external inputs to the theorems rather than being constructed from the method's outputs or fitted parameters. No load-bearing step reduces by definition or self-citation to a quantity defined inside the paper; the second-order optimality conditions and escape analysis follow standard Riemannian geometry arguments adapted to Wasserstein space without circular reduction. The particle implementation is explicitly separated as a practical approximation outside the formal guarantees. The derivation remains self-contained against the stated external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization parameter for Hessian square root
axioms (1)
- domain assumption: regularity and benign landscape assumptions on the functional
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
regularized square root of the squared Wasserstein Hessian... (H²_μn + βI)^{−1/2} ∇_μ F
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
second-order sufficient optimality condition... λ_min K_μ* > 0
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] NIST Digital Library of Mathematical Functions. https://dlmf.nist.gov/, Release 1.2.6 of 2026-03-15. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
- [2] A. Agazzi and J. Lu. Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime. In International Conference on Learning Representations, 2021.
- [3] A. B. Aleksandrov and V. V. Peller. Operator Lipschitz functions. Russian Mathematical Surveys, 71(4):605, 2016.
- [4] L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel, 2008.
- [5]
- [6] W. Arveson. A Short Course on Spectral Theory. Graduate Texts in Mathematics. Springer New York, 2001.
- [7] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. Variational Inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
- [8]
- [9] B. Bonnet. A Pontryagin Maximum Principle in Wasserstein spaces for constrained optimal control problems. ESAIM: Control, Optimisation and Calculus of Variations, 25, 2019.
- [10] S. Boufadene and F.-X. Vialard. On the global convergence of Wasserstein gradient flow of the Coulomb discrepancy. SIAM Journal on Mathematical Analysis, 57(4):4556–4587, 2025.
- [11] P. Cardaliaguet, F. Delarue, J. Lasry, and P. Lions. The Master Equation and the Convergence Problem in Mean Field Games. Annals of Mathematics Studies. Princeton University Press, 2019.
- [12] R. A. Carmona and F. Delarue. Probabilistic Theory of Mean Field Games with Applications I: Mean Field FBSDEs, Control, and Games. Springer International Publishing, 2018.
- [13]
- [14] L. Chizat. Mean-field Langevin dynamics: Exponential convergence and annealing. Transactions on Machine Learning Research, 2022.
- [15] L. Chizat and F. R. Bach. On the global convergence of gradient descent for over-parameterized models using optimal transport. In NeurIPS, 2018.
- [16] L. Chizat, M. Colombo, R. Colombo, and X. Fernández-Real. Quantitative convergence of Wasserstein gradient flows of Kernel Mean Discrepancies, 2026. arXiv:2603.01977.
- [17]
- [18] C. Chu, J. Blanchet, and P. Glynn. Probability functional descent: A unifying perspective on GANs, Variational Inference, and Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1213–1222. PMLR, 09–15 Jun 2019.
- [19] J. Conway. A Course in Functional Analysis. Graduate Texts in Mathematics. Springer New York, 1994.
- [20] Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, 2014.
- [21] E. B. Davies. Lipschitz continuity of functions of operators in the Schatten classes. Journal of the London Mathematical Society, 37(1):148–157, 1988.
- [22] R. M. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.
- [23] A. Figalli and F. Glaudo. An Invitation to Optimal Transport, Wasserstein Distances, and Gradient Flows. EMS Press, Berlin, 2023.
- [24] R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping from saddle points – online stochastic gradient for tensor decomposition. In Proceedings of The 28th Conference on Learning Theory, volume 40 of Proceedings of Machine Learning Research, pages 797–842. PMLR, 03–06 Jul 2015.
- [25] S. Gustafson and I. Sigal. Mathematical Concepts of Quantum Mechanics. Universitext. Springer International Publishing, 2020.
- [26] K. Hu, Z. Ren, D. Šiška, and Ł. Szpruch. Mean-field Langevin dynamics and energy landscape of neural networks. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 57(4):2043–2065, 2021.
- [27] Y.-J. Huang and Z. Malik. Generative modeling by minimizing the Wasserstein-2 loss, 2024. arXiv:2406.13619.
- [28] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan. How to escape saddle points efficiently. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1724–1732. PMLR, 06–11 Aug 2017.
- [29]
- [30] T. Kato. Perturbation Theory for Linear Operators. Classics in Mathematics. Springer Berlin Heidelberg, 1995.
- [31]
- [32] E. Kissin and V. S. Shulman. Classes of operator-smooth functions. I. Operator-Lipschitz functions. Proceedings of the Edinburgh Mathematical Society, 48(1):151–173, 2005.
- [33] A. Korba, P.-C. Aubin-Frankowski, S. Majewski, and P. Ablin. Kernel Stein discrepancy descent. In Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021.
- [34] M. Lambert, S. Chewi, F. R. Bach, S. Bonnabel, and P. Rigollet. Variational inference via Wasserstein gradient flows. In Advances in Neural Information Processing Systems, volume 35, pages 14434–14447, 2022.
- [35] N. Lanzetti, S. Bolognani, and F. Dörfler. First-order conditions for optimization in the Wasserstein space. SIAM Journal on Mathematics of Data Science, 7(1):274–300, 2025.
- [36] R.-A. Lascu and M. B. Majka. Non-convex entropic mean-field optimization via Best Response flow. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [37] R.-A. Lascu, M. B. Majka, D. Šiška, and Ł. Szpruch. Linear convergence of proximal descent schemes on the Wasserstein space, 2024. arXiv:2411.15067.
- [38] J.-M. Leahy, B. Kerimkulov, D. Šiška, and Ł. Szpruch. Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 12222–12252. PMLR, 17–23 Jul 2022.
- [39] Z. Li. SSRGD: Simple stochastic recursive gradient descent for escaping saddle points. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- [40] Y. Lu, C. Ma, J. Lu, and L. Ying. A mean-field analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6426–6436. PMLR, 2020.
- [41]
- [42] S. Mei, A. Montanari, and P.-M. Nguyen. A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences of the United States of America, 115:E7665–E7671, 2018.
- [43] A. Nitanda, D. Wu, and T. Suzuki. Convex Analysis of the Mean Field Langevin Dynamics. arXiv e-prints, page arXiv:2201.10469, Jan. 2022.
- [44] F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis, 173(2):361–400, 2000.
- [45] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005.
- [46] M. Reed and B. Simon. Methods of Modern Mathematical Physics. I: Functional Analysis. Academic Press, New York, 1980.
- [47] G. Rotskoff and E. Vanden-Eijnden. Trainability and accuracy of artificial neural networks: An interacting particle system approach. Communications on Pure and Applied Mathematics, 75(9):1889–1935, 2022.
- [48]
- [49] F. Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Progress in Nonlinear Differential Equations and Their Applications. Springer International Publishing, 2015.
- [50] J. Sirignano and K. Spiliopoulos. Mean field analysis of neural networks: A central limit theorem. Stochastic Processes and their Applications, 130(3):1820–1852, 2020.
- [51] C. Villani. Topics in optimal transportation. Graduate studies in mathematics. American Mathematical Society, 2003.
- [52] C. Villani. Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2008.
- [53] Y. Wang, P. Chen, and W. Li. Projected Wasserstein gradient descent for high-dimensional Bayesian inference. SIAM/ASA Journal on Uncertainty Quantification, 10(4):1513–1532, 2022.
- [54] Y. Wang and W. Li. Information Newton’s flow: second-order optimization method in probability space, 2020. arXiv:2001.04341.
- [55]
- [56] K. Yamamoto, K. Oko, Z. Yang, and T. Suzuki. Mean field Langevin actor-critic: Faster convergence and global optimality beyond lazy learning. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 55706–55738. PMLR, 21–27 Jul 2024.
- [57] N. Yamamoto, J. Kim, and T. Suzuki. Hessian-guided perturbed Wasserstein gradient flows for escaping saddle points. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [58] R. Yao, X. Chen, and Y. Yang. Wasserstein proximal coordinate gradient algorithms. Journal of Machine Learning Research, 25(269):1–66, 2024.
- [59]
- [60] S. Zhu and X. Chen. Convergence analysis of the Wasserstein proximal algorithm beyond geodesic convexity, 2025. arXiv:2501.14993.
6. Appendix
The appendices are organized to guide the reader from motivating examples and implementation to background material and, finally, the technical analysis. Appendix A presents benign non-convex objectives on Wassers...
- [61] Introduction
1.1. Non-convex optimization on Wasserstein space
1.2. Why first-order methods and Wasserstein Newton are insufficient
1.3. Contributions
- [62] Wasserstein geometry for second-order optimization
2.1. Problem setup and minimal notation
2.2. Wasserstein Hessian along transport curves
2.3. Why Wasserstein Newton fails near saddles
- [63] Wasserstein Saddle-Free Newton
3.1. From Newton to saddle-free preconditioning
3.2. Regularized WSFN update
3.3. Second-order structure in the perturbation and preconditioner
- [64] Theoretical guarantees
4.1. Second-order optimality and landscape assumptions
4.2. Global convergence to a neighborhood of a global minimizer
4.3. Local linear convergence to a non-degenerate global minimizer
- [65] Conclusion
Acknowledgements
References
- [66] Appendix
Appendix A. Examples of benign objectives on Wasserstein space
Appendix B. Particle Implementation of WSFN
B.1. Numerical experiments
Appendix C. Additional notation for operators on $L^2_\mu$
Appendix D. L ...
-
[67]
By [30, Problem 2.35], $\langle H_\mu v, v\rangle_{L^2_\mu} \le \langle |H_\mu| v, v\rangle_{L^2_\mu}$ for any $v \in L^2_\mu$. Using the adjoint property $(AB)^* = B^* A^*$, we get $(H_\mu^2)^* = (H_\mu H_\mu)^* = H_\mu^* H_\mu^* = H_\mu H_\mu = H_\mu^2$, hence $H_\mu^2$ is self-adjoint. The identity operator $I_{d\times d}$ is trivially self-adjoint and the sum of two self-adjoint operators is self-adjoint. Thus, $H_\mu^2 + \beta I_{d\times d}$ is self-adjoint. Furthermore, b...
-
[68]
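Summing from $k = 0$ to $k = n-1$ yields $F(\mu_0) - F(\mu_n) \ge \frac{\tau\sqrt{\beta}}{2} \sum_{k=0}^{n-1} \|(H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \nabla_\mu F(\mu_k)\|_{L^2_{\mu_k}}^2$.
Therefore, using that $H_{\mu_k}^2 + \beta I_{d\times d} \succeq \beta I_{d\times d}$ implies $(H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \preceq \beta^{-1/2} I_{d\times d}$ gives $\|(H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \nabla_\mu F(\mu_k)\|_{L^2_{\mu_k}}^2 \le \frac{1}{\sqrt{\beta}} \langle \nabla_\mu F(\mu_k), (H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \nabla_\mu F(\mu_k) \rangle_{L^2_{\mu_k}}$. Therefore, using $\tau \le \sqrt{\beta}\,(C_M + C_K)^{-1}$ gives $F(\mu_{k+1}) \le F(\mu_k) - \tau\sqrt{\beta}\, \|(H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \nabla_\mu F(\mu_k)\|_{L^2_{\mu_k}}^2 + \frac{\tau\sqrt{\beta}}{2} \|(H_{\mu_k}^2 + \beta I_{d\times d})^{-1/2} \nabla_\mu F(\mu_k)\|_{L^2_{\mu_k}}^2 = F(\mu_k) - \frac{\tau\sqrt{\beta}}{2} \|(H^2$ ...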
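The operator inequality used in this step can be sanity-checked numerically in finite dimensions. The sketch below is a stand-in under stated assumptions: a random symmetric matrix plays the role of the Hessian $H_{\mu_k}$, a random vector plays the role of the gradient $\nabla_\mu F(\mu_k)$, and the names `precond_inv_sqrt`, `h`, and `g` are hypothetical.

```python
import numpy as np

def precond_inv_sqrt(h, beta):
    """Return (H^2 + beta*I)^(-1/2) for a symmetric matrix H."""
    w, v = np.linalg.eigh(h)
    # Divide each eigenvector column by sqrt(w_i^2 + beta), then recombine.
    return (v / np.sqrt(w**2 + beta)) @ v.T

rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
h = 0.5 * (a + a.T)            # random symmetric matrix, generally indefinite
g = rng.standard_normal(5)     # stand-in for the gradient
beta = 0.1

p = precond_inv_sqrt(h, beta)
lhs = np.linalg.norm(p @ g) ** 2          # ||P g||^2
rhs = (g @ (p @ g)) / np.sqrt(beta)       # beta^(-1/2) * <g, P g>
print(lhs <= rhs)
```

The check works because every eigenvalue of the preconditioner is at most β^(-1/2), which is the finite-dimensional counterpart of the operator bound quoted above.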
-
[69]
To do so, we need to study the asymptotic behavior of the parameters $\eta$, $n_{\mathrm{out}}$ and $F_0$ as $\delta \to 0$ (which implies $\tilde{\delta} \to 0$), treating $\tau$, $\kappa$, $|c|$, $\beta$, $L_H$, $C_H$ and $R_F$ as fixed constants. Using the Taylor expansion $\log(1 + x) = \Theta(x)$ for small enough $x > 0$, we see that $n_{\mathrm{out}} = \tilde{O}(\tilde{\delta}^{-1})$. Consequently, (38) implies $F_0 = \tilde{O}(\tilde{\delta}^{3})$. From $\varepsilon \left( \frac{R_F}{\sqrt{\beta}} + \frac{2 L_H}{\pi\beta} \right) \le \tilde{\delta}^{3/2}$, we have ε = O(˜δ 3...
-
[70]
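Multiplying $\Delta$ by $n_{\mathrm{out}}\tau$ and substituting our definition of $F_0^{1/2}$ into $\sqrt{\frac{\tau n_{\mathrm{out}}}{\beta}}\, F_0^{1/2}$ yields $n_{\mathrm{out}}\tau\Delta = 4 L_H \left( \frac{1}{2\sqrt{\beta}} + \frac{2 C_H}{\pi\beta} \right) \frac{(\tau n_{\mathrm{out}})^{3/2}}{\sqrt{\beta}} F_0^{1/2} + 4 L_H \left( \frac{1}{2\sqrt{\beta}} + \frac{2 C_H}{\pi\beta} \right) \tau n_{\mathrm{out}} \eta \kappa + \varepsilon \tau n_{\mathrm{out}} \left( \frac{R_F}{\sqrt{\beta}} + \frac{2 L_H}{\pi\beta} \right) = \frac{1}{3} \log\frac{3}{2} + \tilde{O}(\tilde{\delta}^{1/2}) + \tilde{O}(\tilde{\delta})$. For sufficiently small $\delta$, the higher-order terms can each be upper bounded by $\frac{1}{3} \log\frac{3}{2}$, guaranteeing that $n_{\mathrm{out}}\tau\Delta \le \log\frac{3}{2}$. ...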