Training Infinitely Deep and Wide Transformers
Gradient flow reaches global minima for infinite-depth transformers when initial loss is small and log-sum-exp functions remain linearly independent modulo affine functions.
Optimization and Control
Operations research, linear programming, control theory, systems theory, optimal control, game theory
Finite-dimensional Bregman arguments establish convergence at critical damping and improved rates for stronger damping.
On the Nature of Regularity Assumptions in Bilevel Optimization with Constrained Lower-level Problem
Structural invariants cannot be made consistent by small perturbations, yet the regularity conditions hold almost everywhere after generic random perturbations.
Geometric Asymptotics of Score Mixing and Guidance in Diffusion Models
Small-time dynamics governed by weighted squared distances to data supports, for both mixture and amplified guidance