On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension.
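The particle method described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a ReLU network with a single hidden layer, a squared loss, Gaussian initialization, and explicit Euler steps standing in for the continuous-time gradient flow. The function name `particle_gradient_descent` and all of its parameters are hypothetical.

```python
import numpy as np

def particle_gradient_descent(X, y, m=200, steps=500, lr=0.05, seed=0):
    """Gradient descent on the weights and positions of m particles.

    Assumed model (not taken from the source): f(x) = (1/m) * sum_i w_i * relu(theta_i . x),
    i.e. a one-hidden-layer network viewed as a mixture of m particles (w_i, theta_i).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(m)           # particle weights (assumed Gaussian init)
    theta = rng.standard_normal((m, d))  # particle positions (assumed Gaussian init)
    losses = []
    for _ in range(steps):
        pre = X @ theta.T                # (n, m) pre-activations
        act = np.maximum(pre, 0.0)       # ReLU
        resid = act @ w / m - y          # (n,) prediction error
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradients rescaled by m, so each particle moves at a rate
        # that does not shrink as the number of particles grows.
        grad_w = act.T @ resid / n
        grad_theta = (resid[:, None] * (pre > 0) * w).T @ X / n
        w -= lr * grad_w
        theta -= lr * grad_theta
    return w, theta, losses

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((256, 2))
    # Hypothetical two-neuron teacher used only to generate targets.
    y = np.maximum(X @ np.array([1.0, 0.5]), 0) - np.maximum(X @ np.array([-0.5, 1.0]), 0)
    _, _, losses = particle_gradient_descent(X, y)
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Increasing `m` moves the sketch toward the many-particle regime that the abstract refers to; the step size and step count here are arbitrary choices for illustration.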
verdicts
UNVERDICTED 4
citing papers explorer
- Learning Dynamic Stability Landscapes in Synchronization Networks
  Introduces graph-to-image prediction of per-node dynamic stability landscapes in oscillator networks from topology, releases two 10k-graph datasets, and shows GNN-CNN models achieve good accuracy with cross-size generalization.
- Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
  Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
- Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
  At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with piecewise affine activation
  Batch gradient descent achieves linear convergence to zero MSE with high probability for sufficiently wide shallow NNs with non-affine piecewise affine activations and distinct inputs.