arXiv preprint arXiv:1805.00915

2018 · arXiv 1805.00915

7 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

stat.ML · 2026-05-21 · unverdicted · novelty 7.0

Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.

Mirror Descent-Ascent for mean-field min-max problems

math.OC · 2024-02-12 · unverdicted · novelty 7.0

Establishes O(N^{-1/2}) convergence for simultaneous MDA and O(N^{-2/3}) for alternating MDA to mixed Nash equilibria in mean-field convex-concave min-max problems via dual-space Bregman analysis.

A unified perspective on fine-tuning and sampling with diffusion and flow models

stat.ML · 2026-04-30 · unverdicted · novelty 6.0

A unified framework for exponential tilting in diffusion and flow models that includes bias-variance decompositions showing finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses with new Crooks and Jarzynski identities.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by generating their own training data from previous model iterations and training the model to prefer human-annotated responses over those self-generated outputs; on benchmarks it outperforms DPO even when DPO is supplemented with extra GPT-4 preference data.

Mean-field limit of particle systems with absorption

math.PR · 2023-10-16 · unverdicted · novelty 6.0

Proves mean-field limit and propagation of chaos for 1D particle systems with singular absorption interactions using Girsanov transforms, tightness, and analysis of the nonlinear Fokker-Planck equation.

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks

cs.LG · 2019-06-30 · unverdicted · novelty 6.0

Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

stat.ML · 2025-11-04 · unverdicted · novelty 5.0

At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.

citing papers explorer

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks stat.ML · 2026-05-21 · unverdicted · none · ref 12
Mirror Descent-Ascent for mean-field min-max problems math.OC · 2024-02-12 · unverdicted · none · ref 36
A unified perspective on fine-tuning and sampling with diffusion and flow models stat.ML · 2026-04-30 · unverdicted · none · ref 133
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models cs.LG · 2024-01-02 · unverdicted · none · ref 211
Mean-field limit of particle systems with absorption math.PR · 2023-10-16 · unverdicted · none · ref 27
Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks cs.LG · 2019-06-30 · unverdicted · none · ref 52
Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks stat.ML · 2025-11-04 · unverdicted · none · ref 26
