arXiv preprint arXiv:1805.00915

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1

citation-polarity summary

verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1

representative citing papers

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

stat.ML · 2026-05-21 · unverdicted · novelty 7.0

Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.

Mirror Descent-Ascent for mean-field min-max problems

math.OC · 2024-02-12 · unverdicted · novelty 7.0

Establishes O(N^{-1/2}) convergence for simultaneous MDA and O(N^{-2/3}) for alternating MDA to mixed Nash equilibria in mean-field convex-concave min-max problems via dual-space Bregman analysis.

Mean-field limit of particle systems with absorption

math.PR · 2023-10-16 · unverdicted · novelty 6.0

Proves mean-field limit and propagation of chaos for 1D particle systems with singular absorption interactions using Girsanov transforms, tightness, and analysis of the nonlinear Fokker-Planck equation.

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks

cs.LG · 2019-06-30 · unverdicted · novelty 6.0

Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.

citing papers explorer

Showing 7 of 7 citing papers.

  • Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks · stat.ML · 2026-05-21 · unverdicted · none · ref 12

    Finite-width shallow networks remain within poly(d) m^{-min(1,c/6)} of their mean-field limit uniformly in time when mean-field excess loss decays as t^{-c} under standard regularity and an integral condition on the loss.

  • Mirror Descent-Ascent for mean-field min-max problems · math.OC · 2024-02-12 · unverdicted · none · ref 36

    Establishes O(N^{-1/2}) convergence for simultaneous MDA and O(N^{-2/3}) for alternating MDA to mixed Nash equilibria in mean-field convex-concave min-max problems via dual-space Bregman analysis.

  • A unified perspective on fine-tuning and sampling with diffusion and flow models · stat.ML · 2026-04-30 · unverdicted · none · ref 133

    A unified framework for exponential tilting in diffusion and flow models that includes bias-variance decompositions showing finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses with new Crooks and Jarzynski identities.

  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models · cs.LG · 2024-01-02 · unverdicted · none · ref 211

    SPIN strengthens a weak LLM by having it generate training data from its own previous iterations and training it to prefer human-annotated responses over its own outputs; on benchmarks it outperforms DPO even when DPO is supplemented with extra GPT-4 preference data.

  • Mean-field limit of particle systems with absorption · math.PR · 2023-10-16 · unverdicted · none · ref 27

    Proves mean-field limit and propagation of chaos for 1D particle systems with singular absorption interactions using Girsanov transforms, tightness, and analysis of the nonlinear Fokker-Planck equation.

  • Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks · cs.LG · 2019-06-30 · unverdicted · none · ref 52

    Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.

  • Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks · stat.ML · 2025-11-04 · unverdicted · none · ref 26

    At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.